Site Search – Google 0, Yahoo! and Microsoft 2

by Michael Martinez on October 4, 2007

I have occasionally pointed out that one of the best ways to study search engine optimization is to look at site searches. You won’t get a better picture of how a search engine sees your site than by asking the search engine to show you content that you know is there.

You really cannot count on the search engines to give you accurate counts of how many of your pages they index but you can become familiar enough with their reported numbers that you can gauge changes in indexing trends. I would normally expect a search engine to under-report existing content, assuming the search engine dumps pages from its index before attempting to recrawl them (or after failing to recrawl them).
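If you want to watch those reported numbers for trends rather than trust any single figure, a few lines of Python will do. This is only a minimal sketch: the function name `percent_changes` and the sample counts are hypothetical, made up for illustration, and you would have to record the real counts yourself from whatever each engine reports.

```python
def percent_changes(counts):
    """Return the percent change between each pair of consecutive counts.

    `counts` is a list of page counts a search engine reported for your
    site over time, oldest first. A None entry means the previous count
    was zero, so no percentage can be computed.
    """
    changes = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            changes.append(None)
        else:
            changes.append(round(100.0 * (current - previous) / previous, 1))
    return changes


# Hypothetical weekly counts, invented for this example only.
weekly_counts = [1700, 1650, 1720, 1400]
print(percent_changes(weekly_counts))
```

A small wobble from week to week tells you little, but a sudden large drop is the kind of change in indexing trend worth looking into.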

Unhappily, knowing how many pages you have on your Web site doesn’t help you evaluate a site search. I have about 2,000 indexable pages on Xenite.Org. Google, Yahoo!, and Live report about 1,700, 3,400, and 21,000 pages respectively.

The immediately obvious overcounts from Yahoo! and Microsoft are inexplicable. Yahoo! has always made up stuff about Web sites and backlinks so you can’t really trust any numbers it reports. One reason why Yahoo! includes bogus data is that it randomly generates improbable URLs for your site and tries to crawl them. They do this, I have been told, to check how your site handles Code 404 conditions.

Not that it is any of Yahoo!’s business how I handle a non-existent URL, but I found I had to resort to creating a mini-sitemap on my 404 document so that Yahoo! would stop adding bogus URLs to my page count. So far as I can determine, they won’t drop the fake URLs from their index. So there are a LOT of copies of my old root URL page in Yahoo!, I suppose, but that is their problem, not mine.
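For anyone curious what that kind of probe looks like from the crawler's side, here is a rough sketch in Python. I have no knowledge of how Yahoo! actually implements it, so treat everything here as an assumption: the function names (`improbable_url`, `classify_response`, `fetch_status`, `probe`), the `probe-` URL prefix, and the "hard 404" and "soft 404" labels are all made up for illustration. The idea is simply to request a URL that should not exist and see whether the server answers with a real 404 or hands back an ordinary page.

```python
import secrets
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def improbable_url(base_url):
    """Build a URL under base_url that almost certainly does not exist."""
    return urljoin(base_url, "probe-" + secrets.token_hex(16) + ".html")


def classify_response(status):
    """Label how a site answered a request for a page that should not exist."""
    if status in (404, 410):
        return "hard 404"   # the server says the page is missing
    if 200 <= status < 300:
        return "soft 404"   # the server served some page as if it existed
    return "other"          # server errors, blocked requests, and so on


def fetch_status(url):
    """Return the final HTTP status for url (urlopen follows redirects)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is on the exception.
        return err.code


def probe(base_url, fetch=fetch_status):
    """Request one improbable URL and classify the site's answer."""
    return classify_response(fetch(improbable_url(base_url)))


# Offline usage with a stand-in fetcher that always answers 200:
print(probe("http://www.example.com/", fetch=lambda url: 200))
```

A site that answers a missing URL with a copy of its root page would come back as a "soft 404" here, which would explain a crawler ending up with many copies of the same page under bogus URLs.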

I cannot explain or excuse Live’s enthusiastically high estimate of pages for Xenite. As they just redesigned and expanded their search engine’s index, I’ll give them a little leeway for now.

However, to their credit, both Yahoo! and Live will find deep content on Xenite that Google pretends either doesn’t exist or is just not important enough to be listed first in site searches. When it comes to being a useful site search tool, Google just plain sucks. It’s one of the worst site search resources on the Web.

Why is Google so bad at site search? Supplemental Pages.

Now, some people may be quick to pull out their Sculpting PageRank handbooks — but before you start lecturing me on how Google uses PageRank to figure out which pages are most important, etc., etc., bear with me as I talk about on-site navigation and visitors.

People increasingly use the major search engines as site navigation tools. Even people on my own team (including me) will go to a search engine and type a site name into the search box just so we don’t have to fuss with specifying URL punctuation. There have been many times where I’ve been on one page of a site and used a search box to get to another section of the site just to avoid looking for navigational links that are hard to find.

Making it easy for people to get from page A to page B regardless of what page B turns out to be is vital. I often curse news Web sites because their on-site search functions really, truly suck in the worst, awfulest, most profoundly bad ways. Forum search tools come close to being as bad as news site searches.

But Google won’t show your Supplemental Results pages first even when they are clearly the most relevant pages for a highly specific query. Worse, if you need to search for words on a Supplemental Page that are close to each other but which are not next to each other, the chances of that page coming up in a site search are virtually nil.

Both Yahoo! and Live bring up the pages Google won’t show you, even when those pages are the ONLY pages that should be relevant to your query. If you specify a phrase without quotes that occurs only on one page, Google will show you non-Supplemental results first before listing the only page that exactly matches your query. Both Yahoo! and Live will show you the same page (and in my tests this week both Yahoo! and Live have consistently shown the right page).

Disclaimer: Your results may vary, especially as neither you nor I know for sure which pages are in Google’s Supplemental Index.

Google has become so paralyzed by the fear of showing (gasp!) spammy pages in its search results that it has disowned about 80% of the Web. What good is a search engine that won’t show you legitimate, unique content just because they can’t find enough PageRank to stretch to that content?

You have to ask yourself, if Google is that bad with site search, where clearly neither spam nor competitive factors need to be dealt with, how bad have their organic search results become? It’s not what you see that matters in Google but what you don’t see. Why? Because you have no way of knowing if Google’s unfettered relevance algorithms would rank something more highly.

But for the sake of tripping a mindless filter or lacking just that much internal PageRank, perfectly good content simply falls into the Supplemental Index and therefore goes to the end of the search results. Who needs a search engine that doesn’t want to do its job?

Implementing a site search is a lot of work. Even if you have a CMS you need to really thoroughly test the index to be sure that all of your content comes up. And you cannot afford to play games with “rel=’nofollow’”. When people want to know how to contact you or what your site is all about, your site search tool had better show them the right pages.

For someone like me, who doesn’t use CMS-based design, changing site search providers is a royal hassle. I’ve done it several times in the past. I used to use whatUseek (a service no longer available). I had to move my forums to another domain because whatUseek could only handle 5,000 URLs in its free service and it wasn’t worthwhile for me to pay for professional site indexing.

Maybe Google’s Search Appliance would do a better job of indexing a site but I’m not going to spend money on site search. I shouldn’t have to. In fact, since I open my server to search engine crawlers and allow them to pluck down the same pages over and over again, they could at least return the favor and let me see those pages in a site search.

It’s not like I’m asking Google to let Xenite rank for viagra, real estate, and pornography. If I wanted to do that, I’d put relevant content on the site.

Although I go to great lengths to provide my visitors with strong internal navigation — even to the point of letting it become a distraction on some pages — people historically use search tools to find their way around large content sites. I’d rather they used a reliable tool than rely on a tool that willfully and deliberately omits perfectly good content just because “your site only gets so much PageRank”.

That dog won’t hunt.

As a search user I want to find the content I am looking for, not the content that someone else decides is right for me because I “don’t know any better than (to assume it’s the best content available)”. That is almost precisely what Matt Cutts told people at SMX Advanced in Seattle when they asked him why Google won’t relent on its favoritism toward Wikipedia (and until that day I had steadfastly refused to accept that Google would show algorithmic favoritism to a frequently discredited Web site that is a source of a great deal of misinformation).

So, even though it means a lot of work for me in coming weeks, I think it’s time to throw in the towel and say I’m done with Google’s site search. It’s useless to the point of being a complete waste of time. If they don’t care enough about the quality of their search results to provide my visitors with access to all the content on my site, I’m not going to help them build their search traffic any longer.

I think I’ll give Microsoft’s new search engine a try. They are already sending more referrals to Xenite than Yahoo!, and that was something I had never expected to see. I don’t care if Microsoft says I have 10 times more pages than I really do as long as they show my visitors what they are looking for.

That is what site search is all about.

And that is what search engine optimization has to do: figure out how to get people to the content through search. The search tool doesn’t matter as long as it works.

wibbler 10.04.07 at 3:45 pm

Michael – you sure seem to be miffed at Google.

I’m reading all here – and I think your comments sometimes reflect the same thoughts as a large group of site owners on the net.

Basically,
1 – “how bad are their organic search results become?”
2 – “But Google won’t show your Supplemental Results pages first even when they are clearly the most relevant pages for a highly specific query.”
3 – “I’m done with Google’s site search. It’s useless to the point of being a complete waste of time.”
4 – “What good is a search engine that won’t show you legitimate, unique content just because they can’t find enough PageRank to stretch to that content?”

My feeling is that Yahoo and MSN make mistakes and – well errr – just kind of fumble along (relatively though – keeping in mind the complexities) – whereas Google…………
Need I say more?

Michael Martinez 10.05.07 at 7:51 am

No search engine is perfect but I need a site search function that doesn’t play politics with the Web. Right now, Google’s algorithmic judgement of what is worth showing to people sucks and until they fix that problem I’m not going to have many positive, supportive things to say about their search.

wibbler 10.05.07 at 3:19 pm

“Google’s algorithmic judgement of what is worth showing to people sucks”

That’s what it’s all about – you know it and I know it. In fact half the internet knows it.

Why the hell does no one do anything about it?

:|
Grim.

bizdevmarketing 10.05.07 at 10:31 pm

Michael Martinez 10.07.07 at 3:20 am

They can pretend they no longer operate two indexes, but that just makes them dishonest. I warned them on their own blog that if they don’t treat unique Web pages in the Supplemental Index better, the complaining will continue and will grow louder.

I’ll be using Xenite.Org as a bully pulpit to inform my fellow science fiction and fantasy fans that they’ll be better served in their site search functions by Microsoft and Yahoo!.

Disclaimer: The opinions expressed in this comment are mine and mine alone and do not necessarily reflect those of my employer.