One of the latest ongoing concerns in the SEO industry appears to dwell on the number of pages that a search engine indexes (most specifically Google, but this discussion really extends to other search engines).
Of the four major search engines, Ask and Bing are the stingiest when it comes to indexing deep content from Websites. Yahoo! is pretty generous with its indexing but pretty much everyone in our industry now fusses only over Google’s deep indexing.
Index depth is a low-calibre metric for the quality and search potential of a Website. That is, if you have a Website with 100,000 pages, knowing that only 2,000 of those pages are indexed doesn’t tell you much.
Some people might be quick to point out that those 98,000 unindexed pages could draw substantial traffic. What if each page — were it indexed — drew an additional 1 visitor per month to your Website? That’s 98,000 search conversions you’re not getting now, right?
Maybe. But those 98,000 search conversions are built on the worst of foundations: IF.
IF-based SEO is about as reliable as betting on a horse at a racetrack based on its handicap. “The odds are 100 to 1. IF this horse comes in, I’ll win a bajillion dollars!”
If (and I use the word carefully in this context) that is the way you want to do your SEO, good luck to you. You’ll need all that and more.
What if your 98,000 unindexed pages are autogenerated place-holder pages that consist of nothing more than boiler-plate templated text with a few injected keywords? If that is the basis of your SEO strategy, you have nothing to offer to those imaginary 98,000 search visitors. Even if they show up, the chances of their converting for you are pretty slim.
That’s the mentality that SpamAd site operators take. They work on low-quality volume. So simply knowing how many pages of a site are indexed doesn’t tell you anything useful for search engine optimization. Does that sound familiar? It should, because I have pointed out through the years that simply knowing how many links point to a site doesn’t tell you anything useful, either.
The SEO industry wants to quantify things, and I’m not sure why. There is no real value in quantification for its own sake. To be useful, the quantification must be tied to a specific value scale. Who is paying money for getting X number of pages indexed? If there is someone out there with that kind of agenda, then you can certainly optimize to get more pages indexed. Job done.
The SEO’s job is determined by the needs of the end-user. Therefore the tools the SEO uses must be flexible and customizable. Link counts and Indexed Page counts are neither flexible nor customizable. They are random, inaccurate, search engine-specific numbers that provide you with neither insight nor advantage in search optimization.
So Barry Schwartz ran a poll asking SEOs whether they use the site: query operator or Google Webmaster Tools to determine how many pages are indexed (in Google, obviously). In recapping the poll results, Barry wrote: “…31% said they still use the Google Site Command. I am a bit upset to see so many people using the site command but I guess it is hard to teach an old dog new tricks?”
It may indeed be hard to teach an old dog new tricks but his comment leaves me wondering WHY the old dog should need to learn such a useless new trick.
I can easily find pages in the Google index that are not reported by Webmaster Tools. Why is that? I have no idea. I don’t care. It means the WT report is not a reliable source of information, so why has the SEO community suddenly fallen in love with a resource that — up until recently — was the SEO class’ kickaround kid? What’s up with that, homeys?
Here’s the thing: Over the past few months many people have complained in various Web forums and at Google’s support groups that their sites have lost page visibility in Google’s index. That is, they are counting “number of pages indexed” and flying into a panic when that number drops from 2000+ to 1200.
I’ve seen this happen with my own sites. For years and years Xenite.Org’s index count has shot up and dropped down in precipitous swings when you do a site: query. I haven’t noticed any correlating drops in search referral traffic. So if we assume for the sake of discussion that the changes in reported page counts have something to do with Google’s indexing, those extra pages aren’t helping much, are they?
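If you want to run the same check on your own site, one rough approach is to line up the reported site: counts against search referral visits for the same periods and compute a correlation coefficient. The sketch below is only an illustration in Python: the function name `pearson` is hypothetical, and both input series are assumed to come from your own records rather than from any search engine report.

```python
from math import sqrt


def pearson(index_counts, referral_visits):
    """Pearson correlation between two equal-length numeric series.

    index_counts    -- reported site: counts, one per period (your own notes)
    referral_visits -- search referral visits for the same periods (your own logs)
    """
    if len(index_counts) != len(referral_visits) or len(index_counts) < 2:
        raise ValueError("need two series of equal length, at least 2 points")

    n = len(index_counts)
    mean_x = sum(index_counts) / n
    mean_y = sum(referral_visits) / n

    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_counts, referral_visits))
    var_x = sum((x - mean_x) ** 2 for x in index_counts)
    var_y = sum((y - mean_y) ** 2 for y in referral_visits)

    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")

    return cov / sqrt(var_x * var_y)
```

A result close to zero would be consistent with what I describe above: the reported count swings while referral traffic does not follow it. A single coefficient is still a blunt instrument, so treat it as a sanity check rather than proof.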
On the other hand, on any day I can look at a page index count and drill down deeper to find that the numbers change radically. Starting at the root URL and clicking through to the end of the search results for most sites, the reported number of indexed pages drops radically. But if you then select a sub-domain or sub-directory from the site and drill down to the last search result page for that query, you suddenly find all sorts of pages that didn’t appear in the original query.
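The drill-down described above can be made repeatable by generating one site: query per section of the site. This is a minimal sketch, assuming sub-directories are written with a leading slash and sub-domains as bare labels; the function name `site_queries` and the example host are hypothetical, and the code only builds query strings for you to paste into the search box.

```python
def site_queries(host, sections):
    """Build site: queries for the root host and for each listed section.

    host     -- bare host name, e.g. "example.com" (illustrative value)
    sections -- sub-directories ("/blog/") or sub-domain labels ("forum")
    """
    queries = [f"site:{host}"]  # the root query, where the count usually starts
    for section in sections:
        if section.startswith("/"):
            # Sub-directory: restrict the query to one path on the same host.
            queries.append(f"site:{host}{section}")
        else:
            # Sub-domain: restrict the query to one host under the domain.
            queries.append(f"site:{section}.{host}")
    return queries


if __name__ == "__main__":
    for query in site_queries("example.com", ["/blog/", "forum"]):
        print(query)
```

Running each query and paging to the last result, as described above, is still a manual job. The sketch just keeps the list of sections consistent from one check to the next.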
Why is that? I have no idea. But it tells me that the site: query operator is a special needs tool. It needs special understanding for proper use, not to mention some patience and common sense.
When I look at the data provided by Webmaster Tools I just want to gag. The dates don’t match cache dates provided by the search index, pages that seem to be missing from the WT reports show up just fine when I use the info: or site: queries, and many backlinks that go missing in the WT reports show up just fine in Google’s index.
I don’t know what the purpose of the Webmaster Tools data is supposed to be, but it’s not helping much with analyzing a site’s Google Index Health. Frankly, I’d rather use the site: query operator. In fact, I DO use the site: query operator — when I want to know if a specific set of pages has been crawled and what their apparent state of indexing is.
You can use either the search box or the Webmaster Tools interface to make some sort of pungent guess at what Google is doing but that is about it. From a search optimization perspective, unless you have been specifically charged with improving an index report count, you’re spinning your wheels looking at page counts anyway.
Site search is better utilized to determine which pages a search engine will return from a site for a given query string (aka keyword). Think about it: if the search engine won’t show that page for the keyword in a site search, doesn’t that tell you where to look when asking why the page doesn’t rank in a normal query?
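As a rough illustration of that kind of check, the Python sketch below combines a site: restriction with a keyword and turns it into a search URL. The function name `site_search_url`, the example host, and the keyword are all assumptions for illustration; the code only builds the URL and does not fetch or parse results.

```python
from urllib.parse import urlencode


def site_search_url(host, keyword, base="https://www.google.com/search"):
    """Return a search URL restricted to one site for one keyword.

    host    -- bare host name, e.g. "example.com" (illustrative value)
    keyword -- the query string you expect a page to rank for
    base    -- search endpoint; the default is an assumption, swap as needed
    """
    query = f"site:{host} {keyword}".strip()
    # urlencode percent-encodes the colon and turns spaces into plus signs.
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(site_search_url("example.com", "blue widgets"))
    # https://www.google.com/search?q=site%3Aexample.com+blue+widgets
```

If the page you care about does not appear for its keyword in that restricted search, that is the place to start asking why it does not rank in the normal query.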
Site search can help users find specific content in large content sites. It can help SEOs find out which pages have been fully indexed, which pages are being treated as if they are duplicate content, and which pages are NOT appearing in search results.
You can customize your site search by adding and changing terms in the query.
By contrast, you’re pretty much stuck with the inaccurate, sometimes quite misleading data that Webmaster Tools provides you.
How many times have you wondered why you cannot find your site ranking for queries near the positions that Google reports in Webmaster Tools? I’ve given up trying to make sense of those idiotic reports. And apparently many people who complain about them in forums and support groups are noticing the same inconsistencies that I am, so this is not just me grousing on the basis of personal experience.
It’s unfortunate that so many people in the SEO community try to quantify things without any purpose. Just counting the number of things a search engine will report to you tells you nothing. These are data points outside the graph. You need to pick the graph and understand what it is designed to do before you start plugging numbers into it.
Things I’d like to know that you cannot learn from page and link counts include:
- How many pages are fully indexed
- How many pages are being shown to searchers in clickable zones
- How many pages are being handicapped by poor on-site optimization
- Which pages are most valued by the search engine
- Which pages have the most value to pass to other pages
- Which pages are allowed to receive value from off-the-wall pages
- Which pages are being recrawled often
- Which pages are not being recrawled often
There is currently no tool or method for determining these things with any reliable accuracy. Don’t even start to tell me about your favorite tool. It doesn’t do the job.
But, more importantly (and to the point), you cannot begin to answer these types of questions by counting links and indexed pages. The answers to those questions are the kind of knowledge that empowers a search optimization specialist.
If the SEO community would adopt some real standards, nonsense metrics like backlink counts, indexed page counts, and Google Toolbar values could be openly questioned and challenged in a formal environment. People would have a better opportunity to learn just how useless this kind of fluff data really is, and hopefully learn that they don’t need to waste their time pursuing numbers that (in themselves) have no meaning or relevance to search engine optimization.