NOTE: Several edits have been made several days after the original article was posted.
One of the most bizarre practices I have seen develop in the SEO world over the past couple of years is the increasing reliance upon other search tools (particularly Yahoo! and Alexa) for analyzing positions and backlinks in Google. While I can hardly stop the SEO community from codifying pure stupidity into a religious daily prayer, it does seem like a lot of people are not really satisfied with this kind of blind-leading-the-blind analysis.
After all, I do occasionally see people start their forum pleas for help with something like, “I have 2,000 backlinks in Yahoo!” or “My Alexa something-or-other is XXXX” and then they go on with the usual refrain of “but I have lost my Google rankings”.
Let me get the obvious, obligatory, common sense objections out of the way so I can focus on what Google actually tells you about a Web site.
First, Yahoo! may very well report 2,000 backlinks for your site, but most likely you’re only looking at linkdomain — which under-reports your backlinks by only telling you how many domains (including all their sub-domains) link to your URL. Some people know enough to dig a little deeper with Yahoo!, but you still see only what Yahoo! has crawled and indexed. That tells you nothing about what Google has crawled and indexed (much less what Google will allow to pass value).
And Alexa — well, they operate two databases. One database handles their normal search queries and the other database is the one with all the rankings based on the data they collect from the browsing habits of their toolbar users.
Although Alexa data has always offered some unambiguously objective value for Web properties with huge audiences (like Google, AskMen, CNN, Yahoo!, and Microsoft.com), domains with smaller target audiences have found more than one way to skew their results. Alexa rankings are therefore highly unreliable because they don’t represent a scientifically valid sampling of user behavior. Some companies actually require all their employees to surf with the Alexa toolbar installed in their browsers and to set the company Web site as their browser home page. If you have 500 employees scattered across 10-20 locations, you can easily see how that kind of policy may unduly influence your Alexa rankings.
Any other Alexa-like service is similarly limited by its data collection ability, so take any rankings you find on the Web with a grain of salt. One potentially useful ranking (provided there are enough reliably cullable sources) would be to average out rankings from 5-10 sources and use their aggregate values. But I doubt you could find 5 services whose rankings systems would be compatible with each other. You could do some weighted adjustments, but in the end you’re still not looking at data that tells you anything about what Google knows.
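To make the averaging idea concrete, here is a minimal sketch of a weighted aggregate rank. The service names, rank values, and weights are all made up for illustration, and the sketch assumes the services' rankings are directly comparable, which is exactly the assumption questioned above.

```python
def aggregate_rank(ranks, weights=None):
    """Return the weighted mean of rank positions reported by several services.

    ranks   -- dict mapping a (hypothetical) service name to the rank it reports
    weights -- optional dict mapping the same names to a trust weight;
               when omitted, every service counts equally
    """
    if not ranks:
        raise ValueError("need at least one ranking source")
    if weights is None:
        weights = {source: 1.0 for source in ranks}
    total_weight = sum(weights[source] for source in ranks)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ranks[source] * weights[source] for source in ranks)
    return weighted_sum / total_weight


# Hypothetical rank positions for one site from three invented services.
ranks = {"service_a": 1200, "service_b": 1800, "service_c": 3000}

print(aggregate_rank(ranks))  # plain average
print(aggregate_rank(ranks, {"service_a": 2.0, "service_b": 1.0, "service_c": 1.0}))
```

Even with careful weighting, the result only summarizes what those services measured. It says nothing about what Google has crawled or indexed.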
Google, on the other hand, is no longer as Webmaster-friendly as it once was. For example, they now only show random samplings of pages that link to a given URL when you use the link: query operator. And, contrary to popular SEO myth, they do include URLs with Toolbar PageRank values less than 4 (I have confirmed this with backlinks to my own domains). The lowest TBPR pages I can find right now in today’s current sampling have values of 2. I would not be surprised to occasionally see 1s and 0s.
On the other hand, even though I, like many other people, often say that Google’s link: query operator used to report more accurate data, that was never the case. It used to operate differently and report more data, but it has never reported all the data that Google has about links. SEOs have been chasing Google backlinks in vain all these years, and the clueless community just absolutely refuses to accept that it has never known about all the backlinks and never will.
Not that links are as important as SEOs believe they are. This pernicious myth will probably never die. It is very possible that Matt Cutts is himself indirectly responsible for this nonsense. He mentioned “thematic links” at the 2004 Search Engine Strategies conference in London. SE Roundtable included the following unattributed quote: “It highlighted why forum links, link farm links, guestbook links, off theme links are not weighted as highly as thematic links from authority sites.”
Despite the fact that version was widely reported, another variation appeared on the Web about the same time (early June 2004): “Matt Cutts, a Google developer, said that ‘thematic incoming links from authority sites carry more weight than on-page optimization.’” This quote has been picked up by more than one site but I’m not about to give them any link love.
I will say that I was not at SES London and I did not hear what Matt said. But the second quote is so contrary to everything I have ever seen him post about links and content on the Web — not to mention numerous technical papers going all the way back to Sergey Brin and Larry Page’s seminal Anatomy of a Large-Scale Hypertextual Web Search Engine — that I just have a hard time believing he would say something so uniquely at odds with everything else Google has said about its algorithm through the years.
The whole point of PageRank, originally, was to use it as a static weighting factor to be added to the dynamically generated relevance scores. The relevance scores were derived from both on-page and off-page factors, but Google has always openly discussed more on-page factors than off-page factors. Some people have also misconstrued portions of the Brin/Page paper so badly that they will tell you a page must have inbound link anchor text to be selected for competitive queries, even though the paper clearly states that the index where anchor text is stored is also populated by on-page data.
Now, Google does intentionally and deliberately disable information it feels SEOs pay too much attention to. They only publish Toolbar PageRank and backlink data long after it was useful information. By the time you see it, it’s not telling you anything useful. If your PR 2 page just went up to a 4 in your Toolbar today, Google may have already rerated it to a 6 (and who knows what the actual Internal PageRank would be for either Toolbar value?).
Now, enough with link: and Toolbar PageRank. Let me turn to some other Google tools. When he was first attempting to allay concerns about a new algorithmic update over the 2006 Christmas Weekend, Matt Cutts told me on his blog: “Michael, inanchor: and allinanchor: is used by practically nobody except for SEOs. And I’m including Googlers in the don’t-use-much column”. Well, at least he told us who Google thinks uses those query operators, but so far as I and other people can now determine, they appear to be reporting more accurate data than they were up until around the beginning of December.
A little further on, he added this clarification:
Graywolf, I was trying to say gently that hardly anyone at Google pays any attention to inanchor: or allinanchor: other than to ask if the results are technically correct. Discussing the results for searches with that operator are like saying “I rank really well for the are-these-words-in-my-title operator” and all the while ignoring the 99+ other factors that make up ranking. Rankings for inanchor:/allinanchor: simply can’t be generalized to search results, and people shouldn’t expect them to.
Call that a minor tweak to the service, rather than to the ranking/scoring algorithm.
Matt also mentioned earlier this year that Google was working with the site: query operator and frankly they were not having much luck with it. At one point, prior to the last round of implemented changes, Matt told people that if they saw two numbers reported by site: they should trust the smaller number more. I do believe that site: is now more accurate than it has been in quite a while, but given that Google is now doing daily data refreshes, I’m not sure of how reliable we should expect it to be.
And, finally, in September Matt pointed out that Google had changed behavior in its URL queries:
Previously we treated the query [www.example.com] like the query [info:www.example.com], and now we treat it like ["www.example.com"]. The query [info:www.example.com] returns the single url www.example.com if we have it in our index, along with other choices like “see backlinks for www.example.com” (I’m oversimplifying a little, but nothing too bad). The query ["www.example.com"] searches for that as a phrase, and thus returns the ten best matching urls, which will usually show www.example.com at #1 or high in the search results.
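The change described in that quote can be summarized in a few lines. This is only a sketch of the before-and-after interpretation as quoted, not Google's code. The function name and the `legacy` flag are invented for illustration.

```python
def equivalent_query(bare_url, legacy=False):
    """Return the query a bare URL search is treated as, per the quote above.

    legacy=True  -- the older behavior: treated like an info: lookup
    legacy=False -- the newer behavior: treated like a quoted phrase search
    """
    if legacy:
        return f"info:{bare_url}"
    return f'"{bare_url}"'


print(equivalent_query("www.example.com", legacy=True))
print(equivalent_query("www.example.com"))
```

In practice, if you want the single-listing view of a URL, you would have to type the info: form yourself rather than rely on the bare URL.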
So a lot of things have changed, not just in the past year but through the years leading up to 2006.
It should be no wonder most SEOs are confused about what they can learn about their sites from Google. But you can nonetheless learn a great deal more from Google about what Google knows than you can from Yahoo!, Alexa, AllTheWeb, and any other non-Google source of information you can think of. You don’t have to take my word for it. The query operators are there for you to play with. I won’t explain how I use them, but I’ll share my opinion on which ones are relatively useful and accurate.
inanchor: - Relatively useful in more than one way.
info: - Useful for anyone who is afraid their site is not in the index. However, be sure you include the http:// even if 100 Googlers tell you it doesn’t matter.
intitle: - Relatively useful in more than one way.
site: - Relatively useful in a number of ways. Based on my own knowledge of my domains, I am seeing reasonable numbers. Google doesn’t necessarily index all my pages, but even on my large well-linked domains it reports more satisfying numbers than I have seen in years.
URL queries - Relatively useful in more than one way.
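For anyone who wants to experiment with these operators in bulk, here is a minimal sketch that builds the query strings and corresponding search URLs. The helper names, the sample keyword, and the example domain are assumptions made for illustration. The sketch only constructs URLs and fetches nothing, and it does not assume that every operator still behaves as described here.

```python
from urllib.parse import urlencode

# Assumed base address for a Google web search; adjust for a regional domain.
SEARCH_URL = "https://www.google.com/search"


def operator_query(operator, value):
    """Join an operator and its argument with no space, e.g. site:example.com."""
    return f"{operator}:{value}"


def search_url(query):
    """Percent-encode the query and attach it as the q parameter."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


queries = [
    operator_query("inanchor", "widgets"),
    operator_query("info", "http://www.example.com/"),  # http:// included, as advised above
    operator_query("intitle", "widgets"),
    operator_query("site", "www.example.com"),
    '"www.example.com"',  # URL query treated as a phrase search
]

for query in queries:
    print(query, "->", search_url(query))
```

Pasting the generated URLs into a browser one at a time is enough for casual experimentation.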
You won’t get an accurate count of (Google-indexed) backlinks from Google or any other source. However, the good news is that you don’t need to know how many backlinks Google has found because your rankings will tell you if your rank-by-linking strategy is working. If you’re not at number 1, and you honestly believe the only way you’ll get there is through links, then get more links.
Good luck finding them.
Generally speaking, Google will tell you:
- How many pages in the combined indexes (Main and Supplemental) refer to a given URL (in links and unlinked text)
- How many pages from a domain, sub-domain, or sub-directory are indexed
- What it found on pages it shows in its cache
- When Google last cached those pages (I would prefer to know when they were actually visited)
- Other neat things most of you have never stopped to consider
I can usually do a more thorough analysis of a Google index listing in 5 minutes than most SEOs can in an hour, solely because I rely on Google to tell me what Google knows. I don’t look at whoopdeedo SEO tools, especially not SEO tools that rely on Yahoo! and Alexa. You would do just as well to bring a bicycle to a gymnastics meet. Using the wrong apparatus doesn’t make you competitive no matter how many other people brought their bicycles to the gymnastics meet.
Google doesn’t tell me everything. I’m not best friends forever with any Googlers. I don’t have to be. Nor do you need to be. Just take some time to experiment with what Google will tell you. You’ll gain a competitive advantage over many other people in this industry (after all, most people are chasing Google rankings through Yahoo! data).
And remember the first rule of SEO: Don’t share everything you know. Keep some secrets to yourself, no matter how badly you want to show off your ideas.