The Problem With Backlink Checkers And Backlink Analyzers

by Michael Martinez on March 2, 2009

Seems like hardly a month goes by where someone doesn’t come out with a new backlink checker or backlink analyzer. I’m not really sure what the difference is supposed to be between a backlink checker and a backlink analyzer — maybe people are using different expressions to refer to the same thing. Maybe not.

“Backlink checker” makes sense: it should check to see if a Web site has some backlinks, or verify that links are still in place, or verify that the links should be passing value, etc.

“Backlink analyzer” really only makes sense, in my opinion, if the reporting tool offers some review of the individual backlinks and/or the backlink profile.

This is how I would classify such tools were I marketing such tools. No one else has to live by my classifications but I would want to evaluate different types of data with specialized tools (or specialized functions of a mega tool).

On the other hand, I don’t have much use for these kinds of tools. Sure, I look at backlinks for a lot of sites. I like to know if someone is basing their SEO strategy on links or content. I like to know if a random page on a large site has a good chance or a poor chance of being supplemental. I like to know what kind of people link to a specific site.

I don’t get any of that from backlink checkers and backlink analyzers. They’re all looking at Alexa rank, Google Toolbar rank, Yahoo! backlink counts, and other popular SEO stuff. These numbers don’t tell you anything useful. However, imagine if you had a database of Web sites which were semantically classified and user-valued by topic.

Then you could estimate audience sizes, topic-market reaches, and link sphere context. You could sum up some pretty good — what’s that? You don’t know what link sphere context means? Think about it. Everyone is concerned with relevant links but what determines relevance?

Today’s search technology can look at a few factors to gauge relevance:

  1. Relationship between anchor text and destination page title
  2. Relationship between anchor text and destination page copy
  3. Relationship between linking page title and destination page title
  4. Relationship between linking page copy and destination page copy
  5. Relationship between linking page images and destination page images
  6. Relationship between linking page inbound link sources and destination page inbound link sources
  7. Ratio of topic weight for the anchor text to its source page copy versus the weight of the anchor text to its destination page copy

You can probably think of a few other potential scoring factors. By making all these comparisons, today’s search technology could assess some sort of mechanical relevance — it would essentially be an extrapolation based on keyword matching. Not that that is a bad methodology, but it is limited. For example, what if I link to a page about (house) cats from a page about (jazz) cats? Oooh! Nice little bit of link spam. Is it detectable? Not with basic keyword matching but semantic analysis would show there is a disconnect in relevance.
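To make the limitation concrete, here is a minimal sketch of the kind of mechanical keyword matching described above. It assumes a simple Jaccard overlap between word sets as the scoring method, which is my choice for illustration and not a claim about how any search engine actually scores links. The function names and the sample titles are hypothetical.

```python
import re

def keywords(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a, b):
    """Jaccard overlap between the keyword sets of two strings (0.0 to 1.0)."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

# Hypothetical example data: a jazz page linking to a house-cat page.
anchor_text = "cool cats"
linking_title = "Cool cats of the jazz age"
destination_title = "Caring for house cats"

# Factor 1: anchor text versus destination page title.
anchor_vs_destination = overlap(anchor_text, destination_title)
# Factor 3: linking page title versus destination page title.
linking_vs_destination = overlap(linking_title, destination_title)
```

Both scores come out above zero purely because the word "cats" appears on each side. Nothing in this kind of comparison can tell that one page means jazz musicians and the other means pets, which is the gap semantic analysis would have to fill.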

That kind of asynchronicity should not topple Web empires. Sometimes a site might legitimately say, “Are you looking for (house) cats or (jazz) cats? For (something) cats, look at this cats page.” Ickipedia calls this disambiguation. There is a core relevance between disambiguation content and each of the disambiguated keywords.

The search engines are still struggling to catch up to this level of analysis, so don’t expect a simple SEO tool to provide it to you. The algorithmic requirements are extremely complex. Nonetheless, it would be nice to know how many sites about (house) cats are linking to my (house) cats site; that would define my “(house) cats” link sphere context. There might be other links pointing to my (house) cats site but those links would not fall into this particular context.

For example, suppose 15 baseball history sites also link to my (house) cats site. Those sites would constitute my “baseball (house) cats” link sphere context. The link sphere context is derived from the combined primary topicality of all similar sites in a backlink profile. In other words, you take a backlink profile, categorize it by (primary) topic, and then you weed out the sites that are using irrelevant link anchor text. What remains is a collection of 0 or more link sphere contexts.
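The derivation just described (categorize the backlink profile by primary topic, then weed out sites using irrelevant anchor text) can be sketched in a few lines. This sketch assumes the hard part is already done: every linking site arrives with a primary topic label attached. The function name, the sample sites, and the simple word-match test for relevant anchor text are all hypothetical stand-ins.

```python
from collections import defaultdict

def link_sphere_contexts(backlinks, relevant_terms):
    """Group linking sites by primary topic, dropping links with irrelevant anchor text.

    backlinks: iterable of (site, primary_topic, anchor_text) tuples.
    relevant_terms: set of lowercase words that count as relevant anchor text.
    """
    contexts = defaultdict(set)
    for site, topic, anchor in backlinks:
        words = set(anchor.lower().split())
        if words & relevant_terms:  # keep only links whose anchor text is relevant
            contexts[topic].add(site)
    return dict(contexts)

# Hypothetical backlink profile for a (house) cats site.
profile = [
    ("catcare.example", "house cats", "house cats guide"),
    ("kittens.example", "house cats", "cats"),
    ("ballpark.example", "baseball history", "my friend's cats site"),
    ("spam.example", "house cats", "cheap watches"),
]
contexts = link_sphere_contexts(profile, {"cat", "cats"})
```

With this sample data the result holds two contexts: a "house cats" context with two sites and a "baseball history" context with one. The site with the "cheap watches" anchor text is weeded out even though its topic matches.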

Link sphere contexts (LSCs) are useful things to know about if you want to target specific audiences and demographics. LSCs also provide you with clues about query spaces that are relevant to your content. They won’t help you predict how high a page should rank in Google for a specific expression. In fact, if a link sphere context only resolved to one query I would think something was wrong. If an LSC consists of 25 pages there should be hundreds, possibly thousands of relevant expressions that lead to those documents through natural search.

I’ve looked at a lot of backlink checkers. They don’t even come close to categorizing the links, not even the ones that scrape pages in order to confirm the links are there.

I’ve looked at a lot of backlink analyzers. They can’t seem to see past the link anchor text.

And, frankly, all these checkers and analyzers are pains to deal with anyway. I never said you could scrape my sites. I never asked anyone to analyze my linking patterns to determine whether they could or should ask me for links.

In fact, I do block some scraper sites from my networks. I see particular value in blocking SEO tools, but there are just so many of them. It would be nice if someone published (for free) an up-to-date list of robots to block. But then, who would trust a free list?
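For anyone wondering what using such a list might look like, here is a minimal sketch that checks a request's User-Agent string against a blocklist. The robot names are invented for illustration and do not refer to real tools, and substring matching is just one simple assumption about how a list could be applied.

```python
def is_blocked(user_agent, blocklist):
    """Return True if any blocklist token appears in the User-Agent string."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocklist)

# Hypothetical robot names; a real list would need regular maintenance.
BLOCKLIST = ["ExampleLinkChecker", "HypotheticalBacklinkBot"]

scraper = "Mozilla/5.0 (compatible; ExampleLinkChecker/1.0)"
visitor = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
```

Here `is_blocked(scraper, BLOCKLIST)` is true and `is_blocked(visitor, BLOCKLIST)` is false. Keep in mind that a User-Agent string is whatever the client chooses to send, so a check like this only stops tools that identify themselves honestly.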

You can buy the lists from a few service providers. But the real solution here is for the SEO community to grow up and develop a mature attitude about link analysis. You’re not getting good information from these tools because they don’t even have access to good information. If the SEO community were to demand better tools, the icky little scraper problem would decrease. There would be less demand for these server-hammering non-solutions.

Sure, we can set up robot traps but there are some drawbacks to those kinds of solutions. No system is perfect.

It’s a sad state of affairs. The SEO community wastes a great deal of time and resources on link checking tools. Some people undoubtedly work hard to “opt out” of being indexed by those tools while using them for “competitive research”. And most of the competitive analysis based on those tools is very sketchy — not completely useless, as people will try to make some sense of all the numbers, but just not very high in value.

There is plenty of opportunity for people to create new tools but they need to rethink their approaches. In fact, it would be nice if SEO tools became more ASCII file/list-friendly because there are a lot of things that you could do if you just worked with lists of URLs.
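As one small illustration of what a plain list of URLs makes possible, the sketch below reads one URL per line, drops duplicates, and counts how many distinct URLs each host contributes. The function name and sample URLs are hypothetical, and it assumes every useful line is a full URL with a scheme.

```python
from collections import Counter
from urllib.parse import urlsplit

def hosts_from_url_list(lines):
    """Count distinct URLs per host from an iterable of text lines, one URL per line."""
    seen = set()
    counts = Counter()
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and exact duplicates
        seen.add(url)
        host = urlsplit(url).hostname
        if host:  # lines without a scheme and host are ignored
            counts[host] += 1
    return counts

# Hypothetical list, as it might come from a plain ASCII export.
url_list = [
    "http://catcare.example/links.html",
    "http://catcare.example/friends.html",
    "http://catcare.example/links.html",
    "",
    "http://ballpark.example/history/",
]
host_counts = hosts_from_url_list(url_list)
```

The same function works on an open text file, since iterating over a file yields its lines. From there it is a short step to sorting, filtering, or comparing two lists.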

Think about it.

{ 3 comments }

ericward 03.03.09 at 3:06 pm

This was art. Wonderful. I’ve used every back link checker I can find, stopped. Wrote my own, hated it, re-wrote it, tolerate it. Now, think of the role LSCs play with massive sites about every subject, like Wikipedia or the Library of Congress, compared to small sites about one single subject.

Michael Martinez 03.04.09 at 12:07 am

Thanks. And, yes, I agree that link sphere contexts have driven a lot of large sites to the top of search results for tons of topics. In fact, that will be part of what I have to say in an upcoming article.

seercomp 09.11.09 at 5:04 am

Very interesting article. However long I research this area, it seems to get more complex, with no one offering a definitive answer. Meanwhile, companies are running round in circles trying to better their rankings without knowing whether they are actually degrading their sites.