Link Flow Analysis – How to do Link Flow Analysis

by Michael Martinez on October 13, 2008

“Link flow analysis” and “link juice” are two of the SEO industry’s currently hot buzz expressions. Link juice is a nonsense term, a euphemism people use for that undefined or undisclosed valuation of document relationships that search engines utilize for their own internal purposes. Link flow analysis, on the other hand, has several legitimate meanings (and probably a few nonsense associations as well).

To understand Link Flow Analysis better, let’s first rule out the things it is not and the things it won’t do.

What Link Flow Analysis Is NOT

  1. Link flow analysis is NOT a method for analyzing how any search engine evaluates Web sites
  2. Link flow analysis is NOT useful for competitive intelligence
  3. Link flow analysis is NOT a helpful aid in finding new linking resources

What Link Flow Analysis Will NOT Do

  1. Link Flow Analysis will not make you more competitive in the search results
  2. Link Flow Analysis will not uncover any secret pools of link value
  3. Link Flow Analysis will not explain why site A outranks site B

I can almost guarantee you there are people out there on the Web right now promising all these benefits and more from link flow analysis. They are selling snake oil.

Let’s begin looking at what Link Flow Analysis is and does by first defining link flow (as best we can define ‘link flow’).

  1. Google appears to define link flow as the PageRank that is passed through a link. (NOTE: I’ll correct this if any Googlers want to offer a clarification.)
  2. I define link flow as the “pathways between pages”.
  3. Some SEOs map Toolbar PageRank.

Although we could find other definitions for link flow, these three examples should make it clear that there is no general agreement on what link flow is. But I think it should also be self-evident that we’re really not talking about the movement of links around or between documents; rather, we’re talking about the movement of value through links from document to document.

Googlers seem to really be talking about internal PageRank, although I suppose it’s possible they might be trying to use the Toolbar PR as a common ground with the SEO community. They certainly use it as a stick to beat up on people with.

Halfdeck’s tool works with the only available measure of PageRank — the Toolbar, so we can hardly fault him for not working with the internal stuff.

But there is more value to links than PageRank (which, after all, is only a Google thing). Other search engines definitely evaluate linking relationships, and it’s most likely safe to assume they all measure some sort of link equity (another currently hot SEO industry buzz expression that I actually like, although it is also poorly defined). I often say “PageRank-like” when talking about the values that other search engines may compute for Web pages.

A link passes 6 types of value:

  1. Traffic
  2. Visibility
  3. Crawling
  4. Trust
  5. Anchor Text
  6. PageRank (or something like it)

The SEO community can only quantify one of those types of value, and that would be the first: traffic. Of course, we can use Toolbar PR to speak about a derivative value, but we don’t know how current any particular Toolbar PR value is (even if it has only just been published). Matt Cutts has said more than once that Toolbar PR data is published after the value it represents has already been factored into their database.

So under the broader definition that I have structured, link flow refers to any value that is passed through links — and you cannot measure that flow. There are no Trust Points that you can assign to any given link to know how much trust it passes.

Now, if you know about a link and its anchor text, you can determine (to a limited extent) whether the link may pass anchor text. Just search for the anchor text. If it’s unique to the linking page and the destination, at most you should see only those two results. In practice we tend to point more than one link at a page in order to beef up its relevance.

There are many tools available for link research. Of course, the SEO community does a poor job in general of acknowledging the limitations of these tools. Search for discussions and blog posts about link analysis and you’ll usually find people recommending the use of Yahoo! Site Explorer and Google Webmaster Tools. For what it’s worth, there is also Live’s Webmaster Center.

All of these resources provide you with information about your Web site. None of these resources provides you with perfect information. Let’s ignore the brand names for a moment and look at the limitations that all the resources share:

  1. Each resource can only report on data its respective search engine provides
  2. Each resource operates on a delayed-reporting basis (this is NOT real-time data)
  3. Each resource fails to disclose SOMETHING about the link data it reports

You cannot use Link Report A to analyze how competitive a site may be in Search Engine B. Link Reports are only relevant to the search engines that produce them. Hence, your link analysis has to be search engine-specific. Now, we know that the search engines all impose some limits on the value that links in their databases may pass. For example, Google doesn’t allow all links to pass PageRank. Yahoo! says the first link from a domain counts more than the others. And so on.

Hence, your search-engine-specific link analysis is not going to be very accurate. Now, while it would be great if we could get accurate, timely data from the search engines, it’s clearly not in their best interests (nor the best interests of their searchers) to put the candy out where the kids can get into it completely unsupervised.

But what if we could use something else for our link analysis? Suppose Company X offers its own linking data, based on their own crawling. There are several SEO tools that have been around for quite some time that purport to do this. But their tools are only relevant to their own databases. That is, these tools will (in typical SEO fashion) query Google for Toolbar PR data and query Yahoo! for backlink data (both perfectly useless measures of value) and they’ll let you sort their data by various options. But they still don’t know if they know everything that either Google or Yahoo! knows.

In fact, we can say with complete certainty that no search engine has complete knowledge of any other search engine’s data and algorithms. Whether you’re comparing Yahoo! to Google or some SEO tool to Google, you’re comparing apples to oranges. You cannot obtain any insight into what Google knows or thinks it knows through another search engine or any SEO tool.

But let’s assume that the absolute very best SEO tool out there (call it SEO Brand X) really does a great job of approximately mirroring the database for any search engine. You’ve got an acceptable reconstruction of Search Engine A’s database. Great! You’re ready to roll.

Except for the Synchronicity Issue.

Synchronicity occurs when two more-or-less equivalent or related events happen (at the same time) for unrelated reasons. If two search engines were to crawl and index the Web in generally the same way at the same time, those would be synchronous events. In practice, this just doesn’t happen. So all our assumption does is place us in SEO Fairy-tale Land.

In other words, there is no SEO Brand X resource that creates a Web map that looks like the Yahoo! Web Map, or the Google Web Map, or the Live Web Map in any useful way. SEO Brand X is just another search index building its own Web Map. Great for Brand X. Bad for you and me.

So we have no way of measuring how much PageRank flows from document to document. We can, at best, find pages that use noindex, nofollow, rel=’nofollow’, and Javascript or Flash links but we cannot map where the PageRank-like value flows.

In other words, SEO Brand X still doesn’t know which pages pass value in any search engine’s database. SEO Brand X can do a perfectly fine job of showing you which domains have the most inbound links (according to its own crawling) and it can grab backlink reports from other search engines (like Google and Yahoo!) but it cannot tell you where the PageRank-like value is or where it is passed.
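The part we can do, finding pages that declare noindex or nofollow and links marked rel=’nofollow’, can be sketched with Python’s standard html.parser module. This is only a rough illustration: the class and function names are my own inventions, it reads markup you have already fetched, and it says nothing about Javascript or Flash links or about how any search engine actually treats what it finds.

```python
from html.parser import HTMLParser


class LinkDirectiveScanner(HTMLParser):
    """Collects robots meta directives and each link's rel='nofollow' status."""

    def __init__(self):
        super().__init__()
        self.meta_robots = set()  # e.g. {"noindex", "nofollow"}
        self.links = []           # (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (bare attributes), so normalize to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                if token.strip():
                    self.meta_robots.add(token.strip().lower())
        elif tag == "a" and "href" in attrs:
            rel_tokens = attrs.get("rel", "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))


def scan(html_text):
    """Return (robots directives, list of (href, is_nofollow)) for one page."""
    scanner = LinkDirectiveScanner()
    scanner.feed(html_text)
    return scanner.meta_robots, scanner.links
```

Running scan() over a page tells you what the page declares. It does not tell you where PageRank-like value flows, which is the point of this section.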

Visibility is a different issue. I said there is no way to quantify Visibility but we can partially quantify it for search (not for links). Measuring search visibility is easier to do with pages that have relatively few inbound links and which contain little to no indexable text. That is, there is a Search Visibility Curve for every document that is shaped by the number of queries it appears in on every search engine.

Every query can potentially show up to 1,000 results (on the major search engines — results limits may differ on smaller or newer services). You can quantify single-query Visibility in any of several ways. For example, you could assign a weighting of .001 to the 1000th listing and a weighting of 1 to the 1st listing. If you could determine that a page is visible for 300 queries, you could sum up all its Visibility weights.

This is a crude measurement, because a document that is poorly visible for 1,000 queries could potentially outweigh a document that has perfect Visibility for 10 queries. But that is Search Visibility.
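Here is a minimal sketch of that weighting in Python. I only fixed the two endpoints above (1 for the 1st listing, .001 for the 1000th); the straight-line scale in between is an assumption, and the function names are made up for illustration.

```python
def listing_weight(rank, max_results=1000):
    """Linear weight: rank 1 -> 1.0, rank max_results -> 1/max_results.

    The linear scale between the two endpoints is an assumption.
    Ranks outside 1..max_results count as not visible (weight 0).
    """
    if not 1 <= rank <= max_results:
        return 0.0
    return (max_results - rank + 1) / max_results


def search_visibility(ranks_by_query):
    """Sum the listing weights for every query the page appears in."""
    return sum(listing_weight(rank) for rank in ranks_by_query.values())


# A page sitting at position 900 for 1,000 queries...
poorly_visible = {f"query {i}": 900 for i in range(1000)}
# ...versus a page sitting at position 1 for 10 queries.
perfectly_visible = {f"query {i}": 1 for i in range(10)}

print(search_visibility(poorly_visible) > search_visibility(perfectly_visible))
```

Under this assumed scale the page buried at position 900 scores roughly ten times higher than the page with ten first-place listings, which shows just how crude the measurement is.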

As with Search Visibility, you could devise any number of ways to measure Link Visibility. Let’s say we assign a value of 1 for every 1,000 page views that occur within a calendar month for any link on a Web page. For example, suppose I embed a Javascript link (that points to Google) on a specific SEO Theory page. Suppose SEO Theory generates 100,000 page views for that particular page in the month of December. In that case, the link earned 100 Visibility Points.

Now suppose that Javascript embed was placed on 1,000 sites altogether and collectively those sites generated 10,000,000 page views in the month of December. The Link Visibility score would (under this proposed model) be 10,000.

To be counted, a page view would have to ensure that the visitor could actually see the link — so the link has to be clearly visible, not buried in a footer or behind an image. Any hidden link would earn a Visibility score of 0 by most measurements.
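The arithmetic of this proposed model is simple enough to write down. In the sketch below the function name and the link_is_visible flag are hypothetical; deciding whether a visitor could actually see the link is something you would have to judge yourself.

```python
def link_visibility_points(page_views, link_is_visible=True, views_per_point=1000):
    """One Visibility Point per 1,000 monthly page views; hidden links score 0."""
    if not link_is_visible:
        return 0.0
    return page_views / views_per_point


print(link_visibility_points(100_000))      # one page, 100,000 views -> 100.0
print(link_visibility_points(10_000_000))   # 1,000 sites combined -> 10000.0
print(link_visibility_points(100_000, link_is_visible=False))  # hidden -> 0.0
```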

These types of measurements are too crude to be useful but many people seek branding value from the mere display of links in Javascript ads and other ads that are not expected to pass PageRank-like value.

Another way to measure link Visibility would be to count all the pages on which a link is present, where those linking pages rank well for trafficked queries. But you’re not measuring how value flows through links. At least with the page view model we could assess some sort of brand value passing to the destination sites.

Crawl and trust are managed by the search engines internally. You never see a search engine fetch that includes the source page for the link the search engine followed. Nor do search engines disclose which pages confer trust. It is reasonable to ask whether a document that passes anchor text also passes trust. Any search engine could be constructed to pass anchor text, trust, and PageRank-like value completely independently of each other. In such a scenario, the most powerful links would pass all three values.

So now that we’ve looked at the limitations of what we can do with search engines, what else can we do to measure and analyze link flow? First, we have to define a quantifiable and measurable value to track. That doesn’t include PageRank-like value or trust. It might include anchor text, but in most cases you’ll have too many links pointing the same anchor text at a destination to determine which links work in any search engine.

You can measure how many link sources are indexed by search engine, however.

You can measure how many unique anchor text expressions are passed from document to document by search engine.

You can measure which page caches are updated on a daily, weekly, monthly, or longer basis by search engine.

These measurements don’t show you which documents are passing value through their links, but you can estimate which documents may be available to pass value (indexed), seem to be passing some value (unique anchor text), and when they may pass value (cache frequency).
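If you record those three checks by hand for each link source, tallying them per search engine is straightforward. The sketch below assumes a record layout of my own invention (the field names are hypothetical), and it keeps each engine’s numbers separate rather than mixing them.

```python
from collections import defaultdict
from datetime import date


def summarize_by_engine(observations):
    """Tally, per search engine, the three measurable signals for link sources.

    Each observation is a hand-collected dict (hypothetical field names):
      engine        - which search engine the check was run on
      source_url    - the page that carries the link
      indexed       - True if that engine lists the source page
      anchor_found  - True if a search for the link's unique anchor text
                      returned the destination page on that engine
      cache_dates   - dates on which the engine's cached copy was seen to change
    """
    summary = defaultdict(
        lambda: {"indexed": 0, "anchor_found": 0, "avg_cache_gap_days": {}}
    )
    for obs in observations:
        entry = summary[obs["engine"]]
        entry["indexed"] += int(obs["indexed"])
        entry["anchor_found"] += int(obs["anchor_found"])
        dates = sorted(obs["cache_dates"])
        if len(dates) >= 2:
            # Average number of days between observed cache updates.
            gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
            entry["avg_cache_gap_days"][obs["source_url"]] = sum(gaps) / len(gaps)
    return dict(summary)


example = [
    {
        "engine": "Engine A",
        "source_url": "http://example.com/links.html",
        "indexed": True,
        "anchor_found": True,
        "cache_dates": [date(2008, 12, 1), date(2008, 12, 8), date(2008, 12, 15)],
    },
]
print(summarize_by_engine(example))
```

The output only tells you which sources are available to pass value, which seem to be passing anchor text, and how often they are refreshed. It does not measure the value itself.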

You can also measure rates of page indexing with new sites. How long does it take pages to be crawled and indexed? Which pages show up first? When you run site searches on the new site, which pages are ranked first for specific queries? If you know which pages the search engines deem to be the most important on your site, you can test those pages to determine if they pass some type of search-specific value through their links (rather than just drop links randomly across a site).

You can measure fetch rates for your pages and compare those fetch frequencies to the number of reported backlinks (DO NOT USE ANOTHER SEARCH ENGINE FOR THIS). If page A is fetched twice as often as page B, and page A has fewer reported links than page B, what do you think that says about page B’s backlinks?

You cannot answer that question knowledgeably if you use search engine A to analyze backlinks for search engine B. You can form an opinion on the basis of ignorance and misinformation — many people do — but to be competitive in this industry you must discipline yourself to look for the answers you need for each search engine within the data the search engine will share with you.
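One way to run the fetch-rate comparison is to count a crawler’s requests in your own server logs and set them beside the backlink counts reported by that same search engine. The sketch below assumes Apache-style combined log lines and matches the crawler by a token in the user-agent string (which can be spoofed); the function names and the sample figures are hypothetical.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line. Assumes that log format.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def crawler_fetches(log_lines, crawler_token):
    """Count fetches per path for requests whose user agent contains the token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and crawler_token.lower() in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts


def fetches_per_reported_link(fetch_counts, reported_links):
    """Divide each page's fetch count by the backlinks the SAME engine reports."""
    return {
        path: fetch_counts.get(path, 0) / links
        for path, links in reported_links.items()
        if links > 0
    }


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [01/Dec/2008:10:00:00 +0000] "GET /page-a HTTP/1.1" 200 5120 "-" "{googlebot}"',
    f'66.249.66.1 - - [02/Dec/2008:10:00:00 +0000] "GET /page-a HTTP/1.1" 200 5120 "-" "{googlebot}"',
    f'66.249.66.1 - - [03/Dec/2008:10:00:00 +0000] "GET /page-b HTTP/1.1" 200 7300 "-" "{googlebot}"',
]
fetches = crawler_fetches(sample_log, "Googlebot")
# Backlink counts as reported by that engine's own webmaster tool (made-up numbers).
print(fetches_per_reported_link(fetches, {"/page-a": 10, "/page-b": 40}))
```

In this made-up sample, page A is fetched twice as often as page B despite having fewer reported links, which is exactly the situation the question above asks you to think about.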

Every search engine tells you which of your pages it fetches, how often it fetches those pages, where it fetches those pages from, and if those pages appear in its search results. That information has to serve as the foundation of your link flow analysis. You need to know if your internal links help get your site crawled and indexed faster than external links; you need to know if your internal links help pass anchor text to your own pages; you need to know if you can influence the rate of crawling and caching for any given page on your site by adding or dropping crawlable links.
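To test whether adding or dropping crawlable links changes how often a page is fetched, you can compare the crawler’s fetch rate in equal windows before and after the change. This is a rough sketch with hypothetical function names and a 30-day window chosen arbitrarily; the fetch dates would come from your own logs for one search engine’s crawler, and a difference in the two rates is only a hint, not proof that the link change caused it.

```python
from datetime import date, timedelta


def fetches_per_day(fetch_dates, start, end):
    """Average fetches per day in the half-open window [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return sum(1 for d in fetch_dates if start <= d < end) / days


def crawl_rate_change(fetch_dates, change_date, window_days=30):
    """Return (rate before, rate after) around the date the links changed."""
    window = timedelta(days=window_days)
    before = fetches_per_day(fetch_dates, change_date - window, change_date)
    after = fetches_per_day(fetch_dates, change_date, change_date + window)
    return before, after


# Hypothetical fetch dates for one page by one crawler.
fetch_log = [date(2008, 11, 5), date(2008, 11, 15), date(2008, 11, 25)]
fetch_log += [date(2008, 12, day) for day in (1, 6, 11, 16, 21, 26)]
print(crawl_rate_change(fetch_log, change_date=date(2008, 12, 1)))
```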

Knowing how much PageRank-like value your links pass won’t tell you anything about whether a page is likely to rank for any particular query. However, knowing that you can influence search engines to update their databases for any particular document within a specific timeframe empowers you. You need to settle upon your own definition of link flow.

But, more importantly, you also need to develop your own analytics to help you evaluate what the search engines are telling you. The SEO Brand X search tools cannot do that for you. They are not designed to offer proper analysis based solely on what each search engine discloses about itself.

SEO Tool designers consistently fail at these kinds of projects because they don’t understand why crossing the data streams doesn’t work.

7 comments

Kev 10.13.08 at 9:40 am

I’ve been waiting on your comments about the new Linkscape feature (over at Moz) since, as a marketer with an education in Psychology (a field notorious for its reliability and validity issues), I’m always trying to find the confounding variables that will destroy an experiment’s credibility. You often point out those confounding variables in your posts and so I’m curious if this post is a veiled analysis of the new tool. In my opinion, Linkscape offers nothing more to users than the PR Toolbar, since it can’t even claim to have the same pages indexed, let alone claim to use the same metrics Google (or anyone else) is using to value links. A lot of the issues I had thought of regarding Linkscape’s dubious usefulness were covered in your post.

Michael Martinez 10.13.08 at 10:18 am

The points I raise in this post would have been equally valid before SEOmoz released Linkscape, but it was Linkscape’s announcement that inspired me to write about the topic. I scheduled the post for this week to minimize the chance of people reducing the discussion to petty bickering.

I’ve been reluctant to say much about Linkscape itself so far because it’s a brand new tool, and it takes time to evaluate any tool. I suspected Rand was rolling out a link scraper several months ago. He loves links and bases his Web marketing on the link angle, so it makes sense that he would develop tools to help him evaluate sites’ linking structures and linking profiles.

Understanding that we all have a bias (I’m a content-based SEO) in our philosophies, you can see the bias in Rand’s philosophy in his review of the Ask the Search Engines panel at SMX East 2008. Contrast Rand’s “Links Are Still Huge” paragraph with Virginia Nussey’s report of the same question:

Rand: “When asked if links are the primary signal for search engine rankings, the engineers generally agreed that, yes, it probably is. Aaron noted that links are a far less noisy signal than many others, including some forms of on-page keyword use and clicks in the SERPs. Sean from Yahoo! said that while it may not be the “most important signal” by itself, it’s more important than, for example, title tags (which SEOs generally agree are critical to the SEO process). There was no mention that links would be fading away anytime soon – or that any competing signals had yet entered the marketplace as a potential usurper”

Virginia: “Aaron says links are a good measure of reputation. Clicks are a noisy signal, and so the absence of a click for a result is thus way more useful because it signals that it’s not the most relevant result. Sean isn’t sure if links are the most important signal or not, but he will say that it’s a larger signal than Title tags, for instance.”

These are two completely different takes on the answer to the question “Are links still the primary signal for popularity and importance?”.

As for Linkscape, I have only made two public points so far:

1) I have cautioned people not to mistake it for a tool that provides insight into what Ask, Google, Live, and Yahoo! know about links or decide about links. You get absolutely no information whatsoever from Linkscape on which links are indexed and/or passing value in the major search engines.

However, given the SEO community’s history of misusing Yahoo!’s link data to analyze Google search results, I feel confident all the wasted energy will shift over to Linkscape because it provides more detailed information.

2) I have reminded people that LINKS ARE NOT ENDORSEMENTS. I realize Nick Gerner is proud of his work, but he reveals considerable naivete with his frequent “links are endorsements” comments. That kind of nonsense is nothing more than search engine propaganda. People don’t use links as endorsements. We use links as references and connection points.

Endorsements are expressed through statements, not through hypertext markup language. In a comment on Rand’s “6 Lessons” post, Nick wrote: “The affiliate links have to be quality endorsements, not junk links in some footer or hidden div.”

Sorry, Nick. That statement is utter rubbish. I’ve been working with affiliate programs since the late 1990s. The vast majority of affiliate links are placed openly throughout content for the sole purpose of generating revenue. Affiliate links are about the money, not endorsements.

I’m not surprised to hear that affiliate links may pass value in some cases. There are many different ways to build affiliate relationships. In most cases, there is little potential value to be gained from affiliate links anyway.

But everyone needs to remember that links are not endorsements according to the U.S. government, and they certainly are not endorsements when *I* place them. Anyone who says otherwise has a lot to learn about the psychology of linking.

Kev 10.14.08 at 8:21 am

I agree that links are not endorsements. But he simply speaks using the MOZ canon (if Rand was my boss I’d probably use the term like that as well). The internal glossary they have equates links to endorsements, and I personally can’t fault Nick for that.

And I feel that they use this term because it is the most concise word they can think of to mean the concept they are trying to convey, which is that search engines use links as one of the metrics for determining where pages will be ranked in the results pages.

if links are being used to determine a page’s quality
and endorsements are used to prove an item’s quality
then, in simplified, layman’s terms, a link is an endorsement.

I think “links are endorsements”, which is not technically correct, will reach a much wider audience than “links are one metric that a search engine may use to determine where a page should be positioned in the search results pages”. There is eloquence in the shorter statement, which may be necessary for their core demographic (I would imagine many beginners and novice SEOs read their blog).

It’s not perfect, but it makes enough sense to use it as a building block for people trying to understand the basics. I just hope that once the basics are understood, MOZ users will break off to learn from their own experiences and from more advanced sources.

I’m much more concerned with a tool, which many people will pay monthly to have complete access to, that won’t provide the intended functionality.

Michael Martinez 10.14.08 at 9:26 am

Most links are NOT being used to determine a page’s quality. That lesson just doesn’t seem to sink in with the SEO community.

Maybe 6 years ago that was a different matter. But a lot of algorithmic filters and modifications have been applied to the Web since 2002. So teaching novice SEOs that links are endorsements is a huge disservice to that community.

As far as the tool goes, it will survive or fail on the basis of its own market-perceived value.

Traffics Pain 10.14.08 at 12:17 pm

Basically all they are really doing is building their own ‘paid’ (and relatively expensive) search engine.

I dont really see the huge benefits, It talks as tho it will pick out all these diamonds and say this site page is great get a link on it and you rank stronger for that anchor text or something.

In reality, anyone involved in seo of any sort is most likely aware of what websites, portals etc would be useful for their individual website to acquire advertising or links on or partnerships with anyway right? And even if these super duper pages are found then hey, you still have the same problem as the other 999 people that requested a link here had this week, namely actually acquiring a link or endorsement etc anyway.

I mentioned over there also that a lot of this thinking was simply conjecture. They cannot decide which links pass ‘juice’ because big G for example does not tell you if a site or page is not passing it right? Even if it is a .edu pr6 blahh blahh there is simply no way you know if it is passing any of this juice.

Even if it does provide useful clues, and dont get me wrong it probably does offer valuable data if you know what to do with that data, then Google n co can simply change their algorithms tomorrow and turn it all upside down on its head again.

I wont be paying for it myself, I like to use several tools for different purposes and keep up to date with different search ideas etc but very rarely would I expect to pay for it. Unfortunately I see 90% of website owners wont be paying for it either as this many at least are usually one man bands and family businesses etc.

Because of this im a bit pissed off that they released something b4 releasing the user agent so that I can prevent them basically spying on me! Then they can sell this info to my competitors. Not good! In fact I think there should be rules about this sort of thing if there isnt already?! If I found out anyone in the offline world sold my personal information without me being able to opt into this and they were local I would kick ten tonnes of shit out of them.

ericward 10.17.08 at 7:07 am

Michael – thank you for this. Over the years I have lost count of the 3rd party tools I’ve tried out. What is most telling to me is that I abandon them when it comes time for heavy lifting deep vertical link target ID and evaluation. I won’t go so far as to say “All you need is Google and your brain”, but it’s darn close to true, at least for the type of client content I work with. Linkscape is outstanding and useful for a very specific set of metrics and measures, and for a certain type of link builder is a must-have. I commend Rand for it and will use it. On the other hand, as much as I want and look forward to every new tool, I keep thinking about Rocky IV, where Ivan Drago was using every cutting edge tool and training method available, while Rocky ran around in the snow with a log on his back. The savviest link builders will use tools *and* logs.

seopro 02.18.09 at 12:34 pm

ericward is right, when it comes down to it, it takes a skilled technician to understand proper link flow balance. linkscape is definitely cool, although seoeng is really what most of us use now. i’m sure eventually most of the tools will be like seoeng in the future, but it still comes down to having an experienced technician at your side :)