The PageRank control myth and the nofollow-for-SEO myth

by Michael Martinez on August 31, 2007

Where does your PageRank come from?

Does the little PageRank fairy visit your links at night and sprinkle PageRank dust on them? Do other people go buy it from authorized PageRank resellers and then distribute it to their friends?

There are a couple of major voices in the industry spreading some absolute nonsense about PageRank and “nofollow” right now, and though there is always someone spreading absolute nonsense about PageRank, we have a rare opportunity to see the birth of two SEO myths.


SEO Myth: You can control the flow of PageRank on your site


Some people claim that if you restrict Google’s ability to find Web pages on your site through your internal linkage, you can concentrate the flow of PageRank to your most important pages.

Fact: Your PageRank is influenced by four factors:

  1. What you do with your pages
  2. What other people do with their pages
  3. What Google does with its filters
  4. Time

There is no “PageRank aging factor” but PageRank does change over time, and there are also inherent lag times between the placement of links on pages and the receipt of PageRank from those links. The more pages that PageRank has to pass through to reach any particular page, the more time that PageRank needs to reach the page.

PageRank seeks a mathematical equilibrium, not where everything is equal but rather where everything is weighted according to the probability that a link path will lead a random surfer to any particular page. At best the search engines hope to create a model that approximates the distribution of PageRank across the Web.
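To make the random-surfer idea concrete, here is a minimal sketch in Python of the textbook power-iteration form of PageRank on a three-page toy web. The damping factor of 0.85, the page names, and the iteration count are assumptions chosen for illustration; nothing here is claimed to match what any search engine actually runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = sorted(links)
    n = len(pages)
    # Every page starts with the same share of the total.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # The random surfer jumps to a random page with probability 1 - damping.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page]
            if targets:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links hands its rank evenly to every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web.
toy_web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.4f}")
```

The values always sum to 1 because they form a probability distribution. Adding or removing a link changes how that total is divided, not how much of it there is.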

In reality, search engines radically reduce the accuracy of their PageRank models by stripping pages from their indexes, by preventing pages from conferring PageRank, and by arbitrarily reducing pages’ PageRank to zero. The more tweaks, filters, and penalties that are introduced into the model, the less accurate the model becomes.

Fact: PageRank begins with your pages. Various technical papers have confirmed that you have to assign a starting value to every page in an index over which you intend to calculate PageRank. It is assumed that the calculating process manages the blocking of PageRank appropriately.

So when Google sits down to calculate PageRank for 20 billion pages, you automatically get a certain amount of that PageRank. It depends on how many of your pages Google includes in the data set. Even if Google has crawled all of your pages, it may only include a portion of them. So let’s say that every time Google calculates PageRank you get X pages in the data set. The sum of your starting PageRank is X divided by 20 billion, and each of your pages has the same starting PageRank: 1 divided by 20 billion.
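The arithmetic can be worked through in a few lines of Python. This sketch assumes uniform initialization, where every page in the data set starts with an equal share of a total of 1. The 20 billion figure comes from the paragraph above, and the 5,000-page site is a made-up example.

```python
from fractions import Fraction

TOTAL_PAGES = 20_000_000_000  # the 20 billion pages used in the example above


def starting_pagerank(site_pages_in_data_set, total_pages=TOTAL_PAGES):
    """Return (per-page starting value, site-wide starting total)."""
    per_page = Fraction(1, total_pages)
    site_total = per_page * site_pages_in_data_set
    return per_page, site_total


# Hypothetical site with X = 5,000 pages in the data set.
per_page, site_total = starting_pagerank(5_000)
print(per_page)    # share of each individual page
print(site_total)  # combined share of the whole site
```

Exact fractions are used instead of floats so that numbers this small stay readable.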

Fact: PageRank is an estimate, not a specific value. A lot of people struggle with the concept of PageRank because the Google Toolbar has trained us to think of PageRank as a value that persists over lengths of time. In reality, PageRank changes for all Web pages every time someone deletes a Web page, deletes a link, creates a Web page, or creates a link. That is, the probability distribution is constantly being altered by the changes, additions, and deletions that people make to their content.

So the question is, can you really control the flow of PageRank on your Web site? The answer is an unqualified, definitive “no”. You cannot control the flow of PageRank on your Web site but you can control your own internal link flow. Link flow is not PageRank. Link flow is composed of the pathways you build between your pages.

Your link flow does influence your PageRank but it also influences your crawling and indexing. Telling search engines that “these links are not for you” in no way helps your optimization. It only makes it harder for the pages you deem to be untrustworthy to get into the index.

Can you or should you use “nofollow” on only some of your links? Google has in the past advised people to use “rel=’nofollow’” on links pointing out to other sites’ pages on the assumption that you cannot be sure of what those links are pointing to. After all, “rel=’nofollow’” was supposed to save bloggers from link spam. Spammers would supposedly get the message that their links would not help their search visibility.

The inherent flaw in this strategy, however, is that AdSense spammers don’t need to be indexed, much less rank in search results, as long as their pages get traffic. That’s all an AdSense spammer cares about. As long as Google pays the spammer, he’ll continue dropping links regardless of whether they pass value in Google’s index. So “rel=’nofollow’” doesn’t hurt spammers.

You can exclude sections of your Web site from indexing through “robots.txt” or the “robots” meta tag, and there are plenty of good reasons to exclude portions of Web sites from indexing. But you cannot use “rel=’nofollow’” to ensure exclusion from the index because those pages may be found through other links.
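As a rough illustration of the “robots.txt” route, the sketch below uses Python’s standard urllib.robotparser module to check which URLs a rule set keeps compliant crawlers away from. The rules, the /calendar/ path, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all compliant crawlers out of /calendar/.
robots_txt = """\
User-agent: *
Disallow: /calendar/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() answers: may this user agent request this URL?
calendar_ok = parser.can_fetch("Googlebot", "https://www.example.com/calendar/2007/08/")
article_ok = parser.can_fetch("Googlebot", "https://www.example.com/articles/pagerank.html")

print("calendar page may be fetched:", calendar_ok)
print("article page may be fetched:", article_ok)
```

The rule applies to the URL itself, no matter which link led the crawler there. A nofollow attribute only describes one link, which is why it cannot guarantee exclusion.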

Some people therefore conclude that it’s safe to use “rel=’nofollow’” on internal links because it restricts the flow of PageRank and doesn’t prevent other people from linking to the nofollowed pages. The problem, however, is that you need to get your pages indexed and you cannot count on other people to help you. So some people suggest you channel your PageRank through your more important pages and let it filter down from them.

That’s like saying you’ll get more ice cubes if you fill an ice tray with water by only directing the water at two slots on one end and letting the water flow down to the other end. You don’t end up with any more slots in your tray, hence you don’t get any more ice cubes, but it does take longer for water to reach those far slots (note: in a perfect model with no spillage, it would not take any longer to fill the ice tray no matter how you bring the water into the tray).

Crawl-to-index lag time is the only consideration you have to make when thinking about how you want your PageRank to roll through your site. If you submit an XML sitemap and get an entirely new site crawled at the same time, most small sites will be indexed within a matter of days. There is no benefit to using “rel=’nofollow’”.

If you submit a tree of XML sitemaps for a very large content site the URLs to be crawled and indexed first should be the first URLs in your sitemap files. You can (if you place all your sitemaps in your root directory) include a page in more than one sitemap file, thus increasing the chance it is crawled more often than others. Hence, there is no benefit to using “rel=’nofollow’”.
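A tree of sitemaps is an index file that points at individual sitemap files. The sketch below builds both with Python’s standard library, listing the URLs you care most about first as suggested above. The element names and namespace follow the sitemaps.org protocol; the URLs and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Build one sitemap file, keeping the URLs in the order given."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Build the index file that points at the individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical site: the most important URLs go first in the file.
priority_urls = [
    "https://www.example.com/",
    "https://www.example.com/products/",
    "https://www.example.com/archive/2007/08/",
]
urlset_xml = build_urlset(priority_urls)
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-main.xml",
    "https://www.example.com/sitemap-archive.xml",
])
print(urlset_xml)
print(index_xml)
```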

If you have an existing site that is only partially indexed and you elect not to create an XML sitemap (or if you have no success with XML sitemaps), you have insufficient PageRank to ensure that every page will be included in the index. You can go get more links, hopefully from PageRank-passing pages, and ensure they point at all your missing pages. That takes time and it is not guaranteed to work. Or else you can use “rel=’nofollow’” to reduce the amount of PageRank outflow from some pages so that they all send it to your HTML sitemap page(s). That takes time for the crawlers to get in and find the pages and it is not guaranteed to work. Hence, there is no benefit to using “rel=’nofollow’”.


SEO Myth: Rel=’nofollow’ can be used for search engine optimization


We throw “SEO” around like a word these days, and in some ways it has become a word. But it also remains an acronym for “search engine optimization” and optimization refers to the process of modifying Web pages so that they achieve optimum visibility in search results.

Can you use “rel=’nofollow’” to optimize for search? No.

Can you use “rel=’nofollow’” to optimize for anything? Yes.

You can optimize your site’s ability to avoid being filtered or penalized in search engines. That is, if you create a Web site where people are allowed to create profile pages and link out from those profile pages, the odds are pretty good that some people will abuse the privilege and link to skanky Web sites. Search engine algorithms may — if enough skanky links show up on your pages — algorithmically strip your pages of the ability to pass link anchor text and PageRank.

Let’s say you operate multiple Web sites and you use your Web sites to help promote each other. If one site suddenly loses the ability to pass link anchor text and PageRank, you could find yourself in serious trouble in the search results. That’s not guaranteed, but it could happen.

So you want to optimize your sites for filter thresholds. That’s not search engine optimization, that’s search filter optimization. The distinction may be minuscule to some people but simply avoiding filters has nothing to do with how well your content will rank. Search engine optimization assumes no filters or penalties exist. You’re optimizing unfiltered, unpenalized content.

Search engine optimization is the art of designing or modifying Web pages to rank well in search engines.

Can you influence your rankings in search results by telling the search engines not to follow links to your own internal content? That is, if you do your best to mathematically push all your PageRank toward only the most important pages on your site, will that improve their ability to rank in search results?

A little bit. Not much. Remember, if Google has assigned PageRank to 20 billion pages, you only get as much starting PageRank as you have pages in that 20 billion. If you have a 1-million page site and you use “rel=’nofollow’” on most of your internal links, you’re not channeling the PageRank of 1 million pages toward your most important pages. You’re only channeling the PageRank of some subset of those 1 million pages toward your most important pages.

How large is the subset of pages you have to work with? That really depends on how many of your pages receive PageRank from external sources. Remember, if you leave any path open to the rest of your content, your PageRank will flow down to all of your pages and will eventually be dispersed across the rest of the Web (it goes out to the rest of the Web anyway but that is another story).

So if you’re competing in search results on the basis of PageRank, you need to have a lot of pages, a lot of external links pointing to those pages, and an ironclad “rel=’nofollow’” navigation structure that prevents your PageRank from flowing throughout your site.

Then you’ll be in a position to compete on the basis of PageRank rather than on the basis of relevance (and that assumes that PageRank alone can help you with any query for which your content is at least marginally relevant). How many of you have sites with 1 million or more pages? How many of you have sites with 100,000 or more pages?

How many of your pages are presently in the index?

How many of those pages have value-passing external links pointing at them?

How much work do you have ahead of you if you want to compete in search results on the basis of PageRank?

The realist accepts that “rel=’nofollow’” isn’t going to help with search engine optimization. It won’t help with getting pages indexed. It won’t help with search engine rankings.

So the bottom line is that even if you could “control the flow of your PageRank” you wouldn’t accomplish much by doing so. At best you can manage the flow of your PageRank and the best way to manage the flow of your PageRank is to help it flow throughout your site. You can improve linkage and build a strong foundation with your starting PageRank.

That’s a whole lot easier than waiting for the PageRank fairy to sprinkle green dust on the pages you tell the search engines you don’t trust.

See also:
Matt Cutts, Michael Martinez, PageRank, and Link Flow

How to screw your Web site with nofollow

{ 8 comments… read them below or add one }

dodito 09.05.07 at 9:10 am

I have noticed that one of our pages with ONE rel=nofollow link pointed to it, and ONE internal link (from a page that may even still be in “supplemental” whatever that may mean exactly right now) so if not an orphan then pretty close to it, ranks in the top 10 for some combination of keywords that are very competitive. It was also not a single topic page (so 30 times the keyword combination on it).. it WAS relevant however to the keyword combination.

I am not so sure if pagerank has all that much to do with ranking well (apart from the whole supplemental index issue..) However I do know.. that (unless you know how to be a good spammer) a lot of common sense approaches: who links to you, how, where, and why, etc.. as well as internal linking as well as content.. should get you pretty far. Are the guys that link to you picky or not etc. You know.. I would even take a rel=nofollow link in gratitude.

Frankly, we don’t even have the time nor the resources to try to “tweak” the results like that (as in rel=nofollow on our own page to “steer pagerank”), or every time invent a new scheme or a new model to adjust to Google.

Personally I prefer to add some more value to the site.. and have people notice and link to us.. deep, not deep, their anchor text, or our own anchor text whatever makes them happy, our visitors happy, and our partners happy, will make us happy.

In fact it makes life simple.. I see SO many pages (also .edu and .gov) sites since july not being cached anymore by google, that I have a feeling, things have become really tight in the useful link building area, which forces you to “start “at home” first” even more, both from the perspective of having a very solid site, and from the perspective that a solid site would receive some links from pages/people that are really picky and “matter”..

More fun.. and.. we just got a link from several university libraries, about.com and some bloggers.

Justin-Goldberg 09.06.07 at 9:13 pm

dodito, as hittail has proven to me, your site was already in line to rank for that term. Every site has its own keyword stream that it can rank for.

Plus no one in the world can see every link from the public search engine data. I wonder about the google webmaster control panel, does it show 100% of all links?

Michael Martinez 09.06.07 at 10:26 pm

I wonder about the google webmaster control panel, does it show 100% of all links?

Googlers say it won’t report all links. They also say that it won’t show you which links pass value and which links don’t pass value.

brill 10.25.07 at 6:06 am

I’m working on a site that has 2 global navigations… I was going to nofollow the left nav and have the dhtml top menu be the focus for google. I don’t want to duplicate the links but the website owner feels the left nav helps with usability. Would this be an instance where it (nofollow) would be applicable?

Michael Martinez 10.25.07 at 7:11 am

brill: “I’m working on a site that has 2 global navigations… I was going to nofollow the left nav and have the dhtml top menu be the focus for google. I don’t want to duplicate the links but the website owner feels the left nav helps with usability. Would this be an instance where it (nofollow) would be applicable?”

Michael: If Google sees two links to the same destination on one page, it should treat them as if they are one link. So using nofollow won’t preserve any PageRank and doing what the client wants is not going to hurt him.
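A rough way to picture that, assuming a crawler simply collapses repeated destinations when it reads a page: the Python sketch below collects every link target from a page that has two navigation menus and keeps each destination once. The markup and paths are hypothetical, and this is not a description of how Google actually parses pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical page with a left nav and a top menu that overlap.
page_html = """
<ul id="left-nav">
  <li><a href="/products/">Products</a></li>
  <li><a href="/about/">About</a></li>
</ul>
<ul id="top-menu">
  <li><a href="/products/">Products</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(page_html)

# dict.fromkeys drops repeats while keeping first-seen order.
unique_targets = list(dict.fromkeys(collector.hrefs))
print(len(collector.hrefs), "links found,", len(unique_targets), "distinct destinations")
```

Under that assumption the second menu adds no new destinations for the repeated links, so there is nothing for nofollow to save.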

I can only think of one type of page I would use nofollow on, and that is only because Matt Cutts pointed out problems that can be caused in a post on his blog well over a year ago. If you have a calendar application that contains hundreds or thousands of empty pages, while it might seem like a good idea to let search engines crawl and index those virtual pages, it would probably cause more trouble than it’s worth.

I don’t use nofollow on internal links. There is no search optimization benefit for doing so despite all the sham advice being given out to the contrary. People have proposed the use of nofollow as a solution for problems that are created by poor internal linkage.

Screwing up your internal linkage even more isn’t going to fix it.

seo 07.04.08 at 2:17 pm

I really used to be against nofollow: if you think your webpage is not important, then why have it there in the first place?

But after reading the Google webmaster guidelines my opinion has changed: according to Google it will affect your PageRank, and if it’s a paid link it should have a nofollow.
I know things have changed since this article was written.

Michael Martinez 07.05.08 at 10:34 am

Using “rel=’nofollow’” on your internal links as you propose remains a very bad idea. No one in the SEO community is able to execute it well enough to have any positive effect on their PageRank (which is not critical to search engine results ranking success). The Google Webmaster Guidelines on NoFollow offer absolutely no justification for shooting yourself in the foot by nofollowing important internal pages or trying to sculpt PageRank.

Googlers have repeatedly come out on record over the past few months advising people NOT to try this.

seo 07.06.08 at 3:51 am

I would agree and it isn’t something we do on our website or any of our clients’, and yes PageRank really isn’t that important a factor but so many people do