Crawling Intent, Hidden Diagram
Posted by Michael Martinez on August 20, 2007 in SEO Theory
Two current search engine optimization philosophies are about to clash in the various SEO communities. You can already find a number of blog posts and forum discussions (and even at least one eBook) where people disclose that they use “rel=’nofollow’” on their internal links (and some experts are advising people to do this).
The misguided belief behind the use of “rel=’nofollow’” for some internal links is that you don’t want certain pages to appear in search results. That is, some businesses put more information about who they are and what they do on their “About us” and “Contact Us” pages than they do on their actual product and service pages. Hence, “About Us” and “Contact Us” pages often rank better than home pages for company brands.
The unthinking knee-jerk reaction is to assume that these are unimportant pages that should not appear in search results, and therefore they can be blocked with robots.txt and “rel=’nofollow’”. I cannot imagine a more stupid, short-sighted approach to managing internal content, but perhaps someone has come up with an even dumber idea and I have yet to find it.
On the other hand, Google has now made accumulation of PageRank even more important than before. So as PageRank strategists crawl out from the woodwork like the SEO cockroaches they are, sensing that their day has come, they’ll be sharpening their barbs and insults in preparation for the final showdown with common sense and effective search engine optimization strategists.
In ideal search engine optimization, every page you place on your Web site should rank for something. No page can rank for anything if you tell the search engines not to index it. Sacrificing search visibility because you’re too lazy to make your home page, product pages, service pages, FAQs and tutorials, and other “targeted” content more relevant for your keywords doesn’t cover up your incompetence as an SEO. If anything, it only underscores the fact that you don’t know what you are doing.
If people want to know more about your company, the ideal place for your “more about our company” content is your “About Us” page or section. And people do perform searches with the intention of learning more about companies (and individuals), rather than with the intention of finding their products and services.
But from a search indexing point of view, every additional page you place on your site should help you in several ways. For one thing, it’s another opportunity to boost your PageRank. Just because Matt Cutts says “every site gets only so much PageRank and it only goes so far” doesn’t mean that keeping your “About Us” and “Contact Us” pages search visible directs your PageRank toward the wrong places. It just means that getting PageRank to older, deeper content with fewer links pointing at it becomes more difficult.
An “About Us” or “Contact Us” page is far less likely (except in a CMS) to have an outbound link to another site. If you know what you’re doing, all the links on your informational pages will be internal links, meaning whatever PageRank those pages accrue will be directed toward other parts of your site. But if you tell the crawlers these pages are not important, untrusted, or should not be crawled, then you are wasting PageRank, literally throwing it away. After all, many a new site has earned PageRank simply by being indexed after an XML sitemap was submitted to a search engine.
The internal link anchor text that “About Us” and “Contact Us” pages can send toward other pages also helps your site. There is, in fact, no reason why an “About Us” page should appear first in the search engine results if you design your navigational anchor text properly. And that doesn’t mean you should use anchor text of “About Us” to point to an “About Us” page. It does mean, however, that if you use “Home” to link to your root URL you deserve to have your “About Us” page show up first in the search results.
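To illustrate the “throwing it away” argument, here is a minimal sketch of PageRank computed by power iteration over a hypothetical five-page site. It assumes a simplified textbook model in which rank arriving at a page with no crawlable outbound links is simply dropped. That is only one of several ways to treat such pages, and it is not a description of how Google actually computes PageRank. The page names, link structure, damping factor (0.85), and iteration count are all illustrative assumptions. Blocking the “About Us” and “Contact Us” pages is modeled by leaving them linked-to but giving them no visible outbound links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the list of pages it links to. Rank that
    arrives at a page with no outbound links is not redistributed, so it
    leaks out of the system (a deliberately simple model of "wasted" rank).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the same baseline "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue  # no crawlable outbound links: this page's rank is dropped
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: every page other than home links back to home, and
# About Us / Contact Us link onward to other pages on the site.
open_site = {
    "home": ["about", "contact", "product", "deep"],
    "about": ["home", "product", "deep"],
    "contact": ["home", "product"],
    "product": ["home", "about", "contact", "deep"],
    "deep": ["home", "about", "contact"],
}

# Same site, but About Us / Contact Us are blocked: other pages still link
# to them, yet crawlers never see their outbound links.
blocked_site = dict(open_site, about=[], contact=[])

if __name__ == "__main__":
    for label, site in (("open", open_site), ("blocked", blocked_site)):
        ranks = pagerank(site)
        print(f"{label:8} total={sum(ranks.values()):.3f} product={ranks['product']:.3f}")
```

Under this model the open site keeps its total rank at 1.0, while the blocked site's total falls below 1.0 and the product page receives less rank, because the blocked pages still absorb rank but no longer pass any of it along.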
Using “rel=’nofollow’” on internal links won’t guarantee that internal pages aren’t crawled and indexed anyway. After all, someone else could create a directory of “About Us” pages or innocently link to your “About Us” section because it provides relevant information on you and your company. So your “rel=’nofollow’” is wasted (it won’t conserve PageRank the way you may think it should). And if you have disallowed the page through robots.txt you’ve really shot yourself in the foot.
There are four pages that every site with more than 10 pages of content should be promoting from all pages: the root URL (using the site name or company brand), the “Contact Us” page (using the anchor text of “Contact Us” is okay), the “About Us” page (using the company name or individual name), and the HTML sitemap. In addition, every page on the site should be linking to one or more section root pages (if you have logical sections). The home page should be featuring deeper content that is current and important.
The “About Us” or “Contact Us” pages should appear second in the search results below the root URL. Where possible, search engines try to favor root URLs as long as they are relevant to the queries that bring them up. If the query is for your company name or site brand, you just need to make sure you cover your bases with your title tag, on-page copy, and internal link anchor text.
Unless you use the word “home” as anchor text for the root URL, it’s very difficult to screw up making your home page the most relevant page for your name. You’d have to be really, really stupid to set up a Web site in such a way that you could only make the home page more relevant than your “About Us” page by using “rel=’nofollow’” or disallowing the page in robots.txt. If you think you cannot get the home page to rank, you need to swallow your pride and fix your on-page content and internal anchor text.
A Web site can easily be divided into three sections: high-level content (usually only the root URL for the site and for each section), functional resources (like search tools, HTML sitemaps, archive directories, About Us and Contact Us pages, etc.), and deep content. Deep content doesn’t have to be lost in the Supplemental Results Index. All you have to do is point a few links at it so the crawlers keep finding it. You don’t need tons of PageRank.
PageRank is the discriminating tool Google currently uses to separate its Web indexes. You do need some PageRank to ensure your pages get into the Main Web Index, but keep in mind that being included in that index doesn’t guarantee that your pages will rank or pass value. You need to avoid tripping filters and penalties.
In the meantime, as PageRank hoarders start shouting “Yeeha! Git along there little PageRank dogie!” more loudly, you should step aside and let them have their PageRank Pride March. They can’t affect the search results nearly as well as someone who focuses on the fundamental principles of search engine optimization.
Don’t lose sight of what you need to do with your site. Use your anchor text wisely, build efficient internal navigation, create good organization and structure, and you won’t be tempted to do something stupid like pretend your “About Us” and “Contact Us” pages don’t exist.
3 Comments on Crawling Intent, Hidden Diagram
By Kneek on August 20, 2007 at 9:07 pm
“It does mean, however, that if you use ‘Home’ to link to your root URL you deserve to have your ‘About Us’ page show up first in the search results.”
It seems the Seo Theory Blog doesn’t get what it deserves…
By Michael Martinez on August 21, 2007 at 5:12 am
LOL!
This blog breaks all the rules and gets away with plenty. But that is what SEO theory is all about: experiment, evaluate, adjust.
Of course, there are other links on the blog that point home with the “SEO theory” anchor text. So we’re good.
By Darren McLaughlin on August 22, 2007 at 3:02 pm
I honestly think most people are overthinking the nofollow tag and attempting to be too clever with its use.
BTW, it took me a second to get the Title, but it’s a sly reference to one great movie.