Managing crawl versus managing link flow
Posted by Michael Martinez on January 28, 2008 in Advanced SEO
As I have said before, link flow is the pathway that links forge throughout your Web site or network. People confuse link flow with PageRank because PageRank is determined (in part) by link flow. But there is another factor that determines PageRank: crawl.
Crawl is both what a search engine does when it fetches pages and the pathways it finds on the pages it has fetched. That is, crawl is the scope of a search engine’s crawling as determined by its capabilities and the pages it actually crawls (fetches and extracts URLs from). This is an abstract concept that you should be careful NOT to confuse with crawling (where search engines fetch pages).
We limit the scope of crawl in several ways. For example, we may break links unintentionally, sending any potential crawl into the 404 bit bucket. Or we may orphan links and place them on sites that don’t get crawled (this actually happens more often than people realize). When you’re discussing crawl (versus crawling), links may be orphaned even if they are on well-linked Web sites. Why? Because there is no crawling activity.
Crawl cannot exist without the crawling or the pages. Crawl is the abstract resource you want to manage in order to ensure that you get your pages indexed as often as you need to have them indexed.
Crawl is time-sensitive but you can be flexible in defining what your crawl window should be. A crawl window is the period of time in which a page should be fetched at least once, preferably twice. Why twice? Because if a page is fetched only once within a crawl window you have to compare page fetches in consecutive crawl windows in order to determine if you have sufficient crawl to keep your page fresh in search indexes. If you define a crawl window that requires two page fetches then you at least know you have some active crawling.
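To make the two-fetch rule concrete, here is a minimal sketch in Python. It assumes you have already pulled a list of crawler fetch timestamps for one page out of your server logs. The function names and the sample dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def fetches_in_window(fetch_times, window_end, window_days):
    """Count crawler fetches of one page inside a crawl window.

    The window covers the `window_days` days ending at `window_end`.
    """
    window_start = window_end - timedelta(days=window_days)
    return sum(1 for t in fetch_times if window_start < t <= window_end)


def has_active_crawl(fetch_times, window_end, window_days, min_fetches=2):
    """True if the page was fetched at least `min_fetches` times in the window."""
    return fetches_in_window(fetch_times, window_end, window_days) >= min_fetches


# Hypothetical fetch timestamps for a single page (assumed to come from logs).
fetches = [datetime(2008, 1, 3), datetime(2008, 1, 22), datetime(2008, 1, 26)]
window_end = datetime(2008, 1, 28)

print(has_active_crawl(fetches, window_end, window_days=7))
```

If the check fails for a one-week window, try a longer window before concluding the page has no crawl. The window is yours to define.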
You want to manage crawl for new sites and old sites, large sites and small sites. Crawl does not directly determine whether your pages will be indexed or which index they’ll be included in. Crawl simply assures you that search engines are fetching your pages. A good crawl scope ensures that a significant percentage of a Web site’s pages are fetched within the crawl window.
There is no ideal or perfect or one-size-fits-all crawl window or crawl scope. Here are a few examples to give you an idea of how to measure crawl.
Crawl Window
- 1 Week - A page that changes daily may have a 1-week crawl window. You probably want to see at least 2 search index updates for the page per crawl window.
- 2 Weeks - A page that changes weekly may have a 2-week crawl window. You want to see at least 1 search index update for the page per crawl window.
- 1 Month - A page that seldom changes may have a 1-month crawl window if it is well-linked. Since the page doesn’t change, all you’re really looking for (if the data is available) is a change in cache dates.
Crawl Scope
- Small site, fewer than 10 pages. You want a Large Crawl Scope of greater than 50%. Crawl Scope = percentage of pages indexed within a crawl window.
- Moderate site, fewer than 1,000 pages. You want a Moderate Crawl Scope of 30-40%. That doesn’t mean your site is in trouble if you don’t see 30-40%. You may have defined too small a crawl window. Rule of thumb: The larger the site, the less likely any one crawl window will be suitable for the entire site.
- Substantial site, 1,000-20,000 pages. You want a Crawl Scope around 10-20%. Most of your pages will probably be on the fringe. If you have a high Crawl Scope, you may have defined your crawl window to be too large. For a site this size, I would use multiple crawl windows and measure separate crawl scopes.
- Supersite, more than 20,000 pages. A reasonable Crawl Scope would be in the range of about 1-5% (larger sites should have smaller percentages). Again, you really want to work with multiple Crawl Scopes but a large site that has an overall Crawl Scope of less than 1% in most typical windows is probably in trouble.
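The rules of thumb above can be turned into a rough measurement. The sketch below is only one way to do it. It assumes you keep a mapping of each URL to its most recent crawler fetch (or None if it has never been fetched), and it counts fetches rather than index updates because fetches are what server logs show. The function names and sample data are hypothetical. The target ranges are the ones listed above.

```python
from datetime import datetime, timedelta


def crawl_scope(last_fetch_by_url, window_end, window_days):
    """Percentage of known pages fetched at least once inside the crawl window."""
    if not last_fetch_by_url:
        return 0.0
    window_start = window_end - timedelta(days=window_days)
    fetched = sum(
        1
        for t in last_fetch_by_url.values()
        if t is not None and window_start < t <= window_end
    )
    return 100.0 * fetched / len(last_fetch_by_url)


def target_scope(page_count):
    """Rule-of-thumb Crawl Scope range (low %, high %) for a site of this size."""
    if page_count < 10:
        return (50.0, 100.0)   # small site: greater than 50%
    if page_count < 1000:
        return (30.0, 40.0)    # moderate site
    if page_count <= 20000:
        return (10.0, 20.0)    # substantial site
    return (1.0, 5.0)          # supersite


# Hypothetical four-page site; None means the page was never fetched.
pages = {
    "/": datetime(2008, 1, 27),
    "/about": datetime(2008, 1, 25),
    "/contact": datetime(2008, 1, 23),
    "/archive": None,
}
scope = crawl_scope(pages, datetime(2008, 1, 28), window_days=7)
low, high = target_scope(len(pages))
print(f"Crawl Scope: {scope:.1f}% (rule-of-thumb range: {low:.0f}-{high:.0f}%)")
```

For a larger site you would run this separately over each section, with its own window, rather than once over the whole site.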
You can leverage the content on your site to manage your crawl. The more crawling activity you see come through your pages, the more influence you have over your crawl. This is the advantage of opening up your internal navigation and providing as many alternate routes to your pages as possible.
If you start measuring your crawl with fairly conservative metrics (low percentages and long windows), you’ll find that you can refine your metrics and improve your crawl strategy incrementally. Looking at crawl gives you a place to start, especially if you measure multiple crawl scopes within your Web site. Knowing how often each page is indexed tells you where to find the links most likely to be your best performers.
When you are launching a substantial new section of your site, or when you use one or more older sites to help launch a new site, knowing which of your pages have good crawl value gives you an advantage. You don’t have to wait for someone else’s pages to be crawled and indexed in order to see your own pages start working.
I’ve been managing crawl for my personal network for years. It’s an advanced concept in SEO and most people are not used to thinking of crawl as an abstract concept. Crawl is a noun, not a verb, in this world view. You manage the resource by making small but substantial changes to help search engines find your pages on your timetable rather than according to random chance.
You don’t want to be formulaic in managing crawl. You don’t want to use the latest linking tips and methods. To manage crawl you need to be slow and methodical. By taking the longer view and learning to understand how your pages interact with search engines, you’ll find that you need fewer links because your links will work fast and achieve your desired results quickly.
You’ll no longer be completely at the mercy of other people’s linking priorities.
There is more to say on this topic, especially where the differences between crawl and link flow are concerned. If you feel tempted to argue that crawl and link flow are really the same thing or are very closely related, be patient. They are two totally distinct, unrelated concepts.