SEO Algorithm Roundup: AOL, Ask, Google, Live, & Yahoo!

by Michael Martinez on April 22, 2007

How well do you know the search engine algorithms? Most SEOs would probably fail a simple test covering the basics. Algorithm chasing is not highly regarded in the industry. Here are some pros and cons of algorithm chasing, followed by a roundup of basic algorithmic facts for the five major search services (AOL, Ask, Google, Live, and Yahoo!).


Why Algorithm Chasing Wastes Time

  • Search engines change their algorithms several times a year
  • Each search engine has its own algorithm
  • Only Google’s algorithms are very well documented
  • You have to read a lot of technical papers and patent applications
  • Most SEOs don’t know enough about programming, text-parsing, or indexing data to be able to reverse-engineer algorithms


Why Algorithm Chasing Helps SEO

  • Algorithm analysis explains why perfectly good Web pages don’t rank in search results
  • Algorithm analysis confirms the do’s and don’ts of SEO fundamentals
  • Algorithm analysis helps you understand why black hat spammers do what they do
  • Algorithm analysis vindicates the search engine optimization model


The All-Time Worst Algorithm Chasers


Name every person who writes about search algorithms or SEO theory, or who claims to know better than anyone else how search engines work. That pretty much includes the entire visible spectrum of the SEO industry. If you have blogged about it, or shared your thoughts about it in a forum or at a conference, you’re on the list. Get over it. Your favorite SEO bloggers, forum admins, and SEO journalists top the list with the most demerits.

The AOL Algorithm


America Online does not collect its own data. Instead, it uses Google’s database. For many queries, AOL results look exactly like Google’s, but AOL applies a much stricter adult-content filter, so some queries return noticeably leaner results on AOL than the same queries do in Google’s SafeSearch mode.

AOL’s Full View offers the first well-done integration of Web, Video, and Image search (combined with Search History) in a user interface. Their Image and Video selection criteria were recently updated. I think they may have implemented some sort of copyright-guessing algorithm to minimize exposure of sites that probably lift their graphical content from other sources.


Ask Web Search


Ask’s ExpertRank algorithm is without question the best at organizing data and filtering out untrustworthy sites. Their presentation sucks, though, because they hand-edit the results to favor untrustworthy sources like Wikipedia.

ExpertRank is based on Jon Kleinberg’s HITS algorithm, combined with the IBM update to HITS called CLEVER. ExpertRank — like all Web index algorithms — looks at on-page factors to discern relevance but first selects pages from a pool of “trusted” sources. Ask’s trust is placed in categorization that is determined in part by “hubs” (pages that list expert pages about a topic) and “experts” (pages that deal mostly with specific topics and which are cited by other pages that deal with similar topics).
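ExpertRank itself is proprietary, but the HITS idea it builds on is published. As a rough illustration only (not Ask’s code), here is a minimal Python sketch of Kleinberg’s hub/authority iteration: a page’s authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The function name, iteration count, and toy link graph are my own hypothetical choices.

```python
import math


def hits(links, iterations=50):
    """Kleinberg-style HITS on a link graph (illustration only, not ExpertRank).

    links maps each page to the list of pages it links to.
    Returns (hub, authority) dicts, each scaled to unit L2 norm.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: two "hub" pages listing the same two "expert" pages.
links = {
    "list_a": ["expert_1", "expert_2", "random"],
    "list_b": ["expert_1", "expert_2"],
    "expert_1": [],
    "expert_2": [],
}
hub, auth = hits(links)
```

In this toy graph the pages cited by both hubs end up with the highest authority scores, and the hub that lists more of them gets the higher hub score, which is the "hubs point to experts" relationship described above.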

Ask’s algorithm relies heavily on citation analysis that goes far beyond classic PageRank. The drawback is that it filters out a lot of otherwise worthwhile content, so Ask’s database looks smaller than it actually is. I suspect this may be part of why they have decided to develop the Edison algorithm. In my opinion, Edison will prove to be disastrous for Ask.
How do you get indexed in Ask?: You need links from pages already showing in Ask’s results for your topics. Don’t confuse “hubs” with directories.
Important on-page relevance signals: Title, keywords meta tag, use of bold/strong, italics/emphasis, repetition of keywords in text.


Ask Blog Search


Blog search is a misnomer. Ask, like the other major search services, relies on RSS/XML feeds to identify listings in blogs. This means that any forum that publishes an RSS feed may show up in Ask Blogs & Feeds. Any static RSS feed may also show up in Ask Blogs & Feeds.
How long does it take to appear?: I have seen my own feeds update in Ask within 24 hours.
Is there a time limit?: None that I have seen. Some of the content is pretty stale.
Weighting factors: Age of post, keywords in title, repetition of keywords in text
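To show how those three factors could combine, here is a toy Python sketch that parses RSS 2.0 items and scores them by keyword matches in the title and text, discounted by post age. The title weight, the 7-day half-life, and the function names are invented assumptions for illustration; Ask has not published how it weights these factors.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

TITLE_WEIGHT = 2.0     # hypothetical: a title match counts double
HALF_LIFE_DAYS = 7.0   # hypothetical: a post loses half its weight per week


def count_hits(text, terms):
    """Count words in text that are query terms (repetitions all count)."""
    return sum(1 for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok in terms)


def rank_feed_items(rss_xml, query, now=None):
    """Score RSS 2.0 <item>s by keyword matches, decayed by post age.

    Assumes every item has an RFC 822 <pubDate> that includes a time zone.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = item.findtext("title", default="")
        text = item.findtext("description", default="")
        published = parsedate_to_datetime(item.findtext("pubDate"))
        age_days = max((now - published).total_seconds() / 86400, 0.0)
        keywords = TITLE_WEIGHT * count_hits(title, terms) + count_hits(text, terms)
        scored.append((keywords * 0.5 ** (age_days / HALF_LIFE_DAYS), title))
    return sorted(scored, reverse=True)


# Hypothetical two-item feed: same keywords, posts two weeks apart.
feed = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Widget news</title><description>widget widget</description>
<pubDate>Sun, 22 Apr 2007 10:00:00 +0000</pubDate></item>
<item><title>Old widget news</title><description>widget widget</description>
<pubDate>Sun, 08 Apr 2007 10:00:00 +0000</pubDate></item>
</channel></rss>"""
ranked = rank_feed_items(feed, "widget", now=datetime(2007, 4, 22, 10, 0, tzinfo=timezone.utc))
```

With identical keyword matches, the two-week-old post scores a quarter of the fresh one, so age alone reorders them.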

Google Web Search


Google has always ranked Web search results on the basis of Relevance + PageRank. For most queries, including most commercial, hyperoptimized queries, PageRank usually makes no difference. It is a very rare query that is noticeably affected by PageRank, which has nothing to do with Relevance.

Google determines relevance by:

  • applying filters and penalties (to eliminate untrusted or spammy sites);
  • scoring on-page factors like use of keywords in title, URL, outbound link anchor text, and use of bold/strong, emphasis/italics, color, and page structural elements such as bulleted lists, Hx headers, etc. (anything that makes a word stand out from the rest of the text on the page);
  • scoring off-page factors such as inbound link anchor text and words immediately preceding and following inbound link anchor text;
  • scoring for repetition.
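To make the idea of field-based scoring concrete, here is a toy Python sketch that counts query-word repetitions per field and weights emphasized fields more heavily. The field names, the weights in FIELD_WEIGHTS, and field_score() are hypothetical; Google has not published its weights, and this is not its method.

```python
import re

# Hypothetical weights: fields that make a word stand out count for more.
FIELD_WEIGHTS = {
    "title": 3.0,
    "url": 2.0,
    "inbound_anchor": 3.0,
    "emphasis": 1.5,   # bold/strong, em/italics, Hx headers, list items
    "body": 1.0,
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(page, query):
    """page maps a field name to its text; unknown fields get weight 0."""
    terms = set(tokenize(query))
    score = 0.0
    for field, text in page.items():
        weight = FIELD_WEIGHTS.get(field, 0.0)
        # Repetition: every occurrence of a query term adds the field weight.
        score += weight * sum(1 for tok in tokenize(text) if tok in terms)
    return score


page = {
    "title": "Blue Widgets",
    "url": "example.com/blue-widgets",
    "body": "We sell blue widgets.",
}
score = field_score(page, "blue widgets")
```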

Google claims to take more than 100 factors into consideration. Most SEOs only use 2-3 factors: inbound link anchor text, keywords in title, keywords in URL. Through the years, Google has changed the way it weights factors. There is absolutely no evidence that they weight by location (what domain a page is found on), but PageRank can influence several things.

A page with high internal PageRank (an undisclosed value between 0 and 1) will be crawled more frequently, included in the Main Web index, and obviously has more PageRank to share with other pages.
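For readers who haven’t seen PageRank computed, here is a normalized variant of the textbook power iteration from the original Brin and Page paper, in which all scores sum to 1 (so each value falls between 0 and 1). This is an illustrative sketch, not Google’s internal computation; the 0.85 damping factor is the value suggested in the original paper, and the four-page site is made up.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Normalized PageRank by power iteration (illustration only)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Page with no outlinks: spread its rank over all pages.
                for dst in pages:
                    new[dst] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical site: every page links to "home"; nothing links to "orphan".
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "orphan": ["home"],
}
r = pagerank(links)
```

The page everything links to ends up with the most PageRank to share, while the page nothing links to keeps only the baseline share.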

A page with low internal PageRank may be listed only in the Supplemental Results index, won’t be parsed (won’t rank for on-page factors), and may only appear in search results as a result of inbound link anchor text.

Google’s SafeSearch mode prevents Google from showing “URL-only listings”, pages that Google knows about but hasn’t crawled and indexed. These pages can show in search results ahead of Supplemental Results on the basis of keywords in URL.
The most important factor for relevance: Repetition of keyword, either on-page or off-page.
How do you get indexed in Google?: New domains are crawled as Google becomes aware of them; XML sitemaps submitted through Webmaster Central; through links on sites in the Main Web Index that have not been stripped of their ability to pass PageRank.
What is the Sandbox Effect all about?: It’s about Trust, not TrustRank (see Yahoo!). Google measures more than one kind of trust. If they quantify trust, they have not disclosed how they do so. A trusted domain can usually get new content to rank quickly. A new domain can usually earn trust within 2-4 weeks depending on who links to it. Any new domain that doesn’t earn trust is “sandboxed”. Trust is only earned through inbound links.
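XML sitemaps are one of the routes into Google’s index listed above. Here is a minimal Python sketch that writes one in the sitemaps.org 0.9 format using only the standard library; build_sitemap() is a hypothetical helper name and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc   # ElementTree escapes '&' etc.
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


doc = build_sitemap([
    ("https://www.example.com/", "2007-04-22"),
    ("https://www.example.com/about?a=1&b=2", None),
])
```

Save the result as, say, sitemap.xml at the site root and submit its URL through Webmaster Central.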


Google Blogsearch


Like Ask’s Blogs & Feeds, Google Blogsearch is built from RSS/XML feeds and includes many forum posts as well as static RSS listings. Google’s Blogsearch defaults to a relevance-based listing format that favors well-linked listings. You can switch over to a time-based listing that shows most recent posts first.

Blogs (or forums) that are heavily linked to with keywords will be featured above the normal results as “Related Blogs”.
Which factors affect relevance?: Title tags seem to have the most impact, but inbound link anchor text also seems to play a part. Having a blog on Blogspot also helps.
How long to get indexed?: If your blog is on Blogspot, it takes about 3-10 seconds for your post to appear in Google Blogsearch. Other domains get indexed anywhere from 30 minutes to several hours later, probably depending on where their feeds are published. (ON EDIT: This post was indexed in 1 minute or less, indicating a change in Google’s blog crawling since I last checked it.)


Live Web Search


Microsoft’s Live Search uses artificial intelligence to crawl, index, and rank pages. The AI learns from linking patterns, page design techniques, and user behavior (where people click through and what they do on the next pages). Many SEOs feel that Microsoft’s search results are the easiest to manipulate, but that isn’t always the case. The more Live Search has learned about the pool of pages that satisfy a query, the more difficult it becomes to move content into that query space.

Like Ask and Google, Microsoft’s Live Search allows links to pass anchor text. Microsoft engineers have also suggested they evaluate something like PageRank, so link citation values are probably universally used among all the major search engines.

Microsoft counts your click-throughs by using the “gping=” link attribute, which overrides the normal URL in the “href=” attribute. While click-through tracking is generally a very bad method for determining the value of Web pages, this behavior appears to be so poorly documented in the SEO community that most SEOs have ignored the usual click-manipulation techniques. As a result, “easy-to-grab” query space can morph into “why did my page lose rankings?” without apparent explanation.
How do you manipulate AI-based rankings?: Get lots of links and persuade a lot of people to click through on your listings.
Do meta tags help?: Not generally. Even the description tag seems to be given only partial consideration. On-page text is often mixed with meta text in listings.
Does Microsoft look at “trust”?: They don’t seem to be as concerned with trust as Ask and Google. The AI does a pretty good job of moving spam out of the search results. However, Microsoft’s relevance is sometimes as broad and unfocused as Yahoo!’s because they do a better job than Google of allowing queries to correspond to multiple semantic contexts.


Live Web Feeds


Live Feeds is still a beta service but it appears to have figured out how not to include forums. On the other hand, you’ll get news service feeds mixed in with blog feeds. Relevance does not appear to be influenced by titles. Click-throughs probably measure relevance more than anything after a page has been identified as “relevant” to a query through on-page factors.
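Here is a two-stage Python sketch of that speculation: on-page text decides which entries count as relevant at all, and observed click-through rate then orders the survivors. Everything here is hypothetical (the entry format, the all-terms relevance test, and the smoothing prior of 1 click per 10 impressions, which keeps a single lucky click from dominating); it is not Microsoft’s method.

```python
import re

PRIOR_CLICKS, PRIOR_IMPRESSIONS = 1, 10  # hypothetical smoothing prior


def is_relevant(entry, query):
    """Stage 1: every query word must appear in the entry's text."""
    words = set(re.findall(r"[a-z0-9]+", entry["text"].lower()))
    return all(term in words for term in re.findall(r"[a-z0-9]+", query.lower()))


def smoothed_ctr(entry):
    return (entry["clicks"] + PRIOR_CLICKS) / (entry["impressions"] + PRIOR_IMPRESSIONS)


def rank_entries(entries, query):
    """Stage 2: order the relevant entries by smoothed click-through rate."""
    relevant = [e for e in entries if is_relevant(e, query)]
    return sorted(relevant, key=smoothed_ctr, reverse=True)


entries = [
    {"id": "b", "text": "widget news", "clicks": 5, "impressions": 100},
    {"id": "a", "text": "widget review", "clicks": 50, "impressions": 100},
    {"id": "c", "text": "gadget news", "clicks": 90, "impressions": 100},
]
ranked_ids = [e["id"] for e in rank_entries(entries, "widget")]
```

Note that the most-clicked entry never appears, because it fails the on-page relevance stage first.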

Yahoo! Web Search


Yahoo! is the oldest of the search services and its Web search technology is built at least in part on the old Inktomi search technology. Inktomi was more influenced by raw link power than any other search engine, and Yahoo! engineers claim that each successive link from a host (domain or sub-domain) should count for less than the first link.
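To illustrate what “each successive link from a host counts for less” could mean in practice, here is a small Python sketch that discounts repeat links from the same host geometrically. The 0.5 decay factor and the function name are assumptions for illustration; Yahoo! has not published its actual formula.

```python
from collections import defaultdict
from urllib.parse import urlsplit

DECAY = 0.5  # hypothetical: 1st link from a host = 1.0, 2nd = 0.5, 3rd = 0.25, ...


def host_damped_link_count(inbound_urls):
    """Sum link weights, shrinking each additional link from the same host."""
    links_seen = defaultdict(int)
    total = 0.0
    for url in inbound_urls:
        host = urlsplit(url).hostname  # domain or sub-domain
        total += DECAY ** links_seen[host]
        links_seen[host] += 1
    return total


same_host_heavy = host_damped_link_count([
    "http://a.example.com/1",
    "http://a.example.com/2",
    "http://a.example.com/3",
    "http://b.example.org/x",
])
```

Under this assumption, three links from one host plus one from another are worth 2.75 “first links”, while four links from four different hosts would be worth 4.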

Yahoo! tracks click-throughs and is influenced by page titles, keyword repetition, and other on-page factors. The search engine allows links to pass anchor text as well.

Many SEOs claim that on-page factors work better than links with Yahoo!. I have not always found that to be the case. However, Yahoo! may have — until recently — been allowing its directory listings to add some weight to Web search results. A few years ago Yahoo! satisfied queries with directory results listed first. A couple of years ago Yahoo! changed its search results to mingle directory listings in among Web search. More recently Yahoo! removed explicit references to its directory from search results pages.

However, Yahoo! still seems to be favoring its own directory listings over Web content in at least some searches. I suspect that query history trends strongly influence Yahoo! search results. If a query has a long life span and in the past results were served more often from the directory, those directory listings may still be favored today.

Large content sites also often dominate Yahoo! results, implying that internal linkage may indeed have better impact than external linkage.
Yahoo! no longer offers an RSS search: However, RSS feeds are often included in their general Web search results.
How do you get into the index?: Yahoo! is one of the more aggressive crawlers. They still accept free submissions from your Yahoo! account but those submissions have been de-emphasized. Verifying a host and submitting an XML sitemap or RSS feed may be the fastest way to get into Yahoo! (unless you get links from well-crawled sites).
Does Yahoo! use “trust”?: Yahoo! jointly developed the TrustRank algorithm with Stanford University. However, subsequent research revealed flaws in the algorithm. It’s most likely that Yahoo!, like Google, employs some “trust filters” and may also (like Ask) intentionally favor at least one trusted domain (the Yahoo! directory).
How does Yahoo! determine relevance?: Yahoo! looks at a broader semantic context than Google but lacks the categorization that Ask and Live offer. Repetition of keywords, both on page and off-page, can help. The title tag is very important. Link anchor text helps some but not as much as with Ask, Google, and Live.
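The published TrustRank idea is essentially PageRank whose random jumps land only on a hand-picked set of trusted seed pages, so trust flows outward along links and pages unreachable from the seeds get none. Here is a minimal Python sketch under that description; the function name, toy graph, and iteration count are my own assumptions, and in this simplified version trust that reaches a page with no outlinks is simply dropped.

```python
def trustrank(links, trusted_seeds, damping=0.85, iterations=50):
    """Seed-biased PageRank: random jumps go only to trusted seed pages."""
    pages = set(links) | set(trusted_seeds)
    for targets in links.values():
        pages.update(targets)
    seed_share = 1.0 / len(trusted_seeds)
    teleport = {p: (seed_share if p in trusted_seeds else 0.0) for p in pages}
    trust = dict(teleport)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * teleport[p] for p in pages}
        for src, targets in links.items():
            if targets:
                # Trust is split evenly among a page's outlinks.
                share = damping * trust[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        trust = new
    return trust


# Hypothetical graph: "spam" links out but nothing trusted links to it.
links = {"seed": ["good"], "good": ["other"], "spam": ["other"], "other": []}
t = trustrank(links, {"seed"})
```

Trust decays with each hop away from the seed, and a page that only links out (like “spam” here) earns none, which is why inbound links from trusted pages matter under this model.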


Final Thoughts


There is much I haven’t shared here. It’s too easy to get wrapped up in the details, and most SEOs are so convinced that everything is based on links that explaining the fundamentals to them is generally a waste of time. You need two things to be indexed in today’s major search engines: links and content. Without either one, you’re pretty much dead in the water. How much you need of each is not clear to anyone. Most SEOs have put all their eggs in the “you mostly need links” basket. That makes them vulnerable to competition because, frankly, favoring either links or content too much is inefficient.

How do you know when you need more links or content? Use the SEO Method: Experiment, evaluate, adjust.
