Yahoo! Site Explorer - A review of Yahoo! Site Explorer

by Michael Martinez on October 6, 2008

Now that Yahoo! has updated its Site Explorer service we should all be looking to see what we can learn from Yahoo! about our search engine optimization. I’ve been beating up on the SEO community for years for relying on Yahoo!’s backlink reports and I haven’t changed my position on that practice. You cannot learn anything about what Google knows from Yahoo!.

But you can certainly learn more about what Yahoo! knows now than before they updated the Site Explorer service. In fact, at first glance, the new Site Explorer seems to be more honest and forthcoming than Google Webmaster Tools. If you cannot trust an analytic tool, you have no reason to use it. And, trust me, none of us has any reason to use Google Webmaster Tools for link analysis.

Yahoo!, on the other hand, has introduced a brief statistics report. It’s better than nothing and does provide some information I wish Google provided. Specifically, Yahoo! tallies up some numbers for us. Here is the data Yahoo! provided for one domain I’ve authenticated with them:

Yahoo! Statistics For Sample Domain
Indexed Pages: 4,665
Crawled Pages: 2,081
Known Pages: 4,665
Known Hosts on This Site: 12
Hosts linking to Site: 1,844
Domains linking to Site: 1,601
Hosts Outlinked from Site: 986
Domains linked from Site: 893

There are some odd numbers in this report, but they may be explainable. For example, where did the 12 hosts come from? The domain does not have that many hosts (a host would be either a sub-domain or the primary domain). If I allow for a lack of canonical authority (other sites lacking sub-domains show 2 hosts) and a couple of sub-domains that have been created but never developed (they have no content and nothing links to them), I can justify 12. But how would Yahoo! know about two undeveloped sub-domains? Optimizing minds want to know.
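To make the host/domain distinction concrete, here is a minimal Python sketch of how such tallies could be computed from a list of URLs. This is not how Yahoo! does it (their method is not public). The function names and sample URLs are hypothetical, and the "last two labels" domain rule is a simplifying assumption that breaks on suffixes like .co.uk.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def naive_domain_of(host):
    """Assumption: the last two labels form the domain (wrong for .co.uk etc.)."""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def tally(urls):
    """Count distinct hosts and distinct domains seen in a list of URLs."""
    hosts = {host_of(u) for u in urls} - {""}
    domains = {naive_domain_of(h) for h in hosts}
    return len(hosts), len(domains)


# Hypothetical sample: example.com and www.example.com count as two hosts
# when a site has no canonical host, which matches the "2 hosts" case above.
sample = [
    "http://example.com/",
    "http://www.example.com/about.html",
    "http://blog.example.com/post-1",
    "http://www.another.org/links.html",
]
host_count, domain_count = tally(sample)
```

With this sample the sketch counts four hosts across two domains, which is why a hosts figure will always be at least as large as the matching domains figure.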

The Crawled Pages number (2,081) looks about right but the Known/Indexed Pages number (4,665) is just bizarre. Even allowing for the many thousands of pages that once existed on that domain, none of them has been there for many years. Could there possibly be THAT MANY inbound links pointing to 404 pages? Like, WOW.

Then again, Yahoo! has historically made up page URLs and fetched them to test how domains handle 404 errors. There was a time when I conveniently redirected all 404 traffic on my domains to either the root URLs or the site maps, returning a 200 OK status code. Browsers don’t have a problem with that behavior. Search engines do.

To keep Yahoo! from indexing an endless number of bogus duplicate pages, I switched my 404 handling on all domains to serve up a custom 404 document that sends the right code. Perhaps all those Known/Indexed pages that don’t exist came from Yahoo!’s unauthorized URL generation. I don’t know, but that was never a nice thing to do to unsuspecting Webmasters. I don’t know if Yahoo! is any better behaved now than in the past because I just tell everyone to handle 404 traffic according to RFC specifications (or to implement 301 redirects, but that gets tedious).
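For anyone who wants to see the difference, here is a minimal sketch using Python's standard http.server module. It is not my actual server configuration; the page table, the 404 body, and the names respond, Handler, and serve are all hypothetical. The point is only that the friendly error page goes out with a 404 status instead of a 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: the only URLs that really exist.
KNOWN_PAGES = {
    "/": b"<h1>Home</h1>",
    "/sitemap.html": b"<h1>Site map</h1>",
}

# Custom 404 document: helpful to people, but still reported as an error.
NOT_FOUND_BODY = (
    b"<h1>Page not found</h1>"
    b"<p>Try the <a href='/sitemap.html'>site map</a>.</p>"
)


def respond(path):
    """Return (status_code, body) for a request path (query strings ignored)."""
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)  # 404 for unknown URLs, never a 200
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the demo server on localhost; call this manually to try it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

A made-up URL of the kind a crawler might generate gets the custom page and a 404 code, so it gives a search engine no reason to index it as a duplicate of the home page.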

There is no way to verify the number of hosts/domains Yahoo! claims are linking to the domain, but I like the tallies because they provide an opportunity for snapshot analysis. Even if there is a wide margin of error, knowing that Domain A has an estimated 1800 hosts linking to it and Domain B has an estimated 300 hosts linking to it gives me an idea of how much Web Visibility the two domains have with respect to each other.
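Here is a rough sketch of that snapshot comparison in Python. The 25% margin of error is purely an assumption for illustration (Yahoo! publishes no error figures), and the function name is hypothetical.

```python
def visibility_ratio_range(hosts_a, hosts_b, margin):
    """Return (low, point, high) estimates of hosts_a / hosts_b.

    margin is an assumed relative error applied to both counts,
    e.g. 0.25 means each count could be off by 25% either way.
    """
    point = hosts_a / hosts_b
    low = (hosts_a * (1 - margin)) / (hosts_b * (1 + margin))
    high = (hosts_a * (1 + margin)) / (hosts_b * (1 - margin))
    return low, point, high


# Domain A: about 1800 linking hosts; Domain B: about 300 linking hosts.
low, point, high = visibility_ratio_range(1800, 300, 0.25)
```

Under that assumed margin the ratio lands between about 3.6 and 10, with a point estimate of 6. Domain A comes out well ahead of Domain B at either end of the range, which is all a snapshot comparison needs to show.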

Web Visibility helps with search visibility because of the crawling and anchor text and PageRank-like valuations that search engines confer, but you want to get as much traffic from referring non-search sites as possible. A domain with strong Web Visibility has more marketing opportunities than a domain with weak Web Visibility.

You can authenticate a sub-domain to get more detailed data and to upload XML/RSS Sitemaps (feeds). The verbose formats for the Pages and Inlinks reports provide a little more data than we used to get. The Verbose Pages report will tell you what language a page is in and when it was last crawled. The Verbose Inlinks report will only tell you what language the page is in.

While Google is more forthcoming about queries your site ranks for, inbound link anchor text pointing to your site, and crawl management, Yahoo!’s reports are more concise and better organized, although it would be nice if we could export all the data from the Yahoo! reports to a file.

Yahoo! search optimization has fallen out of favor. As more people focus on Google, fewer people develop their Yahoo! potential to its fullest. Many SEO forums are filled with people who claim they get little to no traffic from Yahoo!. Yahoo! is still the third most visited search engine in the industry, servicing nearly 60,000,000 people each month. That’s a lot of traffic and no one should be turning their back on it.

My sites do okay by Yahoo! but they could do better. Once in a while I come back and tweak things a bit. Now that Site Explorer is providing some decent information, I may spend a little more time building up my personal Yahoo! traffic.

6 comments

devdotcom 10.06.08 at 7:54 pm

SEOmoz’s Linkscape looks like the future for link analysis. No more Yahoo! Site Explorer!

Michael Martinez 10.07.08 at 8:00 am

It won’t be any better than Yahoo! or Google, so don’t get too excited about it.

I’ve got a post coming up next week that explains in some detail why you cannot use Service A to analyze factors in Service B.

devdotcom 10.07.08 at 8:49 am

Cool Michael, that should be interesting. You always have a fresh perspective on SEO.

Joshua Sciarrino 10.07.08 at 10:29 pm

Michael, I think you’re too rash to judge Linkscape.

I don’t think it’s using Yahoo’s API.

It’s a spider, S-P-I-D-E-R. Just like Googlebot and Slurp.

Moz has hired like 2 (at least one, but I think it’s two) former Microsoft folks. They got 4-5 developers and basically built this up from scratch, and they filed for a patent and everything.

So don’t judge a book before you’ve even looked at the cover. ;)

Michael Martinez 10.08.08 at 12:49 am

I didn’t say Linkscape was using Yahoo!. I merely said it won’t provide any better information than Yahoo! or Google. It doesn’t matter how many people Rand has hired. All he can do is crawl the Web and build his own index.

He cannot duplicate anyone else’s index.

He cannot determine which documents pass value in anyone else’s index.

You cannot use Linkscape to analyze Google, Yahoo!, Live, or any other search engine’s results.

Traffics Pain 10.10.08 at 10:30 am

“He cannot determine which documents pass value in anyone else’s index.”

Have to agree with you on that one.

It seems SEOmoz are basically building a paid search engine of their own, and this cannot be related in any real way to the results that Google, Yahoo!, etc. will decide to show.

In fact, it’s more likely that eventually all users and manipulators of that particular ‘index’ could be dropped from Google altogether if they felt like it, right?

That ‘would’ keep all the ‘super marketed’ SEO experts to their own paid search engine and leave the natural SERPs for the 99% of people that don’t use these types of things for one reason or another. That could sway things like reported search terms to a much more natural state of affairs too, which would help these 99% in our own marketing efforts.