Cool facts about Cuil

by Michael Martinez on August 1, 2008

So the good folks at Cuil have created a Cuil Announcements page, which should help us stay on top of things at Cuil provided they update the announcements page more often than Microsoft, Ask, and Yahoo! update their blogs.

The news media have already reported that Cuil received more than 50,000,000 queries in their first 24 hours, and they suggest that many of the problems we saw were probably due to overload. Apparently, some of their equipment broke down. That the service was able to provide results under that kind of strain is a positive sign, in my opinion.

I noticed something else: Cuil honors the Crawl-delay directive in robots.txt. And they actually recommend a specific SEO site’s tool (Ian McAnerin’s robots.txt generator).
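For anyone who has not used the directive, it is normally written as Crawl-delay inside a robots.txt record. Here is a minimal sketch in Python of how a polite crawler might read it, using the standard library's urllib.robotparser. The user-agent name ExampleBot, the 10-second delay, and the example.com URLs are all assumptions made up for illustration; none of them come from Cuil's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleBot" is a made-up crawler name,
# not the name of Cuil's crawler, and the delay value is arbitrary.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the named crawler is asked to wait between requests.
delay = parser.crawl_delay("ExampleBot")

# Whether the named crawler may fetch particular URLs.
may_fetch_home = parser.can_fetch("ExampleBot", "http://example.com/index.html")
may_fetch_private = parser.can_fetch("ExampleBot", "http://example.com/private/notes.html")

print(delay, may_fetch_home, may_fetch_private)
```

Crawl-delay is a widely recognized extension rather than part of the original robots.txt convention, so a crawler that ignores it is not breaking a formal rule. That is why it is worth noting when a search engine says it honors the directive.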

For the time being, they are accepting URL submissions.

Cuil also seems to be taking a swipe at PageRank and its impersonators:

9. The extra pages you index are really useful. Why don’t other search engines index them?
Some people argue that many pages on the Internet are spam or porn. It’s true that during our Web crawl we have found and filtered those kinds of pages, but we’ve discovered that the number of them is quite small. It’s just that the makers of those pages use techniques to push them forward. We’ve also found quite a number of duplicate pages that we didn’t include in our index. So far, we have crawled 186 billion pages and have included 120 billion in our index. We continually index more pages.

We’ve found that a lot of Web pages have been designed with a small audience in mind—perhaps they are blogs or academic papers with specific interests or pages with family photos. We think that even though these pages aren’t necessarily for a wide audience, they contain content that one day you might need.

Our job is to index all these pages and examine their content for relevancy to your search. If they contain information you need, then they should be available to you.

The incongruity between images and the articles they are matched with annoys me, but it may be that Cuil is actually a few steps ahead of the Web community:

7. You have images beside your results. How do you pick which pictures to use?
We know from our research that people can make better and quicker decisions about relevance and quality when they can see an image from the website. We do our best to take images from Web pages that accurately reflect the content of the website. Many websites are full of images, so we use advanced algorithms to determine the best image to show the user.

If you search on my name, for example, you’ll see the Xenite.Org badge image positioned next to SEO Theory. As best I can recall, that image is not linked to anywhere on this site. The identification is appropriate, at least in that I do own Xenite.Org and I am the primary content provider for both sites. But it underscores just how barebones SEO Theory really is. You won’t find a picture of me on this site.

We’re supposed to redesign it soon as we move some content around the corporate network. I’ll see if I can get a picture included on the blog at that time, and then it will be interesting to watch what Cuil does.

On the other hand, images positioned next to other sites I own in the Cuil results make no sense to me. I don’t recognize them, and both sites do have pictures of me. Some of the pictures look like they should be placed with other sites appearing in the search results (well, they are pictures of other guys I am guessing are named Michael Martinez).

We might be able to alleviate some of this confusion by more properly embedding our personal images into sites we control. For example, I looked at Ian McAnerin’s search results and Cuil grabs the masthead image from McAnerin.com. It’s an appropriate association but I am pretty sure I have seen Ian’s picture in more than one place on the Web.

I performed several searches on Jill Whelan’s name, forgetting there is an American actress by that name. I finally found the SEO Whelan’s picture in the query for “jill whelan high rankings” — but the image is positioned beside a different Web site. Clicking on the site I do find a picture of Jill but not the picture Cuil shows me. In fact, I would say that the picture Cuil placed beside the listing is the better picture of Jill.

I searched on Rand Fishkin and Todd Friesen. I got only an image of a hand erasing a chalkboard for Rand and a Webmaster Radio masthead image for Todd. Danny Sullivan’s picture appears next to a couple of listings for…Search Engine Watch (and one other site).

I searched on an actress about whom I have published content (including an image gallery). The major entertainment site that features her show appears first in the search results, but it gets a bland, logo-style image. My picture of the star appears beside my page’s listing.

I think we can conclude that sites which get better picture placement probably use the person’s name in file names and ALT text, and surround the images with the person’s name in clearly indexable text. At least, that seems to be the pattern based on the few searches I’ve made.
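To check a page against that pattern, something like the following Python sketch could help. It only tests whether a person's name shows up in each image's file name and ALT text; it does not look at the surrounding text, and it says nothing about how Cuil actually picks images. The class name, function name, and sample markup are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class ImageNameAudit(HTMLParser):
    """Collect <img> tags and note where a person's name appears."""

    def __init__(self, name):
        super().__init__()
        # Compare on lowercase name tokens, e.g. ["jane", "doe"].
        self.tokens = name.lower().split()
        self.images = []

    def _mentions(self, text):
        # True only if every token of the name occurs in the text.
        text = (text or "").lower()
        return all(token in text for token in self.tokens)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        self.images.append({
            "src": src,
            # Look only at the last path segment, i.e. the file name.
            "name_in_filename": self._mentions(src.rsplit("/", 1)[-1]),
            "name_in_alt": self._mentions(attrs.get("alt")),
        })


def audit_images(html, name):
    """Return one small report dict per <img> tag found in the HTML."""
    parser = ImageNameAudit(name)
    parser.feed(html)
    parser.close()
    return parser.images


# Made-up sample markup: one image named after the person, one generic image.
sample = """
<p>Jane Doe at the conference.</p>
<img src="/img/jane-doe.jpg" alt="Jane Doe speaking">
<img src="/img/header.png">
"""

report = audit_images(sample, "Jane Doe")
for row in report:
    print(row)
```

An image that fails both checks, like the generic header image in the sample, is the kind I would expect a search engine to have trouble associating with a person.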

But why Cuil would substitute pictures from other sites is a mystery to me. I’ll have to experiment with some image placements over the next few months and see what happens.

For the record, I like Cuil’s interface very, very much. It’s much more useful and easier to read than Google’s SERPs (and Microsoft’s, and Ask’s, and Yahoo!’s, etc.). Cuil’s SERPs are more informative, too.

However, I’m not sure what I can do to influence the text Cuil selects for each listing. They don’t appear to be using the meta description tag (which seems to be in keeping with their philosophy of focusing on on-page content). The fact that the snippets show my copyright notices, for example, suggests to me that I may have to create “Cuil-copy”: informative on-page text that has a chance to appear in that snippet.

Cuil-copy should work just like a meta description, but I think we have an opportunity to experiment with pairing pictures with the Cuil-copy. Maybe it’s time to develop a microformat that acts like a calling card for a Web page.

Call it a page metacard microformat for now, until someone comes up with something better.

2 comments

DangerMouse 08.01.08 at 1:52 pm

I’d certainly favour using some form of microformat over guesswork - as ultimately, regardless of how good the algo is, displaying images is guesswork. I’ve noticed several cases where images from a corporate site I monitor quite carefully are shown next to competitor results, despite the fact that those images do not appear anywhere on the competitors’ sites. This will bring a whole new wave of copyright infringement claims.

DM

Michael Martinez 08.01.08 at 2:13 pm

I have been looking at the hCard Microformat, which can now include graphics, but the problem is that hCards are intended to identify people, not associate people with Web sites.

Maybe we can devise a wCard Microformat that provides identifying information about who owns or contributes to a Web site. It would be nice if there were a browser plug-in that allowed you to inject your wCard into, say, a comment box on a blog or a forum. Of course, we might need blog and forum plug-ins to make that work. I don’t know.