Zoetrope makes the Web 3-dimensional

by Michael Martinez on November 19, 2008

ScienceDaily reports that University of Washington students have developed a search engine algorithm that will allow you to search the Web from a point in time. Once implemented, the Zoetrope algorithm will make every known state of a Web site available to searchers, unlike Internet Archive, which requires that you retrieve URLs based on dates and then scan the documents manually.

Zoetrope will go beyond mere search indexing, however, and add an analytical capability that helps people look for trends and patterns. The prototype has been running for four months and presently archives 1,000 sites (or pages; it’s not clear to me which the article means) every hour. The application has not yet been scaled to commercial volumes, but maybe it will be combined with the Internet Archive’s database (Amazon, are you listening?) at some point.

What would a Zoetrope service mean for search engine optimization? It might, actually, help people provide better information to searchers overall. If you know a Web page has been lost because a domain expired, was hacked, or was sold and redesigned, you could still link to the historical copy. I sometimes do that now with archived images at Internet Archive. On the other hand, there are queries that people link to which change over time. Would it not be helpful if we could get Zoetrope to archive the query results so they can be linked to permanently?

Zoetrope would certainly help people track Web site changes as well. Case studies could be reconstructed using detailed histories of Web document changes, particularly if the Zoetrope algorithm could be configured to tell us which pages on a site were changed, how often, how much, and in what ways. Students of search engine optimization could then go back and look at example sites and see how their on-page factors and outbound links have been altered.

There would undoubtedly be some legal concerns, and I can see people opting out of the Zoetrope image archive as soon as it goes live. Some people might not want their ugly old pages to be searchable. At least with the Internet Archive people MUST specify dates and scan documents. With Zoetrope you might find content you never knew existed. Law enforcement and legal research teams might be able to track down hidden repositories of illicit information.

But where the real opportunity may lie for optimizers and indexers alike is in the promise of the new technology. What if Zoetrope were made available to all major search services (much like satellite map data is, for example)? Each search engine would have the option of mounting its own search and analytical tools in front of the Zoetrope database. Optimizers would have the option of favoring one front end over another.

If one of the major search services just gobbles up Zoetrope, we’ll be limited to the commercialized vision of a self-serving company. We’ll have to wait for someone to develop a competing technology. But if by some financial miracle the technology is only non-exclusively licensed, then people will have opportunities to develop Web search applications that remember the past.

Zoetrope’s creators envision a browser-based interface, perhaps combining Zoetrope data with browser history. Imagine your browser asking you if you want to revisit yesterday’s Web sites as they appear today or as they appeared yesterday (or as they appeared last year). Chronological search optimization will become more refined. Today we only think about embedding dates to show when pages were cached. We may want to embed special timeframe stamps that show when pages were created, changed, or deleted.

A more sophisticated Zoetrope application might look at sectional changes within Web documents. If you scan only the margin links and clock their rate of change, what might that tell you about a Web site’s linking activity? If text on a front page changes hourly, you may want to specify to the Zoetrope algorithm which hourly changes it is retrieving.
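To make the margin-link idea concrete, here is a minimal sketch in Python of how you might clock the rate of change of one section's links. Nothing here comes from Zoetrope itself: the `link_churn_per_day` function and the snapshot format (a timestamp paired with the set of URLs found in that section) are my own assumptions about what such an archive could hand back.

```python
from datetime import datetime

def link_churn_per_day(snapshots):
    """Average number of link additions plus removals per day.

    `snapshots` is a hypothetical list of (datetime, set_of_urls) pairs,
    sorted oldest first, for ONE section of a page (e.g. the margin links).
    """
    if len(snapshots) < 2:
        return 0.0
    changes = 0
    # Compare each snapshot with the one that follows it.
    for (_, previous), (_, current) in zip(snapshots, snapshots[1:]):
        # Symmetric difference = links added + links removed.
        changes += len(previous ^ current)
    elapsed = snapshots[-1][0] - snapshots[0][0]
    elapsed_days = elapsed.total_seconds() / 86400
    return changes / elapsed_days if elapsed_days > 0 else 0.0

margin_snapshots = [
    (datetime(2008, 11, 1), {"a.example/page", "b.example/page"}),
    (datetime(2008, 11, 2), {"a.example/page", "c.example/page"}),
    (datetime(2008, 11, 3), {"a.example/page", "c.example/page"}),
]
churn = link_churn_per_day(margin_snapshots)
```

A section that churns several links a day is probably a rotating block (ads, headlines), while one that barely moves is more likely an editorial blogroll. The number by itself proves nothing, but it tells you where to look.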

Today’s Web is 2-dimensional in the sense that we think of the content we create and the content we connect with. Zoetrope adds a third dimension, time, that ultimately will include chrono-links (links embedded in content that point to older content in an archive). Search engines will have to decide for themselves if they want chrono-links to pass value. If they do pass value, should it be the same type of value as normal hypertext links pass?

Citation analysis is about to reach a whole new frontier. One of the chief criticisms directed at citation analysis is that it doesn’t take timeframes into consideration. You can write a paper today, collect 2500 citations in other papers over the next ten years, and then be proven wrong eleven years from now. Those 2500 citations won’t go away; they’ll just be as wrong as you are.

But what if we could weight the citations based on when they occurred? Mathematical models have been developed that show chrono-weighted citation analysis is more trustworthy than basic citation analysis. The same might be true of links. Suppose domains that were once trustworthy change their behavior. Their old links should still be trusted by link-analyzing search engines, but their new links should not. Some links are designed to be transient. For example, scrolling headlines don’t remain on pages very long. Those links should be counted only within the timeframe they exist (if they should be counted at all).
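Here is a rough sketch in Python of the two rules just described: a link counts only while it actually exists, and a domain's links count only if they were created while the domain was still trustworthy. This is not one of the published chrono-weighting models and it is not how any search engine works. The `Link` record, the `trusted_until` table, and the `counted_inlinks` function are all hypothetical names I made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

@dataclass
class Link:
    source: str                       # linking domain
    target: str                       # linked-to domain
    first_seen: date                  # when the archive first saw the link
    last_seen: Optional[date] = None  # None means the link is still live

def link_counts(link: Link, as_of: date, trusted_until: Dict[str, date]) -> bool:
    """Decide whether a link should be counted on the date `as_of`."""
    # Rule 1: count a link only within the timeframe it exists.
    if as_of < link.first_seen:
        return False
    if link.last_seen is not None and as_of > link.last_seen:
        return False
    # Rule 2: links created after the source domain changed its behavior
    # are not trusted. Domains missing from the table are assumed trusted.
    cutoff = trusted_until.get(link.source)
    if cutoff is not None and link.first_seen > cutoff:
        return False
    return True

def counted_inlinks(links: List[Link], target: str, as_of: date,
                    trusted_until: Dict[str, date]) -> int:
    return sum(1 for l in links
               if l.target == target and link_counts(l, as_of, trusted_until))

links = [
    Link("a.example", "t.example", date(2005, 1, 1)),   # old, trusted era
    Link("a.example", "t.example", date(2007, 1, 1)),   # after a.example changed
    Link("news.example", "t.example",
         date(2008, 1, 1), date(2008, 1, 2)),           # transient headline
]
trusted_until = {"a.example": date(2006, 1, 1)}
during_headline = counted_inlinks(links, "t.example", date(2008, 1, 1), trusted_until)
months_later = counted_inlinks(links, "t.example", date(2008, 6, 1), trusted_until)
```

On the day the headline runs, two links count; a few months later only the old, trusted link does. A real system would presumably use graded weights rather than a yes/no answer, but the time window logic would look much the same.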

Not that I’m a big fan of link citation, but providing a chronological aspect to link analysis that is accessible to average searchers changes the rules and levels the playing field. People won’t be able to change the past but they will be able to look at it and figure out how better to present their message to the Web community. Do you suspect your competition is outlinking you? Using snapshots of their link profile over time will show you trends (that may or may not be useful — remember that using Database A to analyze Database B is unproductive).

You can certainly develop new ways of attributing value to links by learning which links stand the tests of time and which links are a waste of time. Being able to follow the growth of linking patterns will help you evaluate which Web sites really did stir up a buzz and which sites just built useless junk links.

Your Web copy may also be living in the past. Suppose you spend 2-3 weeks doing keyword research and develop tons of copy that emphasizes those keywords. Two years from now you might see a change in traffic growth but revisiting your keyword research doesn’t provide much insight. If you build a data model to evaluate how the keywords you targeted have been used in Web copy, you may see a shift to new keywords, as more savvy copywriters caught on to new search trends.
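A data model like that does not have to be fancy. The sketch below, in Python, assumes you can get dated copies of pages out of an archive as `(year, text)` pairs; that format and the `keyword_share_by_year` function are my own inventions, and it only handles single-word keywords to keep things short.

```python
import re
from collections import Counter, defaultdict

def keyword_share_by_year(pages, keywords):
    """Share of all words, per year, taken up by each keyword.

    `pages` is a hypothetical iterable of (year, text) pairs.
    Returns {year: {keyword: fraction_of_words}}.
    """
    total_words = defaultdict(int)
    keyword_hits = defaultdict(Counter)
    for year, text in pages:
        words = re.findall(r"[a-z0-9']+", text.lower())
        total_words[year] += len(words)
        counts = Counter(words)
        for kw in keywords:
            keyword_hits[year][kw] += counts[kw]
    return {
        year: {
            kw: (keyword_hits[year][kw] / total_words[year]
                 if total_words[year] else 0.0)
            for kw in keywords
        }
        for year in sorted(total_words)
    }

pages = [
    (2006, "cheap flights cheap hotels"),
    (2008, "budget flights budget travel"),
]
shares = keyword_share_by_year(pages, ["cheap", "budget"])
```

In this toy data "cheap" drops from half the words to none while "budget" does the opposite, which is the kind of shift you would want to catch before your traffic tells you about it.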

Adding a chronological channel to Web search should, in my opinion, provide us with opportunities to develop new technologies and data mining methodologies that offer better insight into Web marketing, as well as better access to relevant Web content.

I look forward to seeing what happens with Zoetrope. It will be a shame if this technology cannot be scaled.

3 comments

mugile 11.19.08 at 11:26 am

Hi Michael,

FYI
There is something wrong with SEO Theory RSS feed.

Michael Martinez 11.19.08 at 12:54 pm

Yeah, I’ve suspected something was messing up but we haven’t had time to look at it very closely.

Thanks for the message.

smitt 11.21.08 at 8:57 am

This idea is not a new one but somehow copied :(
http://www.dl.kuis.kyoto-u.ac.jp/~adam/ht08.pdf