What might an object-oriented search engine look like?
Posted by admin on December 10, 2006 in SEO Theory
Back when I was helping traditional programmers evaluate the strengths and weaknesses of object-oriented programming, I used to describe objects as “bounded data spaces”. That is, you establish a set of rules that define your data and what you can do with it. An object is a piece of data that is associated with functions and procedures that allow some external source to manipulate the data. Object orientation makes programming more reliable because it employs encapsulation. You cannot poke unnecessary or unexpected command structures into the data set because its boundaries are protected.
Webmasters who work with the Document Object Model (most often utilized in JavaScript and CSS) work with object-oriented programming tools all the time. Object-oriented search is, conceptually, just a step beyond the document object model. That is because, in theory, we could add a search function to the DOM (to be executed by client software that recognizes the DOM) that responds to queries about the document. Let’s call this the extensible Document Object Model, or eDOM. The eDOM’s function is to encompass multiple venue needs without destroying the integrity of the traditional DOM (which is one of the strengths of object-oriented design: big objects can encapsulate small objects).
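To make the idea of a document that answers queries about itself a little more concrete, here is a minimal Python sketch. The class name DocumentObject, its fields, and the answers method are all hypothetical; nothing like this exists in the real DOM. It only illustrates encapsulation: the caller asks a question and never touches the word set directly.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentObject:
    """Hypothetical eDOM-style object: document data plus a query method."""
    url: str
    title: str
    body: str
    # Internal word set, built once; callers are not meant to touch it.
    _words: frozenset = field(init=False, repr=False)

    def __post_init__(self):
        text = f"{self.title} {self.body}".lower()
        self._words = frozenset(text.split())

    def answers(self, query: str) -> bool:
        # True only if every query term appears somewhere in the document.
        terms = query.lower().split()
        return bool(terms) and all(term in self._words for term in terms)


doc = DocumentObject(
    url="http://example.com/mickey",
    title="Who is Mickey Mouse",
    body="Mickey Mouse is a cartoon character",
)
print(doc.answers("mickey mouse"))  # True
print(doc.answers("donald duck"))   # False
```

Whole-word matching is a deliberate simplification here; a real engine would need stemming, phrase handling, and much more.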
Web servers already handle some of the query functions for documents (such as the HEAD request that is used most often). Search engines, naturally, handle many other queries for documents. But search engines are clunky when it comes to finding out information about a document. Google has achieved the most in this area with its increasingly useful info: query operator. But there is so much more we could want to do with that concept.
Object-oriented search does not seem to be efficient for distributed query-processing. That is, if we could append an executable element to every HTML document (static or virtually served) that searches the proposed extensible document object model, we could run a query like “who is mickey mouse” and go raise a family while every document on the Web searched itself and some central engine tried to make sense of all the replies.
To make an extensible Document Object Model work efficiently for search, you have to remove the eDOM from the Web and build it inside the search engine. Call these copies the local extensible Document Object Model (or leDOM). What the search engine therefore works with is a collection of agents or proxies that represent the documents that have been indexed. These proxies know everything there is to know about their respective documents.
Now, even though we’ve just reduced the complexity of our search process by an order of magnitude, it’s still too inefficient. In reality, an “object” is a little computer program that has to be loaded into a computer’s CPU and executed. Large programs spin off objects all the time, and they execute and then vanish. Some objects are persistent, saving themselves at the end of one use and restoring themselves at the start of another use. They know exactly where they left off and pick up at that point.
But persistence is expensive. It’s actually easier to implement a data manager that virtually acts like a separate object for each piece of data. However, such managers don’t ensure encapsulation. Without encapsulation you may lose integrity as the boundaries of your virtual objects are occasionally prone to violation (either inadvertent or deliberate).
It’s better to group objects together in object cluster communities. Think of an object cluster as a group of animals sleeping on the open plains. One animal is left to watch while the herd rests. Suddenly, a predator appears on the horizon and the watch-warden rouses the herd into movement.
In an object cluster, any active object can revive the other members of its cluster. So now we can create object clusters around words, such that when someone searches for a word like “michael” the “michael” cluster of leDOM objects is awakened to look for members who are relevant to the query. In this way, computer resources are used more efficiently than if you tried to keep all the leDOM objects active or started them all up from persistent sleep every time a query is processed.
Some clusters would probably never go to sleep. And that would be okay, because you can include more than one type of object in a cluster. Each cluster can include both leDOM objects and query objects that handle the incoming queries for the cluster. The query objects can be the watch-wardens for the cluster, but they can also maintain a cache of recently processed queries, checking with each other to see if they have already answered the latest query. Instead of constantly waking the leDOM objects to search themselves, the query objects will actually handle most of the searching.
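The cluster-with-a-watch-warden idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the names WordCluster and ClusterIndex, the wakeups counter, and the rule that a query is routed to the cluster of its first word. The point is only that the cache answers repeat queries without waking the members again.

```python
class WordCluster:
    """Hypothetical cluster: document proxies for one word plus a query cache."""

    def __init__(self, word):
        self.word = word
        self.members = []   # (url, word set) pairs standing in for leDOM objects
        self.cache = {}     # normalized query -> list of matching URLs
        self.wakeups = 0    # how often the members actually had to be searched

    def add(self, url, text):
        self.members.append((url, set(text.lower().split())))
        self.cache.clear()  # cached answers may be stale after a new member joins

    def answer(self, query):
        key = " ".join(query.lower().split())
        if key not in self.cache:
            # Cache miss: the watch-warden wakes the members to search themselves.
            self.wakeups += 1
            terms = key.split()
            self.cache[key] = [url for url, words in self.members
                               if all(t in words for t in terms)]
        return self.cache[key]


class ClusterIndex:
    """Routes each query to the cluster of its first word (a simplification)."""

    def __init__(self):
        self.clusters = {}

    def index(self, url, text):
        for word in set(text.lower().split()):
            if word not in self.clusters:
                self.clusters[word] = WordCluster(word)
            self.clusters[word].add(url, text)

    def search(self, query):
        terms = query.lower().split()
        if not terms or terms[0] not in self.clusters:
            return []
        return self.clusters[terms[0]].answer(query)


index = ClusterIndex()
index.index("a.html", "Michael plays guitar")
index.index("b.html", "Michael Martinez writes about search")
print(index.search("michael martinez"))    # ['b.html']
print(index.search("michael martinez"))    # served from the cache
print(index.clusters["michael"].wakeups)   # 1
```

Routing by the first word is safe in this toy version because any document matching every term must already belong to the first term's cluster.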
The chief problem is that, with billions of documents on the Web, you end up with millions of word object clusters. That’s highly inefficient. What might work better is a concept cluster, where the cluster is organized by concept objects that know how to transform their associated concepts into the proper words.
“Michael Martinez is an ignorant fool” is a concept. There may be many documents out there which express this very worthy concept, but they may do so in more than one language and with more than one idiom (expression). There is more than one way to insult Michael Martinez. So the search engine will work more efficiently if it can transform the concept into a tighter expression, such as “michael martinez insult”. Now we have a reducible expression that can be divided into two concepts (“michael martinez” and “insult”) or three words. Either way, the search engine need only activate 2 or 3 concept clusters. And the clusters can be intelligent enough to be told “associate ‘michael martinez’ with ‘insult’”.
As far-fetched as this might seem, the search engine would still be faced with the daunting task of figuring out which documents are most relevant to the query. While there may be a million pages out there where someone insults Michael Martinez, the search engine has to make some choice. The object-oriented model may improve efficiencies in processing but it doesn’t really help much with relevance.
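As a rough illustration of reducing different idioms to the same tight expression, here is a small Python sketch. The CONCEPT_PHRASES table and the reduce_to_concepts function are invented for this example, and a hand-written lookup table is an assumption of convenience; a real concept engine would need far more than substring matching.

```python
# Hypothetical table mapping surface phrases (in any language) to concept labels.
CONCEPT_PHRASES = {
    "michael martinez": "michael martinez",
    "ignorant fool": "insult",
    "idiot": "insult",
    "tonto": "insult",  # Spanish for "fool"
}


def reduce_to_concepts(text):
    """Return the distinct concept labels found in the text, in table order."""
    lowered = text.lower()
    found = []
    for phrase, concept in CONCEPT_PHRASES.items():
        # Naive substring test; good enough for a sketch, not for production.
        if phrase in lowered and concept not in found:
            found.append(concept)
    return found


print(reduce_to_concepts("Michael Martinez is an ignorant fool"))
print(reduce_to_concepts("Michael Martinez es un tonto"))
# Both print ['michael martinez', 'insult']
```

Two differently worded documents collapse to the same pair of concepts, so only those two concept clusters would need to be activated.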
Still, if we know that our documents are going to be indexed as objects, we would want to structure those documents that are most insulting to Michael Martinez so as to be returned as most relevant for queries that reduce to “michael martinez insult” (which, in reality, is too simplistic because it could also refer to insults from Michael Martinez). An object-oriented SEO would have to push the right buttons to ensure that the document itself “knows” that it is relevant to an attack on the character of Michael Martinez.
That is, a passing insult that knocks my ancestry while discussing the mating habits of ducks should not be as relevant as, say, an entire 5,000-word treatise on how I have shattered the harmony of the Internet. A well-written poison pen article should engage in the most sophisticated character assassination, making it absolutely clear that I am the most vile, ignorant human being ever to touch a keyboard.
In essence, the document object has to scream out, “I INSULT MICHAEL MARTINEZ”. And it can do that pretty easily by setting insults in its title element, meta description, meta keywords, a few Hx headers, some bolded text, and some italicized text.
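One way to picture a document “screaming” its topic is a score summed over the on-page fields that mention every concept. The sketch below is hypothetical throughout: the field names, the FIELD_WEIGHTS values, and the on_page_score function are assumptions chosen for illustration, not weights any search engine is known to use.

```python
# Hypothetical weights for the on-page fields listed above; the numbers are made up.
FIELD_WEIGHTS = {
    "title": 3.0,
    "meta_description": 2.0,
    "meta_keywords": 1.0,
    "headers": 2.0,
    "bold": 1.0,
    "italic": 1.0,
}


def on_page_score(fields, concepts):
    """Add a field's weight whenever that field mentions every concept."""
    concepts = [c.lower() for c in concepts]
    score = 0.0
    for name, weight in FIELD_WEIGHTS.items():
        text = fields.get(name, "").lower()
        if concepts and all(c in text for c in concepts):
            score += weight
    return score


treatise = {
    "title": "Why Michael Martinez deserves every insult",
    "meta_description": "An insult-by-insult case against Michael Martinez",
    "headers": "Insult one: Michael Martinez and the ducks",
}
duck_page = {
    "title": "Mating habits of ducks",
    "bold": "michael martinez insult",
}

query = ["michael martinez", "insult"]
print(on_page_score(treatise, query))   # 7.0
print(on_page_score(duck_page, query))  # 1.0
```

The dedicated treatise outscores the duck page with its passing insult, which is the ordering the paragraph above argues for.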
Just for good measure, the document can link out to other documents that insult me and question the loyalty of my ancestors to both sides in the American Civil War.
Finally, the document should be pointed to by other documents that insult me with praise-filled anchor text such as, “This site really knocks that hard-headed Martinez into last year!”
Does that look familiar? Except for the last step, it pretty much resembles what we do for on-page optimization now. A document speaks about itself in the most primitive way, and we don’t need sophisticated object-oriented programming techniques to understand that.
But where an object-oriented, conceptual search engine might turn the tables on SEOs is in looking at inbound link anchor text at the conceptual level, looking for honest approval and endorsement, and not simply keyword association. Technically, there is no reason why search engines cannot do that now. Rather than just blindly assume that any anchor text represents an editorial endorsement, the search engines could begin filtering inbound anchor text by looking for structure.
I don’t mean they need to look for praise or disapproval. Rather, natural expressions used in anchor text tend to have structure. Now, maybe there are plenty of acceptable uses for unstructured anchor text. But if it were me, and I had to ask millions of clusters to figure out which document was best at insulting me, I’d want to know that the most valuable anchor text was the anchor text that had some structure. That generally implies that someone has put some thought into what they are doing.
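What “structure” means is left open above, so the following Python sketch picks one crude proxy purely as an assumption: anchor text counts as structured if it has at least four words and contains at least one function word, the way a natural phrase or sentence usually does. The FUNCTION_WORDS list, the threshold, and the looks_structured name are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny list of English function words.
FUNCTION_WORDS = {
    "a", "an", "the", "this", "that", "is", "are", "was",
    "of", "to", "in", "on", "into", "for", "how", "why", "about",
}


def looks_structured(anchor, min_words=4):
    """Crude proxy: long enough, and contains at least one function word."""
    words = re.findall(r"[a-z']+", anchor.lower())
    return len(words) >= min_words and any(w in FUNCTION_WORDS for w in words)


print(looks_structured(
    "This site really knocks that hard-headed Martinez into last year!"))  # True
print(looks_structured("michael martinez insult"))                          # False
```

A test this simple would also reject short navigational anchors such as “home”, so it could only ever be one signal among several.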
And that’s the point of all this: good link anchor text looks more thoughtful than a whole essay on object-oriented search engines.
1 Comment on What might an object-oriented search engine look like?
By guy on December 12, 2006 at 12:11 pm
Your in-depth post has many angles but is not considering one simple fact: users. Myself and many others will link to a home page as ‘home’ because of the simple fact that it offers the clearest indication of where the main content resides. And if over time that page happens to accumulate enough value it might respond rapidly to diverse external anchor text, as opposed to conferring that value from within. Why spoil users’ experience to favor a better reading by search engines? This meaning can be passed in other ways.
What was mentioned at SES Chicago about PPP is yet another proof of how furious the pace of monetization of the web is by which we will soon find ourselves with nothing we can really trust, as everything will become a one stealth paragraph direct-marketing advertorial.
On another note, I tried posting at your forum but wasn’t allowed. I wanted to know your present thoughts on an old thread about 3 possible indexes. Now that we also have a large supplemental one, your views may have changed?