So unless you’ve been hiding under a rock, you may have heard that Ask.com just said it’s not for sale and it has a new search technology. That’s a radical statement from a company that arguably has the best relevance algorithm on the Web already.
According to Ask CEO Doug Leeds, Ask has been developing a new search algorithm for a year that focuses on answering questions by evaluating content rather than links. He wants to provide relevant search results rather than popular search results. Let me leave it there for a brief digression.
Imagine using a ubiquitous search tool (call it Search Brand X) that, when you type in a query, only returns results that are in a constant state of flux. “What time is it?” demands that kind of information. “How old is Barack Obama?” demands that kind of information (but at a much slower pace). But what about “is marijuana safe?”
Think about that last query carefully. Medical science tells us there are many health risks (including increased risks of cancer, lung disease, and dementia) associated with marijuana. It also tells us that marijuana provides some benefits (including reduced nausea for chemotherapy patients).
Marijuana advocates have written many Web documents telling you how safe it is, how beneficial it is. They have obscured the truth with propaganda. But then, so have many anti-drug messages obscured the truth with propaganda.
Our ubiquitous search engine, if it only looks at factors such as timeliness and popularity, cannot return the most relevant information for a query such as “is marijuana safe?” Why? Because the last horse out of the gate has the freshness factor in its favor and the most popular horse in the race has strong backing from many sources (whose trustworthiness and authority-in-knowledge cannot be ascertained).
In other words, our ubiquitous search engine is easily swayed by propaganda. But ubiquitous search suffers from other vulnerabilities, too.
For example, let’s say that Search Brand X has invested millions of dollars in filtering out Web documents that are clearly not useful to anyone. Among its filters it includes a “trust factor”, and this trust factor is based on analyzing what people actually say about Websites. I’m not talking about links — I’m talking about anecdotal evaluations.
And let’s postulate there is a very large, powerful Web site out there called Ubiquitous Knowledge About Stuff. UKAS (for short) relies extensively on user input for its information, although it does maintain a staff of trained editors who know how to evaluate basic grammar, article structure, and intellectual property rights, and who occasionally make an effort to verify the user-provided content against more august and conservative repositories of information.
UKAS might provide information about movies, books, celebrities, politicians, supporters of political parties, authors, history — there are many such sites out there already, all maintained through CMS + Database models that rely on user-generated content. Some of those sites have been around for 10 years or longer.
And people say good things about those sites. So UKAS is not a euphemism for any one site and it’s not an unrealistic example. UKAS exists today under many names. There’s just one problem with UKAS: you don’t know if it’s going to say the same thing tomorrow that it says today.
As such, UKAS is neither authoritative (in that its “knowledge” is unstable and as likely to favor accuracy as not) nor reliable (in that its information may be clouded by propaganda or ignorance). UKAS has taken measures to ensure that its content meets its editorial guidelines. Regrettably, those editorial guidelines neither require nor encourage completeness, accuracy, or logical correctness.
Our ubiquitous search engine, however, sees all the content provided by UKAS as being fresh (because it’s updated every day), popular (because many sites link to it), and well-regarded (because people say good things about it and because Brand X Search staff who have reviewed UKAS think it does a “pretty good job” of introducing people to its topics).
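To make the thought experiment concrete, here is a toy sketch of how Search Brand X might score a page. Everything in it is hypothetical: the `Page` fields, the weights, and the `brand_x_score` function are invented for illustration and do not describe any real search engine. The point is only that accuracy never enters the calculation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    days_since_update: int   # freshness signal
    inbound_links: int       # popularity signal
    positive_mentions: int   # "people say good things" signal
    accurate: bool           # known to us, invisible to the engine


def brand_x_score(page: Page) -> float:
    """Hypothetical score built only from freshness, popularity, and regard."""
    freshness = 1.0 / (1 + page.days_since_update)
    popularity = page.inbound_links / 1000
    regard = page.positive_mentions / 100
    # Note what is missing: page.accurate is never consulted.
    return freshness + popularity + regard


pages = [
    Page("UKAS entry", days_since_update=0, inbound_links=5000,
         positive_mentions=300, accurate=False),
    Page("careful specialist article", days_since_update=400,
         inbound_links=40, positive_mentions=5, accurate=True),
]

ranked = sorted(pages, key=brand_x_score, reverse=True)
for page in ranked:
    print(f"{brand_x_score(page):6.2f}  {page.name}  (accurate: {page.accurate})")
```

With these made-up numbers the UKAS entry ranks first even though it is the inaccurate page, simply because it is fresher, more linked-to, and better liked.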
The ubiquitous search engine makes no provision for determining completeness, accuracy, logical correctness, or honesty in the content it serves to users. Search engineers argue that software really cannot do that for people, and that people have to be the best judge of “good quality”.
But here’s the rub: How do people who don’t know what they are talking about establish “good quality” through their links, comments, and contributions? Put a blindfold on and judge an archery contest. You’re as likely to pick an amateur who has never pulled a bow before as to identify the reigning world champion as the person who just fired the last shot.
None of us can possibly be expert enough in everything to know if anything we find on the Web is useful, accurate, and reliable information. We are easily swayed by rumor, myths, and lies — propaganda packaged, repackaged, and thrown into our faces with advertising or references to more propaganda embedded in the content and margins.
It doesn’t matter if the propaganda is useful, helpful, generally complete, accurate, or logically correct — it’s still propaganda. It’s not generally reliable information. And it is unfortunately the nature of human experience that we share more opinions than incontrovertible facts. In other words, we cannot blame the ubiquitous search engine for serving us crap results because we are, in fact, the people who made those crap results to begin with.
So the question we as searchers are faced with is: do we really want to look at the low-quality but popular and fresh content the ubiquitous search engine is serving us? There is no one answer. Sometimes the crap is good enough because we just want to get a quick reminder of something, or a quick intro into some topic, or we want to kill some time. And then there are the times when we don’t want to see the crap so we change our queries, dig deeper into the search results, or pick another ubiquitous search engine.
In other words, we live and wallow in the crap we create because it’s convenient for us to just sort of push it aside, spend more time in ubiquitous search, slightly elevating our time spent on search when we want something just a little bit better than the crap we created and promoted to the top of the ubiquitous search results.
So now a search engine comes along and says, “We will provide you with better answers if you just ask the right questions.” And they probably can provide better answers to those questions than a ubiquitous search engine that has grown accustomed to serving up crap. But here’s the rub: do we know how to ask the right questions, and do we care to learn how to ask those questions?
We are not a search culture that is used to asking good questions. We are not a Web culture that is used to providing good answers. There is an incongruity between the Web we created, the search we have tolerated, and the alternative being offered to us. Still, if you look around and realize you’re wallowing in muck, when someone comes along and offers you something better, would it make sense to take that offer?
On the surface it seems like a no-brainer. But regardless of whether you’re talking about Ask or Cuil, any time a search engine says it’s focusing on content and relevance rather than linkage, you have to ask how they crawl the Web without the links. Of course they are crawling, they say, but they are not going to rank on the basis of links.
Popularity, trust, and value can all be inflated. Just get more links from Websites that don’t do anything untrustworthy. Over time you’ll cross threshold after threshold. The patient man can game any search engine without breaking a sweat. And there are many patient men (and women) in search because, frankly, most of them rely on links (rather than relevance) to meet their search visibility objectives.
There was a time when search engines trusted Web documents to be honest about what they were. That atmosphere of trust was abused on a horrific scale and I doubt anyone who lived through those search days wants to return to them. So how does an alternative search engine improve upon the results offered up by ubiquitous search?
The answer is not simple. But I think the answer has to include links because, frankly, no piece of software can know enough about all topics to be able to judge the quality of content in any topic. In other words, algorithmic relevance by itself is no better a tool for determining quality than link-based metrics.
But there may be a universal principle embedded in research directed at developing filters for Web spam. Various academic papers have been published which show that combining filtration methods generally produces better filtering than any one filtration method by itself. In other words, the more algorithms used to flag probable Web spam, the more reliable the Web spam filtering tends to become.
Let’s call that the Principle of Corroboration, which we can state thus: In any class of inexpert observers, the more agreement between conclusive arguments provided by multiple observers, the more likely the corroborated arguments are correct.
Why is that? Well, let me explain a few things.
First, we must assume that our inexpert observers are unbiased. That is, none of them will bring any propaganda to the discussion. Each observer notes the available information, draws an unbiased conclusion, and submits that conclusion for review.
Second, we must assume that our inexpert observers have appointed a higher authority — someone in the class is NOT observing or drawing conclusions. This decision-maker can be any member of the class but does not always have to be the same member.
Third, we must assume that our inexpert observers are each using different methods for reaching their conclusions. For example, three observers might evaluate a car on the basis of its weight, color, and number of doors (respectively). Each observer would only focus on its one criterion and submit its conclusion to the fourth member of the group.
This Principle of Corroboration is actually used in Fuzzy Logic Programming and several other areas. It also reflects some things that happen in swarm theory. In short, the Principle of Corroboration tells us that in any group of decision-makers, the aggregate decision is more likely to be the correct one than any individual decision.
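One way to put numbers on the principle is the classic majority-vote calculation (often called the Condorcet jury theorem). The sketch below is my own illustration, not anything Ask has described. It adds two assumptions beyond the three listed above: each observer is right with the same probability `p`, and the observers' errors are independent of one another. The function name `majority_correct` is made up for this example.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent observers,
    each correct with probability p, reaches the correct conclusion."""
    needed = n // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


# Observers who are only slightly better than a coin flip.
for n in (1, 3, 11, 101):
    print(n, round(majority_correct(n, 0.6), 3))

# Observers who are slightly worse than a coin flip.
for n in (1, 3, 11, 101):
    print(n, round(majority_correct(n, 0.4), 3))
```

Under these assumptions, agreement among many mediocre observers beats any single one of them as long as each is right more often than wrong. The second loop shows the catch: if the individual observers are worse than chance, adding more of them makes the aggregate decision worse, which is why the unbiased and independent conditions matter so much.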
How does that differ from relying on links to tell you what is relevant? Links can be gamed. So can on-page factors. So can testimonials, anecdotes, and hundreds of other factors. But this is why search engines usually look at hundreds of factors in their algorithms: the more factors you try to game, the higher the cost (to you) of gaming the search algorithms becomes.
That means that if a ubiquitous search engine is already factoring in hundreds of different decision-makers (algorithms, filters, whatever), an alternative search engine has to take the process to a higher level to provide a more meaningful search experience. I don’t believe it would be productive to throw out the Principle of Corroboration in the quest for more relevant search results.
So how do you take search to the next level?
One possibility is to narrow or refine the definition of relevance. Relevance can be shaped by many different factors in Information Retrieval science, but one factor that can really improve relevance is the query itself.
That is, the nature of the query can limit the scope of relevance. Ubiquitous search has responded to a growing number of self-limiting queries such as “address for the white house”, “pizza delivery in houston”, and “michael martinez who wrote understanding middle-earth”.
But these are free-form queries. They don’t really help searchers by imposing any structure.
Some queries do impose structure, such as “425 * 12”, but that’s a very restrictive format. What Ask is proposing is that people are asking questions and not finding the answers to those questions. So in order to help users, perhaps Ask wants users to ask better questions. I don’t know (yet).
We do tend to ask very poor questions in search. Some of the queries that bring people to SEO Theory include:
- crawling ecommerce best practice
- on page optimization checklist
- crawling off a website edge
- content rich doorways
- “chase the long tail of search”
These don’t look like questions, do they? But in fact anyone who has read the articles to which these queries are relevant should understand intuitively that these are indeed questions. People are looking for some pretty obscure stuff.
Anyone can throw these terms on a page, in its URL, and point anchor text at the page. Ubiquitous search will serve up such results to us all the time. But the real question is, if there is a set of really good “best” documents out there that answer the poorly asked question “on page optimization checklist”, can any search engine really offer up that set of documents?
Again, no search engine has a group of experts to draw upon for all fields of knowledge. Hence, the search engine that wants to focus on relevance will have to use the Principle of Corroboration to substitute for actual knowledge. And one way a search engine like Ask can incorporate the Principle of Corroboration into its enhanced technology is to extend its model of experts and authorities.
Many years ago Jon Kleinberg postulated that you could pick out some pretty high-quality Web documents by looking at how many hubs (pages that link out to good sources on a topic) link to an authority, and how many authorities a hub links to. The tighter your connections between hubs and authorities, the more likely you had a good set of documents. Ask incorporated a model of experts and authorities along these lines into its algorithm (or, technically, Teoma did and Ask bought Teoma for its search technology).
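The hubs-and-authorities idea is usually worked out as an iterative calculation, commonly called HITS. The sketch below is a minimal textbook-style version under my own assumptions: the `hits` function, the page names, and the tiny link graph are all invented for illustration, and nothing here should be read as Teoma’s or Ask’s actual algorithm.

```python
def hits(links, iterations=20):
    """Return (hub, authority) scores for a dict mapping page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]

        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}

        # Normalize so the scores stay bounded from one round to the next.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}

    return hub, auth


# Hypothetical graph: two directories cite the same two guides; a loner cites a third.
links = {
    "directory_a": ["guide_1", "guide_2"],
    "directory_b": ["guide_1", "guide_2"],
    "loner": ["guide_3"],
}

hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page:12s} authority={auth[page]:.3f} hub={hub[page]:.3f}")
```

The tightly connected cluster (the two directories and the two guides they share) ends up with nearly all of the score, while the isolated pair fades toward zero. That same behavior is what a link farm exploits, since a farm is also a tightly connected cluster.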
But how do you sort out the tightly-linked groups from link farms? You have to look at other factors.
And if Ask is doing this already, then you have to ask what Ask is doing now that is different from what it did a year ago.
To be honest, I don’t have the answer. But, frankly, I’m tired of finding myself mired in crap search results. If Ask can restore relevance to search, I say we should give them a chance to prove they know what they are doing. Ubiquitous search sickens me because I rarely find what I am looking for (unless I am checking rankings) on the first page of search results any more.
I think the amount of time people spend on search, and the number of page views per session, have increased because the quality of people’s search experience has declined. Search engines have improved search through the years to the point where search is almost unusable.
Sure, people click through to various results. I do that often myself. At some point you have to pick something and hope for the best. That’s no measure of success in search.
It just means people clicked through to various results.
Maybe Ask has found something that works. Maybe not. If they have, then that will put pressure on Google, Microsoft, and Yahoo! to improve their games. And in my opinion there is plenty of room for improvement. The way search serves up “relevant” content today is extremely crude and primitive compared to the way I want to find it tomorrow. What I want is to find the content I am looking for quickly and easily without having to know in advance what it is.
And if that means I have to learn a whole new way of searching, well, I’m ready for it.
How about you?
6 comments
kpmrs 11.20.09 at 10:03 am
wow. just spent about 10 minutes reading this article (being a slow reader) but I made sure I read it till the end.
such a new way to look at searching.. Thanks for the post Mike
Alan Torr 11.27.09 at 2:26 pm
Fantastic! I’m new to the whole SEO arena and clearly have a lot to ‘unlearn’. What I’m saying is I’ve learnt from SEO wannabes who’d rather tell you what you want to hear than what you actually need to hear.
I’m looking forward to reading more! Thank you.
MarjoryM 12.02.09 at 10:54 am
This is very interesting. You seem to imply that if we phrased our queries more like natural language questions, we could have more relevant results. Wasn’t that the basis of Ask Jeeves, the forerunner of Ask.com? Or am I missing something?
Michael Martinez 12.03.09 at 11:44 pm
Actually, all I am suggesting is that people give Ask another chance and see if something shakes loose. They say they have improved the technology. My own observations are inconclusive. If enough people look, maybe someone will spot something.
crohole 12.09.09 at 12:51 am
Well.. I have searched for my website’s backlinks in google, yahoo, bing, ask, alexa, altavista, and exalead. But each search engine gives a different result. Why is it different?
I use link:www.domain.com
The 2nd problem is that when I use http with the domain the results are different too. Can you explain these problems? I mean a query like this:
link:http://www.domain.com
Michael Martinez 12.09.09 at 11:16 pm
Each search engine does its own crawling, maintains its own database, and decides which links it will show you on the basis of undisclosed criteria. No search engine is going to report all of your backlinks but that’s okay because no search engine allows all your backlinks to pass value within its index.
I could only speculate about why any particular search engine would show you different results depending on whether you include the http portion of the URL. Sorry.