Being relevant in an SEO state of mind

by Michael Martinez on August 10, 2007

Relevance is determined by context, but the context of which I speak is not created through words on a Web document. We shape the context of relevance through our perspective, the way we see the Web, search engines, and users.

Three ingredients must be brought together to form relevance: a searcher’s query, a search engine’s IR scoring algorithm, and the data (Web documents) that the search engine uses to resolve the query. The mix is sufficiently relevant if the searcher finds what he is seeking.

The mix is not relevant if the search results don’t satisfy the searcher’s need. But failure or success in the search results is not the only measure of relevance. For example, we can say that a query is irrelevant if it is given to the wrong search engine (or directory). A search service may restrict its content to specific types of Web documents, such as Web sites for an industry or hobby. Queries that have nothing to do with the search index’s topicality are irrelevant.

We can say that a search service is irrelevant for the same reason that a query is irrelevant: the wrong query is being used on the wrong search engine. So relevance can be measured in both directions. Given that Google and Yahoo!, the two largest search engines, each have different snapshots of the Web, it follows that many queries which produce more satisfying results on either search service must produce less satisfying results on the other.

In this context, a searcher may have failed to take market relevance into consideration, where both the query and the data must be relevant to the market. The market consists of the searcher and the information providers the searcher is looking for. Few search optimizers take market relevance into consideration, but in my opinion market relevance is not critical to the average search engine optimization campaign.

If search engines could provide better segmentation of the Internet, both among searchers and among information providers, marketers would be in a better position to shape market-relevant strategies. Today, however, we work with very crude concepts of segmentation (and, to be honest, universal search makes the process even more crude).

Google and Yahoo! are actually addressing some aspects of market relevance through their most recent changes in pay-per-click advertising rules. I think we’ll continue to see gradual improvement in market relevance theory through the next few years. Perhaps we’ll even see a major breakthrough that streamlines the technology and opens up new possibilities.

We can also look at relevance between queries and documents. How relevant is a query to our Web content, and how relevant is our Web content to queries? Unlike the relationship between queries and search services, we don’t have exact reciprocation between queries and content. That is because content tends to be relevant to many queries, whereas most queries are not relevant to much content. So instead we are dealing with an inverse relationship between the relevance of queries to documents and the relevance of documents to queries.

You could say that a query for “a an the or” is probably relevant to billions of documents, but what is the purpose of the query? What is the searcher looking for? There must be a purpose to the query, even if it’s only to demonstrate a very broad and ambiguous principle. How many documents would satisfy such a query? I’ll venture to say no more than 100 per search engine. Do you know which documents I refer to?

Search results pages are themselves content and they are themselves relevant to queries, but remember that searcher needs must be satisfied in order for search results pages to be highly relevant to the queries. The people most likely to be satisfied by search results pages are Web marketers — particularly search optimizers. Why is that?

If the average searcher is not interested in the search results themselves, then the Web documents listed in the search results must be gauged in terms of relevance. Search engines struggle with understanding relevance. For example, is a query for “britney spears news” relevant to news sites with stories about Britney Spears or is it relevant to the latest news concerning Britney Spears? Some specialty news sites don’t update their content very often. Is month-old or year-old Britney Spears news relevant to the query “britney spears news”?

The ambiguity is not due simply to freshness, a factor that search engineers have explored in numerous ways. The ambiguity is also due to the inadequacy of search technology. Today’s search tools do not ask us our intentions. They offer some crude guesswork at possible differences in intention. Ask is particularly good at this sort of guesswork. Northern Light used to be pretty good at it too. But why is their guesswork often unsatisfactory?

A search tool can only make guesses on the basis of aggregate data. The more data the tool has to work with the more efficient its guessing algorithm becomes. A search tool may look at click-through patterns, query patterns, and the available Web documents indexed in its database when making guesses (and I would guess that some of the guesswork occurs “offline”).

So if the search tool cannot be sure of the searcher’s intentions, it may make some very bad guesses about which documents are most likely to satisfy the searcher’s needs. Some search tool operators bring in human intervention to make adjustments, but the intervention cannot be applied to many queries as there are literally billions of queries every month. It’s humanly impossible to adjust every query’s results by hand.

So search tool designers will have to depend on trends in query processing, including both the queries people use and the documents that are provided in response to those queries. At an intuitive level it makes a certain amount of sense to look at click-through data, except that many people like me open links in new browser windows just in case the destination page hangs the browser, many people quickly click on the BACK button, and many people quickly close the browser window.

It is impossible to capture statistically valid measures of searcher satisfaction by monitoring click-throughs and BACK button activity. The ease with which companies can manipulate click behavior (today’s click manipulation networks can easily scatter their activity across thousands of IP addresses in dozens of network operating centers) virtually ensures that any search engine that relies on click-through analysis for its general Web results is vulnerable to extreme manipulation.

The likelihood of a very good match between search tool selection and searcher query is thus very limited, and that is why search engine optimization works. Natural content (unoptimized content) may not earn high IR relevance scores. It could, though. For example, if you operate a blog or forum where one huge page records hundreds of comments from people using the same expression, that page will indeed earn a very high IR relevance score (and such pages usually outrank link-bombed content in competitive queries, but they are relatively rare).
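The point about natural repetition can be illustrated with a deliberately naive sketch. It assumes a toy score that simply counts occurrences of the query phrase in a document. Real search engines use far more elaborate scoring, and the function names and sample documents here are hypothetical, invented only for this illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_count(query, document):
    """Toy relevance score: how many times the query phrase occurs in the document."""
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    if n == 0:
        return 0
    return sum(1 for i in range(len(d) - n + 1) if d[i:i + n] == q)

# Hypothetical documents, invented for illustration only.
short_page = "Buy hot dogs here. Best hot dogs in town."
comment_thread = " ".join(
    f"Comment {i}: I love hot dogs at the ball park." for i in range(1, 201)
)

scores = {
    "short_page": phrase_count("hot dogs", short_page),
    "comment_thread": phrase_count("hot dogs", comment_thread),
}
print(scores)
```

Under this assumed scoring, the long comment thread accumulates far more phrase matches than the short page simply because many people repeated the same expression. A scoring function that normalizes by document length would behave differently, so treat this only as a picture of the idea, not of any particular engine.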

Search engine optimizers strive to pack keywords tightly into Web documents in order to boost those IR relevance scores (which are always computed at query resolution time). But for the past 7-8 years the SEO community has tended to favor the design of fairly short Web documents. We tend to do this for two reasons: first, it’s easier to manage a short document; second, there is a widely accepted SEO myth that short documents are favored in search results.

I, myself, have speculated on occasion that short documents tend to get favorable treatment in search results but every time I have shared such thoughts I’ve soon after come across queries where large documents zoomed past all the link-spammed “optimized” sites. What’s up with that? I would say the answer is that natural repetition just scores very well because most AdSense junkies now reduce their pages to the barest minimum content in order to place more ad units on the Web.

A single document can be relevant to 100 expressions and rank highly for those 100 expressions, so why is it necessary to divide that content into 100 pages? If usability and the searcher’s quality of experience are not factors, there is no reason to spread content across multiple pages. But if you want people to be satisfied with your content you have to make sure they see it, and most people won’t scroll down to the bottom of a huge page.

And this is a problem for Web marketers because they feel compelled to create thin content pages just to cope with visitor laziness. The plethora of thin content pages makes it nearly impossible to compete with natural repetition, and hence people resort to link spam in order to compensate for their lack of on-page optimization.

Which brings us back to relevance. A query may not be relevant to a Web document but it may be relevant to a Web site. And this is where the myth of theming has tripped many SEOs through the years. Brett Tabke popularized the idea of Web site theming in 1999 after following some forum discussions on related topics. He and I exchanged some ideas about why a site with 5 pages of content about The Lord of the Rings might rank better in Altavista than a site with only 1 page of LoTR content.

Altavista seemed to take the position that a Web site with only 1 page of content about a topic (that is, about a group of statistically related words) was more likely to be Web spam than a site with many pages of content related to those words. And Altavista actually had a very good search engine up until its last year. They overcame nearly every spam challenge they faced, and even introduced the dreaded CAPTCHA challenge to page submission (an innovation they announced at a Web marketing conference and for which they reportedly received a standing ovation).

But then a little search engine called Google introduced something called PageRank and all the rules of relevance that had been forming around conceptually-related pages went out the window. Google set back search technology by about 5 years. PageRank can only be applied to pages, and hence theming became irrelevant to manipulating Google’s search results.

SEOs have been barking up the wrong theoretical tree ever since and even Brett Tabke has long since disavowed theming as any sort of useful approach. Of course, some people will point to the Hilltop and LocalRank algorithms and insist that theming is important because of those algorithms — to which I can only say, yeah, right. There is no place for Web site theming in search engine optimization today.

Not yet, at any rate. But SEOs do tend to get caught up in the relevant link myth, which assumes that links from pages about The Lord of the Rings help other pages about The Lord of the Rings more than links from pages about, say, books and movies. Of course, some SEOs would say that The Lord of the Rings and “The Lord of the Rings” are a book and movie, so the link would still be relevant (all the while conceding out of the corners of their mouths that the relevant linking theory dictates the second link cannot be as strong as the first link).

Still, who would turn down a link from the front page of CNN if they could get a “Buy Viagra Now” link placed there when CNN was not carrying any stories about Viagra? I see no hands.

Relevant link principles dissipate quickly when mixed with drool over linkage from high Toolbar PR sites.

So relevance cannot be measured by PageRank and it cannot be measured by links, but search engines try to do that anyway. Ask does a better job than Google with its ExpertRank technology (although they introduced click-through analysis in some timid fashion because they bought the old DirectHit technology years ago). The problem with relevant linkage, however, is that it is not very relevant.

Think about it. All I have to do to capture the top position for “hot dogs” is create 100 Web sites about hot dogs and have them all link to one very special page — that may or may not be about “hot dogs”. Why would I do this? Because my hot dog selling client wants to call his product “wieners”, or “frankfurters”, or “sausage dogs”, or “Chicago’s best gourmet food” — anything but “hot dogs”. That’s not very helpful with the search engine optimization, is it?
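The hot dog scenario can be sketched in a few lines. This is a minimal illustration, assuming a simplified engine that credits a link's anchor text to the destination page. The URLs, page text, and function names are all hypothetical, and nothing here describes how any real search engine weighs anchor text.

```python
# Hypothetical destination page: its own text never says "hot dogs".
pages = {
    "client.example/wieners": "Chicago's best gourmet food: wieners and frankfurters",
}

# 100 hypothetical sites, each linking to the client page with the same anchor text.
# Each link is (source site, destination URL, anchor text).
links = [
    (f"hotdog-site-{i}.example", "client.example/wieners", "hot dogs")
    for i in range(100)
]

def on_page_match(query, url):
    """True if the query phrase appears in the page's own text."""
    return query.lower() in pages[url].lower()

def anchor_matches(query, url):
    """Number of inbound links to `url` whose anchor text contains the query."""
    q = query.lower()
    return sum(
        1 for _source, target, anchor in links
        if target == url and q in anchor.lower()
    )

url = "client.example/wieners"
print(on_page_match("hot dogs", url))   # does the content mention the phrase?
print(anchor_matches("hot dogs", url))  # how many links vouch for the phrase?
```

In this sketch the page's content contributes nothing for “hot dogs”, yet 100 inbound anchors all point the phrase at it. Any signal for the query comes entirely from the links rather than from the words on the page.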

So link spam has become a necessary evil but only because the search engines allow pages to pass link anchor text, even though the link anchor text is technically irrelevant to the destination pages. In more random situations than not, link anchor text is not relevant to the content on destination pages (but by definition link anchor text is always relevant to the destination page itself).

The page is distinct from the content.

And because the page is distinct from the content the search optimizer’s task is more complicated. We have to study query patterns, we have to study competitive page content, and many optimizers try to study competitive link footprints (a truly useless pastime, but that is a topic for another post). We have to understand the market’s needs, and then we have to identify the right market and optimize for that market.

Some people might only be interested in “kosher hot dogs” but we’ll probably capture some “hot dogs” traffic anyway just because the two expressions overlap. Capturing traffic from the wrong market means your conversion rates go down. ROI may be adversely impacted by capturing the wrong traffic.
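The effect of wrong-market traffic on conversion rate is simple arithmetic, sketched below. Every number is hypothetical and chosen only to make the calculation easy to follow. None of them come from real campaign data.

```python
def conversion_rate(conversions, visits):
    """Return conversions divided by visits (0.0 if there were no visits)."""
    return conversions / visits if visits else 0.0

# Hypothetical figures, invented purely for illustration.
target_visits, target_sales = 1000, 50     # visitors who searched "kosher hot dogs"
overlap_visits, overlap_sales = 4000, 20   # visitors who searched plain "hot dogs"

target_only = conversion_rate(target_sales, target_visits)
blended = conversion_rate(
    target_sales + overlap_sales,
    target_visits + overlap_visits,
)

print(f"target market only: {target_only:.1%}")
print(f"with overlap traffic: {blended:.1%}")
```

With these assumed figures the blended conversion rate drops from 5.0% to 1.4% even though total sales rise. Whether ROI actually suffers depends on what the extra traffic costs to attract and serve, which this sketch does not model.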

The Web content may be too ambitious with its relevance goals. We may be asked to match content with more queries than the content is really strongly relevant for. What is a poor search optimizer to do then? Suggesting the creation of new content is often greeted with a blank expression or the deal-shattering words, “But we just created the Web site. It’s ready to go!”

You can only do so much with inflexible and limited content, but links won’t always be there when you need them. In fact, in some queries, links won’t be relevant at all. Why is that?

Think about it.


dodito 09.23.07 at 12:15 am

Michael,

How about trust and themes (not “trustrank”). For example: take fashion.

Suppose a website deals with, say, “fashion” and receives a lot of links that refer to it in the context of fashion (in the anchor text or outside it). These could be fashion sites, but also “anti-fur” websites, or a website that deals with human rights and child labor etc etc. So.. relevance can be seen from many perspectives. (And frankly I do not see http://www.fashion.com being linked so quickly from http://www.instyle.com or http://www.dolceandgabana.com.. so where WOULD you be able to get topical links anyway..)

Is it possible for that website to obtain a certain “status” as ..”this is the website to go to for fashion so any new page on fashion will be in high regards by us: Yahoo or Google” ? Or would any new fashion page still be judged in the same way as if it were posted on seo-theory.com and judged on links, text etc.

If a whole site deals with fashion does that not have any impact at all on the way it is ranked if fashion related keywords are used ?

dodito 09.23.07 at 12:37 am

A second question that occurs to me.

What if you have pages grouped together around one “theme”: say “Polar Bears”. You have pages about the bear, North Pole, Global Warming etc. etc. I can interlink them with relevant anchor text (so “Global Warming” anchor text pointing to “Poor Polar Bears: victims of Global Warming” page etc. etc.).

I have seen it being pretty effective even for small groups of pages. Isn’t this a way of theming ? Perhaps not a whole site, but making a group of sites part of a single theme, and making that visible in any possible way ?

Or.. would I understand you correctly that it has nothing to do with the *themes* per se, but 1) on page criteria for each separate page (could be any theme) and 2) the fact these pages are linked with anchor text I think is useful (also irrespective of themes in a technical sense) and in a coherent site structure.

In other words.. could I do the same thing, but now for pages dealing with bears, fashion, nuclear power and SEO ? Would it have the same effect on the search engines ? Then “theming” is just relevant to the end user (I find it harder to convince a user to jump from a fashion page to a nuclear power page.. but it IS possible, and I presume that is the point you’re making about the irrelevance of “topical links”: there’s always a theme to be found) ?

I think I have just asked you 3 questions instead of one in this posting.. :-)

Michael Martinez 09.23.07 at 10:03 pm

What you seem to be seeing as “theming” is really just internal linking and group linking that designates which pages are the most important.

Search engines determine trust on the basis of which sites comply with their guidelines over time. They may look at content within certain verticals but algorithmically doing that for the entire Web is impossible.