The Physics of SEO Theory
Posted by Michael Martinez on March 23, 2007 in SEO Theory
Search engine optimization often reminds me of the physical universe and the various laws that we believe make the universe work. Some of the most well-known laws include the Laws of Conservation, the Law of Causality, the Determinism Principle, and the Uncertainty Principle.
Determinism says that if you know everything about a closed system at an arbitrary point in time, you can predict whatever that closed system will do. An example of a closed system in search engine optimization is the three interlinked pages illustration you often find in PageRank explanations.
Page A links to Page B, Page B links to Page C, and Page C links to Page A. Each page is assigned a PageRank of ~.33 (1/3, actually). Very boring. So then we shake things up and say, well, let’s have Page B and Page C link to Page A, and Page A links back to both Page B and Page C. Ignoring the damping factor, Page A gets 1/2 of the PageRank and Page B and Page C get 1/4 each. This kind of simplistic diagramming is what led legions of SEOs to believe in The Preservation Principle.
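For anyone who wants to check the three-page illustrations by hand, here is a minimal sketch of the textbook random-surfer calculation. It assumes a damping factor of 0.85 and a fixed number of iterations, neither of which comes from this post, and the names `pagerank`, `ring`, and `hub` are made up for the illustration. The exact figures shift with the damping factor, but the pattern holds: the ring splits evenly, and Page A takes the largest share in the second layout.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> outbound links.

    The damping factor and iteration count are assumptions for this sketch.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # every page starts with an equal share
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# First illustration: A -> B -> C -> A.
ring = {"A": ["B"], "B": ["C"], "C": ["A"]}
# Second illustration: B and C link to A, and A links back to both.
hub = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

ring_rank = pagerank(ring)
hub_rank = pagerank(hub)
print(ring_rank)
print(hub_rank)
```

In both layouts the values add up to 1, which is the point the rest of this post leans on.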
The Preservation Principle asserts that any Web site can preserve a majority of its assigned PageRank. Another way to explain it is that the more pages you link out to, the less PageRank you retain in your Web site. The problem with the Preservation Principle is that it contradicts the definition of PageRank.
PageRank is the (estimated) probability of an unbiased surfer finding any given page by randomly clicking on links. What happens when the surfer clicks through to a page that has no outbound links? The surfer is assumed to start over at another random page. Hence, if you have a page that does not link out, all of its PageRank must be allocated to all other Web pages.
That is, you cannot preserve PageRank no matter how hard you try. In fact, the total PageRank remains constant: the sum of all PageRanks equals 1. In order to allocate PageRank across an ever-growing collection of pages, a search engine must start each PageRank calculation process by allocating smaller amounts of “starter PageRank” than in previous calculation processes.
Conversely, if you reduce the collection of documents across which you allocate (estimated) PageRank, you start each page with a larger “starter PageRank” than when there were more documents in the collection.
Call that the Principle of Conservation of PageRank.
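The conservation idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: there is no damping factor, a dead-end page hands its whole share to the other pages in equal parts, and the four-page collection and the names `starter_rank` and `redistribute` are hypothetical.

```python
def starter_rank(page_count):
    """Equal share each page receives before the first iteration."""
    return 1.0 / page_count


def redistribute(rank, links):
    """One random-surfer step with no damping (an assumption made for clarity).

    A page with no outbound links passes its whole share to all other pages,
    as if the surfer started over at another random page.
    """
    new_rank = {page: 0.0 for page in rank}
    for page, share in rank.items():
        targets = links[page] or [p for p in rank if p != page]
        for target in targets:
            new_rank[target] += share / len(targets)
    return new_rank


# Hypothetical collection: D is a dead end that links to nothing.
links = {"A": ["B", "D"], "B": ["C"], "C": ["A"], "D": []}
rank = {page: starter_rank(len(links)) for page in links}
for _ in range(50):
    rank = redistribute(rank, links)

total = sum(rank.values())
print(round(total, 6))
print(starter_rank(4), starter_rank(8))
```

However many steps you run, the total does not change, and doubling the size of the collection halves the share each page starts with.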
In Physics, the Law of Causality says that if event A must precede event B in order for event B to happen, then event B has no effect on event A. Event A is the cause and event B is the effect.
In search engine optimization, Causality is illustrated by the Principle of Sequential Processing, which states that a page must be processed in a sequence of steps before it appears in search engine results. The page must be fetched by a search engine, parsed (taken apart), and classified (its parts need to be indexed). These steps cannot occur in any other sequence.
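A toy version of that sequence might look like the following. It is a sketch, not a description of how any real search engine works: the in-memory `web` dictionary stands in for the crawler's fetch, the tag stripping is deliberately crude, and every function name and the `example.test/a` address are invented for the illustration.

```python
import re
from collections import defaultdict


def fetch(url, web):
    """Stand-in for a crawler: look the page up in an in-memory 'web'."""
    return web[url]


def parse(html):
    """Take the page apart: strip tags, lowercase, split into words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def index(url, words, inverted_index):
    """Classify the parts: record which words occur on which page."""
    for word in words:
        inverted_index[word].add(url)


def search(term, inverted_index):
    """Only pages that went through all three steps can be returned."""
    return sorted(inverted_index.get(term.lower(), set()))


web = {"example.test/a": "<h1>SEO Theory</h1><p>Physics of search</p>"}
inverted_index = defaultdict(set)

before = search("physics", inverted_index)   # nothing has been indexed yet
html = fetch("example.test/a", web)           # step 1: fetch
words = parse(html)                           # step 2: parse
index("example.test/a", words, inverted_index)  # step 3: index
after = search("physics", inverted_index)
print(before, after)
```

Each step consumes the output of the one before it, which is why the order cannot be shuffled: there is nothing to parse until the page is fetched and nothing to index until it is parsed.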
In Physics, the Uncertainty Principle tells us that we cannot know both where a particle is and its velocity (speed and direction) at the same time. The more accurately we know a particle’s position at any point in time, the less accurately we know its velocity; and the more accurately we know a particle’s velocity at a point in time, the less accurately we know its position.
In search engine optimization, the Uncertainty Principle tells us that we cannot know both a page’s state of indexing and its relevance to any given set of queries. That is, because search engines are constantly recrawling and reindexing the Web, the more queries we document a page’s relevance to, the less we know about how well indexed the page is.
In other words, the more queries for which a given page is relevant, the more information there must be about that page. If we can see the queries we cannot see the information about the page. On the other hand, the fewer queries for which a given page is relevant, the less information there must be about that page.
For any page with a large amount of information, our knowledge of that information is small relative to our knowledge about the page’s set of relevant queries. For any page with a small amount of information, our knowledge of that page’s relevant queries is small relative to our knowledge about the page’s information.
There should be a fuzzy point where our knowledge of both the information regarding a page and the number of queries to which it is relevant are approximately equal. This is the optimum point of understanding for an SEO: you balance your knowledge of what the page is relevant to with your knowledge of why it is relevant for that set of queries.
There is something akin to the force of gravity in a hyperlink network. Mike Grehan helped popularize one aspect of this concept with his “Filthy Linking Rich” article in October 2004. Jon Kleinberg also touched upon the principle in his “Authoritative Sources In A Hyperlinked Environment” paper (published in 1999).
The Filthy Linking Rich principle tells us that given a choice between a well-linked Web page and a poorly linked Web page, any new Web page is more likely to link to the well-linked Web page. Kleinberg’s authorities and hubs concept tells us that the more authoritative a page is, the more hubs will point to it.
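The hubs and authorities idea can be sketched as a simple mutual-reinforcement loop. What follows is a simplified illustration in the spirit of Kleinberg’s paper, not a faithful reproduction of it: the graph is hypothetical, the iteration count is arbitrary, and the function name `hits` is just a label used here.

```python
import math


def hits(links, iterations=50):
    """Iteratively score hubs and authorities on a dict of page -> outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: three hub pages, two candidate authorities.
links = {
    "hub1": ["popular", "obscure"],
    "hub2": ["popular"],
    "hub3": ["popular"],
}
hub, auth = hits(links)
best = max(auth, key=auth.get)
print(best)
```

Note that the scores come entirely from who links to whom. Nothing in the loop looks at what the pages actually say.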
SEOs, trying to reconcile these principles, find themselves dealing with a causality issue. Which comes first: the authority of the page or the connectivity that signifies the authority? In other words, why should anyone link to any Web page?
Authority is based on communal consensus. That is, it expects that a moderate level of peer review will recognize the quality of a Web page and therefore lead to extensive citation (hyperlinking), thus producing the connectivity that algorithmically denotes a recognition of authority.
Although causality is maintained by this theory, its flaw is that there is no peer review on the Web. People link for a variety of reasons, many of them having nothing to do with the quality of a page. Social link theory, which is quite old, can be stated thus: the more relevant to the interests of a community a document is, the more interest the community shows in the document.
If you create something hot, people will tell other people they know about what you created. You could call this the Kevin Bacon Principle because if each person you know tells all his/her acquaintances about your cool Web page, Kevin Bacon will eventually hear about it from an associate. This principle is only relevant to Kevin Bacon because he was the focal point in the Six Degrees of Kevin Bacon game, a Web site that found six or fewer connections between actor Kevin Bacon and many other people in the film industry.
On the Web, documents only need to be authoritative enough to impress any idiot with the capability of sending a link to any other idiot who is at least as easily impressed as the first idiot. That is, given a large enough population of interconnected naive people, the most biased, misinformative document conceivable for any topic can become the most authoritative document in that topic.
How does this happen? Call it the First Visibility Principle. The first document that crosses the Idiot Threshold (the point where enough idiots pass links around to other idiots to create momentum) becomes the first authority on a topic. Other, more correct documents may appear later, equally impress the idiots and get many links. But the damage is already done.
Once a document becomes well-linked it accrues links at an increasing rate in part because idiots are easily convinced by the enthusiasm of other idiots. If idiot C learns that idiots A and B recommend a document, idiot C will also recommend the document.
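This rich-get-richer dynamic is easy to simulate. The sketch below assumes a simple rule that is not taken from any search engine or from Grehan’s article: each new link picks a document with probability proportional to the links it already has, plus one. The document names, the head start of 50 links, and the function name `simulate_links` are all hypothetical.

```python
import random


def simulate_links(initial_counts, new_links, seed=0):
    """Hand out new links one at a time.

    Each link goes to a document with probability proportional to
    (links it already has + 1), so early leaders tend to pull further ahead.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    counts = dict(initial_counts)
    docs = list(counts)
    for _ in range(new_links):
        weights = [counts[d] + 1 for d in docs]
        chosen = rng.choices(docs, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


# The first document crossed the threshold early; the others arrived later.
head_start = {"first": 50, "later_a": 0, "later_b": 0}
result = simulate_links(head_start, new_links=1000)
print(result)
```

Quality appears nowhere in the rule. The only thing that separates the documents is who got there first, and that is enough for the early leader to keep the bulk of the new links.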
I use the term “idiot” figuratively because linkage is created by people of all levels of knowledge and capacities of reason, down to those people who possess only the minimum knowledge and reasoning ability required to create hypertext links. Smarter people will follow the crowd, so that makes them no better than idiots.
To avoid being an idiot, you have to be a skeptic. In our Web universe, there are only two kinds of people: idiots who link to the first thing that tickles their fancy and skeptics who link only in the most discerning fashion possible. However, skepticism requires significantly more effort than unqualified enthusiasm. Hence, our Web universe is always populated by more idiots than skeptics, and therefore the idiots will most often outvote the skeptics.
In other words, all you have to do is create the first Web document relevant to a hot topic, create visibility for that document, and then watch the links pour in. You won’t have to worry about quality as long as your quality is good enough to impress any idiot.
Link gravity develops around Web sites that become so popular they create their own network of satellite sites that benefit from being linked to by the popular sites. These satellite sites tend to reinforce the main linking site, driving more links to the main site. It’s much like the formation of a star system, complete with sun and planets.
The larger the system of Web documents, the more outside Web documents will link to it because its increasing visibility gives the system greater link mass. When you reach a sufficient link mass, however, something interesting happens: the community fragments and splits up into smaller communities.
Now the process begins all over again in a dozen communities because each of the old satellite sites now has its own little community of satellite sites.
The fact that pages of very high quality do occasionally appear in no way impacts the principles I’ve outlined here. Truly high quality content may attract a lot of links or it may go largely unnoticed.
Call that the Wikipedia Effect, where a Web site of very questionable value has accrued extensive link mass thanks to the votes of many idiots who did not know better. Better, more accurate/informative documents are challenged to meet the standard for “authority” set by the earlier page.
I’ll come back to this another time.