SEO Math: Axioms for Search Analysis

by Michael Martinez on December 9, 2008

It’s virtually impossible to find decent tutorials on math and statistics you can use for search results analysis. Nearly all SEO-related math articles seem to focus on reverse-engineering Google’s PageRank, explaining Latent Semantic Analysis, or analyzing click-through data from AOL.

None of those topics are really useful for search engine optimization from a Web marketer’s perspective. People have developed crude models for analyzing keyword trends but a lot of that work now seems to be driven by PPC analysis (which offers only limited insight into organic keyword analysis).

While virtually everything in search engine optimization can be quantified and measured in some way, there is still plenty of room for growth in our theory of search analytics. What follows is an introduction to some of the math I’ve developed for search analysis over the years.

Definitions

Let’s start with a few definitions (some of which you’ll find in our SEO glossary). We’re discussing redesigning the SEO glossary so it may be a while before all these terms are added to it.

Notation – Different branches of mathematics and statistics use standardized notational systems. As I’m not sure where search analysis would fit into the mathematical disciplines, I have developed my own peculiar notational system that also happens to be easy to type into a blog.

For certain value classes, I use a subscript (example: Es) to denote a specific type of value within the class. The y-subscript for any class represents the “mother valuation” or primary valuation. Hence, I denote Naturality as Ny even though there is no other valuation in that particular class.

See the end of this article for how the primary valuations of the three main classes are used in the formula notational form of the Theorem of Search Engine Optimization.

Search listing – A search listing is a reference to any document of any type in the search results provided by a search engine. Even URL-only listings are search listings. It doesn’t matter if the listing is a Web document, something else, fully indexed, etc.

Query Result – aka search result, a query result consists of 0 to 1,000 listings provided by a search engine in response to a user query. The 1,000 listing limit is arbitrarily set by search engines.

Query Space – A query space is a group of queries that are similar to each other and the Web content that is relevant to those queries. For example, “seo theory”, “theory of seo”, “search engine optimization theory”, “search marketing theory”, etc. are all related to each other. The query space for SEO theory is relatively small and undeveloped (although I have noticed several new blogs are now attempting to crash the space).

Naturality – 1. A search listing is natural if and only if it has not been optimized for the specific query result in which it appears. That is, if you optimize a page for keyword 1 and the page appears in search results for keyword 2 (where keyword 2 is in a separate query space), then the page is a natural listing for keyword 2 (but only for keyword 2).

2. Naturality thus refers to the extent to which the listings for a search result are natural (as opposed to having been optimized for that query space).

Transparency – 1. A search listing is transparent if it is clearly optimized for placement in search results to the casual observer. NOTE: A “casual observer” must be able to distinguish between optimized and unoptimized content on some minimal level.

2. Transparency thus refers to the extent to which optimized listings for a search result are obviously optimized for a query space.

Opacity – 1. A search listing is opaque if it is not obviously optimized for placement in search results to the casual observer.

2. Opacity or Search Opacity refers to the extent to which optimized listings for a search result are NOT obviously optimized for a query space.

3. Page Opacity refers to the extent to which a Web document is NOT obviously optimized for a query space.

Simple Formulas and Functions for Naturality and Optimization

Although a Query Result may include up to 1,000 listings, many of them show far fewer. I use a term I call Total Search Listings (denoted Ttl) to refer to the actual number of listings in a search result. Calculations look different if you assume a maximum of 1,000 listings rather than count the actual listings for a search result.

The Naturality (denoted Ny) of a search result is defined by the ratio of Natural search listings (denoted Nsl) to Total search listings provided for the query; that is, Ny = Nsl / Ttl.

A search result is Perfectly Natural or possesses Perfect Naturality if and only if Ny = 1. A search result is Perfectly Unnatural (or Perfectly Optimized) if and only if Ny = 0. Ny is undefined if Ttl = 0.

The Transparency (denoted Ty) of a search result is defined by the ratio of Transparent search listings (denoted Trl) to Total search listings provided for the query; that is, Ty = Trl / Ttl.

A search result is Perfectly Transparent or possesses Perfect Transparency if and only if Ty = 1. Ty is undefined if Ttl = 0.

The Opacity (denoted Oy) of a search result is defined by the ratio of Opaque search listings (denoted Opl) to Total search listings provided for the query; that is, Oy = Opl / Ttl.

A search result is Perfectly Opaque or possesses Perfect Opacity if and only if Oy = 1. Oy is undefined if Ttl = 0.
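The three ratios above can be sketched in a few lines of Python. This is only an illustration: the helper name `ratio` and the listing counts are made up for the example, and exact fractions are used so that tests such as Ny = 1 are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from typing import Optional

def ratio(count: int, total: int) -> Optional[Fraction]:
    """Return count / total, or None when total is 0 (the ratio is undefined)."""
    if total == 0:
        return None
    return Fraction(count, total)

# Hypothetical counts for one search result with 10 listings.
ttl = 10  # Total search listings (Ttl)
nsl = 6   # Natural search listings (Nsl)
trl = 3   # Transparent search listings (Trl)
opl = 1   # Opaque search listings (Opl)

ny = ratio(nsl, ttl)  # Naturality
ty = ratio(trl, ttl)  # Transparency
oy = ratio(opl, ttl)  # Opacity
print(ny, ty, oy)     # prints 3/5 3/10 1/10
```

Returning None for an empty result mirrors the rule that Ny, Ty, and Oy are undefined if Ttl = 0.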

A search result is Completely Optimized when Ty + Oy = 1 (that is, the search result is Perfectly Unnatural as defined above). Conversely, a search result is only Partially Optimized when 1 > Ty + Oy > 0.

A search result favors or tends toward Opacity whenever Ty, Oy > 0 and Oy > Ty.

A search result favors or tends toward Transparency whenever Ty, Oy > 0 and Ty > Oy.

A search result favors or tends toward Naturality whenever Ny > 0, at least one of Ty and Oy is greater than 0, and Ny > Ty + Oy.

A query space is uniform or homogenized if and only if all of its search results have equal values for Ny, Ty, and Oy.
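As a rough sketch of the uniformity test, assume each search result in a query space has already been reduced to a (Ny, Ty, Oy) triple of exact fractions. The function name `is_uniform` and the sample values are hypothetical, and treating an empty query space as uniform is a choice made for this example, not something the definition specifies.

```python
from fractions import Fraction

def is_uniform(results):
    """True when every (Ny, Ty, Oy) triple in the query space is identical.

    An empty list is treated as uniform (an assumption of this sketch).
    """
    return len(set(results)) <= 1

# Hypothetical query space with three search results.
space = [
    (Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)),
    (Fraction(3, 5), Fraction(3, 10), Fraction(2, 20)),  # same ratios, different counts
    (Fraction(12, 20), Fraction(6, 20), Fraction(2, 20)),
]
print(is_uniform(space))  # prints True
```

Because fractions are reduced to lowest terms, results with different listing counts but equal ratios still compare as equal.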

The weight (denoted Rw) of a search result is the difference between the combined share of its Transparent and Opaque search listings and the share of its Natural search listings; that is, Rw = (Ty + Oy) – Ny. A search result is heavily weighted when Rw > 0, lightly weighted when Rw < 0, and balanced when Rw = 0.
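The classifications in this section can be collected into one small function. The sketch below is illustrative only: the name `describe`, the label "unoptimized" for a result with no optimized listings, and the sample values are assumptions made for the example. A result can satisfy more than one "tends toward" rule at once, so the tendencies are returned as a list.

```python
from fractions import Fraction

def describe(ny, ty, oy):
    """Classify one search result from its Ny, Ty, and Oy ratios."""
    optimized = ty + oy   # share of optimized listings
    rw = optimized - ny   # weight: Rw = (Ty + Oy) - Ny

    if optimized == 1:
        degree = "completely optimized"
    elif optimized > 0:
        degree = "partially optimized"
    else:
        degree = "unoptimized"  # label assumed for this sketch

    tendencies = []
    if ty > 0 and oy > 0 and oy > ty:
        tendencies.append("opacity")
    if ty > 0 and oy > 0 and ty > oy:
        tendencies.append("transparency")
    if ny > 0 and (ty > 0 or oy > 0) and ny > optimized:
        tendencies.append("naturality")

    if rw > 0:
        weight = "heavily weighted"
    elif rw < 0:
        weight = "lightly weighted"
    else:
        weight = "balanced"
    return degree, tendencies, rw, weight

# Hypothetical result: 6 natural, 3 transparent, 1 opaque listing out of 10.
degree, tendencies, rw, weight = describe(
    Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)
)
print(degree, tendencies, rw, weight)
```

For these sample values the result is partially optimized and lightly weighted (Rw = -1/5), and it tends toward both Transparency and Naturality.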

Discussion

The basic formula is pretty simple. You treat each search result as a set of listings and divide them into optimized and natural listings. From there you just determine what ratio or percentage of the listings in the result is natural (or not). You can refine the analysis through sub-categorization of your optimized listings.

Optimized listings are transparent if they obviously match the query expression in their page title, page URL, and meta description. There are four degrees of optimal transparency (zero through three), depending on how many of those three factors are utilized to match the keyword expression. Your optimal transparency equals zero if the listing does not include the query expression in any of the three elements.
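One way to count the degrees of optimal transparency is to check each of the three elements for the query expression. The sketch below is a simplification under stated assumptions: the function names and sample listing are hypothetical, and "match" is taken to mean that the normalized query appears as a phrase in the normalized element, with punctuation and hyphens treated as spaces so that a URL slug like seo-theory matches "seo theory".

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def transparency_degree(query: str, title: str, url: str, meta_description: str) -> int:
    """Count how many of the three elements contain the query expression (0 to 3)."""
    q = normalize(query)
    return sum(q in normalize(field) for field in (title, url, meta_description))

# Hypothetical listing for the query "seo theory".
degree = transparency_degree(
    "seo theory",
    title="SEO Theory and Analysis",
    url="https://www.example.com/seo-theory/",
    meta_description="Notes on search marketing.",
)
print(degree)  # prints 2
```

A stricter or looser matching rule (stemming, word order, partial matches) would change the counts, so the rule should be fixed before comparing listings.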

A listing is opaque if its corresponding Web document is not directly relevant to the query but links (or other off-page factors) are used to include the document in the search result. The question of intent must be addressed, however. An opaque listing is opaque if and only if it has been targeted for the purpose of including it in a specific search result. Otherwise, the listing is natural.

The question immediately arises concerning the detection of opaque listings. How do you do that in a day when most SEOs are dropping links right and left without regard for their ability to impact search results? The short answer, I think, is to simply assume that all oddly placed irrelevant content is probably opaque. That is, if you find a Web document ranking in the top ten results for an expression that doesn’t occur in the document’s content, and if you can find links pointing to that document which use the query’s expression as their anchor text, you can treat the document as an opaque listing.
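That rule of thumb can be written down as a simple check. The sketch assumes you have already collected the listing's rank, the document's text, and the anchor text of links pointing to it; the function name, parameters, and sample data are all hypothetical, and plain case-insensitive substring matching stands in for whatever matching you prefer.

```python
def is_probably_opaque(query, rank, document_text, inbound_anchor_texts):
    """Apply the rule of thumb: top-ten ranking, query absent from the
    document's content, and at least one inbound link using the query
    expression as anchor text."""
    q = query.lower()
    in_top_ten = 1 <= rank <= 10
    absent_from_content = q not in document_text.lower()
    used_in_anchor = any(q in anchor.lower() for anchor in inbound_anchor_texts)
    return in_top_ten and absent_from_content and used_in_anchor

# Hypothetical listing ranked 4th for "blue widgets".
print(is_probably_opaque(
    "blue widgets",
    rank=4,
    document_text="Welcome to our gardening supply store.",
    inbound_anchor_texts=["Blue Widgets", "click here"],
))  # prints True
```

The check cannot establish intent, so a True result only marks the listing as probably opaque for the purpose of estimating a query's opacity.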

I would not run out and update my spam detection software on the basis of that rule of thumb. It is only convenient to use it for analyzing search results because if you detect a trend of probably opaque listings your calculation for the query’s opacity should produce a reasonable estimate.

The Theory of Search Engine Optimization states that we use or apply “algorithms to influence the predictable content and quality of search engine results according to” arbitrary criteria. We can measure the degree to which search engine optimization influences search results on an approximate basis.

The Theorem of Search Engine Optimization tells us that “achieving optimal performance from search engine results diminishes the naturality of the search results.” In other words, the more search listings that are optimized, the less natural the search result tends to be.

And we have the math to show how that works:

Ny = 1 – (Ty + Oy)
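The formula follows from the definitions if every listing in a search result is counted as exactly one of natural, transparent, or opaque. That partition is an assumption implied by the discussion above rather than stated outright. With the article's symbols typeset as subscripts, a short derivation is:

```latex
\begin{align*}
N_{sl} + T_{rl} + O_{pl} &= T_{tl} \\
\frac{N_{sl}}{T_{tl}} + \frac{T_{rl}}{T_{tl}} + \frac{O_{pl}}{T_{tl}} &= 1 \qquad (T_{tl} > 0) \\
N_y + T_y + O_y &= 1 \\
N_y &= 1 - (T_y + O_y)
\end{align*}
```

For example, a result with 10 listings, of which 3 are transparent and 1 is opaque, has Ny = 1 – (0.3 + 0.1) = 0.6.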

3 comments

Thomas Schmitz 12.09.08 at 9:13 am

Your first paragraph describes a bit of something I think is lacking in Internet marketing: statistics.

This is a business with a lot of uncertainty. We should be obsessive when it comes to identifying outliers then eliminating or understanding them. I rarely see anything more complex than an average where I’d expect to see figures like t-ratios, z-scores and degrees of freedom.

p.francismabry 12.09.08 at 1:49 pm

Michael
Brilliant Piece today and greatly appreciated. I would call the Mathematical Discipline “Applied Statistics”. Seems to me an excellent model and description of what you’ve Published today. This definitely pushes the SEO Theory way forward. Well Done! As someone who has used Applied Statistics to Model the Futures Market for a living and to see it in SEO Theory, well, you made my day.

FrancisRaymond 12.10.08 at 6:54 am

Oh how I wish I understood all those mathematical equations… Thanks for bringing up that theoretical side of SEO, this doesn’t get discussed much in the blogs.