How Large Web Site Theory works with Set Theory

by Michael Martinez on May 16, 2008

I put my production team to sleep the other day in our weekly Large Web Site Design Theory meeting by drifting into a discussion of Set Theory. Having no time to devise a more intuitively appropriate topic for today, I’ll just share a few thoughts on the topic here.

First of all, let me tell you what a set is. Mathematicians are pretty good about defining “set” consistently. Alas! Alack! Google and Ickipedia have no clue.

My old College Algebra, Calculus, and Set Theory professors defined a set as “any collection of objects together with precise criteria for determining whether (or not) a given object is (included) in the collection”. Whole branches of mathematics are founded upon that concept, and I don’t mean just set theory. In fact, computer science and computer language design owe a great deal to set theory, too.

But then, so does search engine optimization in a way, although we have not formally declared the relationship (at least not that I can recall).

Let’s talk about the Web as a whole (to begin with). A Web document is any document that is accessible through the World Wide Web. That’s your set. Call it the Superset of all Web content. If it’s not accessible through the Web or if it’s not a document, it’s not a Web document.

We can subdivide the Web Superset into millions of smaller sets. We can speak about the set of all Web documents that are relevant to dogs, or the set of all Web documents that link to Web documents that are relevant to dogs. We can speak about the set of all Web documents that are located on a domain (like the pages on SEO Theory, which includes more than just this blog).

We can even sub-divide these sets into smaller sets. So maybe we’ll talk about all Web documents relevant to dogs that were produced by members of the American Kennel Society, or all Web documents relevant to dogs that were made by members of the American Humane Society. Or we can talk about the set of all Web documents relevant to dogs that only I have ever contributed to.
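
If it helps to see that in code, here is a minimal sketch of carving smaller sets out of a larger one by adding criteria. Everything in it is made up for illustration: the `WebDocument` fields, the URLs, and the `subset` helper are assumptions, not anything a search engine exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebDocument:
    """Hypothetical record for a Web document (fields are illustrative)."""
    url: str
    topics: frozenset
    author_group: str


# A tiny stand-in for the "Superset" of all Web content.
web = {
    WebDocument("https://example.com/dog-care", frozenset({"dogs"}), "kennel"),
    WebDocument("https://example.com/adoption", frozenset({"dogs", "cats"}), "humane"),
    WebDocument("https://example.org/recipes", frozenset({"cooking"}), "none"),
}


def subset(documents, criterion):
    """Return the documents that satisfy a precise membership criterion."""
    return {doc for doc in documents if criterion(doc)}


# All documents relevant to dogs.
dog_docs = subset(web, lambda d: "dogs" in d.topics)

# Subdivide again: dog documents produced by one (hypothetical) group.
kennel_dog_docs = subset(dog_docs, lambda d: d.author_group == "kennel")
```

Each call to `subset` adds one criterion, so `kennel_dog_docs` is contained in `dog_docs`, which is contained in `web`.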

If you confine your view of Web document sets to your personal Web experience, you can begin to see where set theory helps search engine optimization. For example, there are all the Web documents you have ever viewed, contributed to, linked to, linked from, or have been mentioned on. You can subdivide your personal Web document sets into smaller sets such as the Web documents on any given site you have created, or the Web documents on any site you have optimized, etc.

But we can look at Web document sets in other ways, too. For example, for every Web document there is a set that consists of the document and all other Web documents that link to it. Hey, now we’re getting into the part that most SEOs can relate to: links. How many links does a Web document have? How many links does a Web document need?

If you pair a Web document with its inbound linking sources, you have a link profile set. The destination page has to be included in the set because — well, because I arbitrarily decided that must be the case. But it makes sense to include the destination with its linking sources because then you can objectify your linking strategies.
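
As a rough sketch, a link profile set could be built like this, assuming you already have a list of (source, destination) link pairs from some crawl or backlink report. The URLs and the `link_profile_set` function name are hypothetical.

```python
# Hypothetical (source URL, destination URL) pairs.
links = [
    ("https://a.example/post", "https://site.example/page"),
    ("https://b.example/list", "https://site.example/page"),
    ("https://site.example/page", "https://site.example/page"),  # self-referencing link
    ("https://a.example/post", "https://site.example/other"),
]


def link_profile_set(destination, links):
    """Return the destination together with every document that links to it."""
    sources = {src for src, dst in links if dst == destination}
    # The destination is always a member, whether or not it links to itself.
    return sources | {destination}


profile = link_profile_set("https://site.example/page", links)
```

Because the result is a set, a page that links to itself does not get counted twice, and a page with no inbound links still has a profile set containing only itself.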

For example, let’s say you have a Web site with 10,000 pages. Not very large by some people’s standards, but the typical business site will never acquire that much content. Most people would immediately launch into a link-building campaign without thinking about where to point the links or why. After all, if you have 10,000 pages, what are they relevant to?

A large Web site can be an ecommerce site, an archive of articles or discussions, an index of sites, a glossary or other fact resource, or just a random collection of content. Each of these types of sites generally has distinctive navigational features. Glossaries and scientific resources, for example, often have alphabetic navigation systems. Random collection sites may use several navigation systems. Ecommerce sites tend to categorize content by products and services, and so forth.

Your Web site navigation should emphasize what is most important to you, but it’s not likely to emphasize what is most important to me (or other people). For example, Amazon lists more than one book called Understanding Middle-earth but the one that is most important to me is the book I just linked to (I wrote it).

If your Web site allows other people to create their own content, they may be linking to it with their own priorities in mind. A lot of people do that when they finally set up their “official” personal sites — they tell their visitors where else on the Web people can find their contributions.

If your Web site offers eclectic content, other people may link to it with their own priorities in mind as well. So your deep content may earn more high-value links than your root URL (which is why it’s important to ensure that your internal navigation doesn’t just use “home” as the anchor text for the root URL).

Each Web document is thus the heart of its own set of Web documents that refer to it (which, technically, could be larger than the set of Web documents that actually link to a document). That is, there is no reason why you cannot embed a link to a page on itself. Many sites actually do this through what I call “lazy navigation” (which I’ve often used myself). Meticulous Web designers may disable self-referencing links but there is no real reason to do so. Not that I would expect any search engine optimization advantage from a self-referencing link, but you should now see why I include Web documents with their linking sources in their link profile sets.

A Web document can be self-referencing and many documents are self-referencing. On the other hand, Web sites must be self-referencing (a Web site is “any collection of interconnected Web documents that function as a complete resource”). A Web site can be part of a larger Web site, it can be part of a network of Web sites, it can be part of a neighborhood of Web sites. A Web site is a set.

As we’ve seen, sets can include other sets. You create new sets by adding or deleting criteria for determining what goes into your sets. For example, you can start with the set of all pages on your Web site and refine that to define the set of all pages on your site with inbound links and further refine that to define the set of all pages on your site that have inbound links from other Web sites.
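
Here is one way that refinement might look in code. It is only a sketch under assumptions: `SITE_HOST`, the page URLs, and the link pairs are invented, and "from other Web sites" is approximated by comparing hostnames.

```python
from urllib.parse import urlparse

SITE_HOST = "site.example"  # hypothetical hostname for "your Web site"

# The set of all pages on your site.
site_pages = {
    "https://site.example/",
    "https://site.example/a",
    "https://site.example/b",
    "https://site.example/c",
}

# Hypothetical (source URL, destination URL) pairs.
links = [
    ("https://site.example/", "https://site.example/a"),    # internal link
    ("https://other.example/x", "https://site.example/a"),  # link from another site
    ("https://site.example/a", "https://site.example/b"),   # internal link
]


def host(url):
    """Return the hostname portion of a URL."""
    return urlparse(url).hostname


# Refinement 1: pages on your site with at least one inbound link.
linked_pages = {dst for src, dst in links if dst in site_pages}

# Refinement 2: pages on your site with inbound links from other Web sites.
externally_linked_pages = {
    dst for src, dst in links
    if dst in site_pages and host(src) != SITE_HOST
}

# Pages with no inbound links at all are the set difference.
unlinked_pages = site_pages - linked_pages
```

Each added criterion produces a set nested inside the previous one, and the set difference at the end shows which pages a linking strategy has not reached yet.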

By defining these types of sets and identifying their members, you can better manage your linking strategies. Linking strategies work best when they have concise, specific goals in mind. Call them discrete linking goals. A discrete linking goal may be something like “increase the frequency with which a Web document is fetched” or “increase the number of referral visitors a Web document receives” or “increase the number of occurrences of a keyword associated with a Web page”.

Discrete goals do not have to be atomic. You can define discrete goals so that they can be subdivided. For example, increasing the number of referral visits a page receives can be subdivided into “receiving referral visitors from site X” and “receiving referral visitors from site Y”. Sites X and Y can be search engines but they don’t have to be.

If your linking strategy has been focused on obtaining site branding links, you’re overlooking valuable resources that help improve your Web site’s visibility. A site branding link points to your root URL, usually with the site name or brand as its anchor text, although some people insist on getting “the right anchor text” instead. This kind of sub-optimal linking may be driven by client priorities (some SEOs are asked just to enhance existing link profiles without engaging in broader strategies).

Sometimes it helps to see where too many links are being pointed. If you group your linking sources by destination, you may obtain better insight into why some of your pages are not working as you expect them to. It’s possible to obtain too many links to a destination as well as too few. It’s also possible to obtain more ineffective links than effective links. That is, if you feel a page has “plenty of inbound links” and you’re not seeing the performance you expected, the most likely explanation is that you wasted your time obtaining worthless links.
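
Grouping linking sources by destination can be sketched in a few lines. The link pairs and the `group_sources_by_destination` name are hypothetical, and counting sources says nothing about whether a link is effective; it only shows where the links are concentrated.

```python
from collections import defaultdict

# Hypothetical (source URL, destination URL) pairs.
links = [
    ("https://a.example/1", "https://site.example/"),
    ("https://b.example/2", "https://site.example/"),
    ("https://c.example/3", "https://site.example/"),
    ("https://a.example/1", "https://site.example/deep-page"),
]


def group_sources_by_destination(links):
    """Map each destination to the set of documents that link to it."""
    grouped = defaultdict(set)
    for src, dst in links:
        grouped[dst].add(src)
    return dict(grouped)


by_destination = group_sources_by_destination(links)

# Number of distinct linking sources per destination, largest first.
source_counts = sorted(
    ((dst, len(srcs)) for dst, srcs in by_destination.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this made-up data the root URL has three linking sources and the deep page has one, which is the kind of imbalance worth examining before building more links.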

One of my team members asked me if we want to uniformly distribute inbound links across a Web site. In reply I asked if that looks natural. Does or should an evenly distributed link profile look “natural”? While most people might say “yes” without reservation, I would say “it depends”. What’s a natural link profile for a social media site where most of the members have other sites that they link out from? That is, should most of Facebook’s links point to Facebook.com or should they be pointing to the profiles? Facebook is all about the profiles, is it not? I would expect more uniformity and consistency in the link profile for Facebook than in the link profile for the average Web forum.

There is no universal formula for creating a link profile. You have to understand the type of content and the audience you’re dealing with. You have to measure naturalness in the context of the Web site’s neighborhood (and I use the term “neighborhood” loosely here — don’t confuse it with putative search engine spam neighborhoodliness). Each Web neighborhood develops its own characteristic linking profile trends.

Large Web sites may buck those trends. They may redefine those trends. They may shape the trends by creating the neighborhoods to begin with. The process of birthing natural linking profiles is as ongoing and continuous as the universe’s production of new stars. The World Wide Web acts very much like the universe, filled with its own galaxies, clusters of galaxies, and super clusters. Each galaxy has its brighter stars and dimmer stars.

And, like the universe, the Web works according to natural laws which we are only beginning to understand. But one area where we can apply our theoretical scientific knowledge is in the management of Web information. Set theory helps us organize data for critical and strategic functions. We’ve really been doing this all along. We just never stopped to acknowledge the fact that we’ve been working with sets.

Hopefully that wasn’t quite so boring for you.
