HTML Sitemap Design and Theory – Fundamental Basic Principles of HTML Sitemap Design

by Michael Martinez on June 10, 2008

The most important part of any large Web site is the HTML sitemap. Both search engines and people use HTML sitemaps, and even Webmasters find them useful, because they help us keep track of just where all the content has been placed.

An HTML sitemap should have the simplest, barest, least flashy design on your site. It can still use the site template for look-and-feel, but the content of the page(s) is nothing but links. You can add descriptive text if you wish, but this is one situation where text can get in the way.

People use HTML sitemaps to find deeper content when on-site navigation and site search fail them (and both on-site navigation and site search fail all too often). The HTML sitemap is the last step between a determined visitor and an unsatisfied departure. If someone abandons your HTML sitemap page, either you don’t have the content they are seeking or your site really sucks. SUCKS REALLY BAD.

Ideally you don’t want people to have to look at your sitemap, but it needs to be there, ready and waiting, usable and efficient, for those off days when everything else fails your visitor. And that’s why simple HTML ordered list structures are all you need to manage the links on your sitemap pages.
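To make "nothing but links in an ordered list" concrete, here is a minimal Python sketch that turns a list of (URL, anchor text) pairs into a bare `<ol>`. The function name and the example URLs are hypothetical; the output is meant to be dropped into whatever site template you already use.

```python
from html import escape


def render_sitemap_list(links):
    """Render (url, anchor_text) pairs as a bare HTML ordered list."""
    items = [
        # Escape both the URL (as an attribute value) and the visible anchor text.
        f'  <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in links
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


print(render_sitemap_list([
    ("/widgets/blue.html", "Blue widgets: sizes and prices"),
    ("/widgets/red.html", "Red widgets: sizes and prices"),
]))
```

No classes, no scripts, no decoration: the template supplies the look-and-feel and the list supplies the links.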

Now, Google has been advising people for years to limit the number of links on HTML sitemap pages for usability. It’s true that the human mind can only handle so many links at one time. In fact, intrapage navigation efficiency tops out when the user is offered between 8 and 12 items to pick from.

If you have 5,000 pages of content on your Web site, then by Google’s recommendation (roughly 100 links per page) you need about 50 HTML sitemap pages. Of course, since no one wants to page through 50 link lists, you have to design a whole new structure for your HTML sitemap.
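The arithmetic is simple ceiling division. A small Python sketch, assuming a cap of 100 links per page (the figure implied by the 5,000-page example) and hypothetical page URLs:

```python
import math

MAX_LINKS_PER_PAGE = 100  # assumption: the ~100-links-per-page guideline


def paginate(urls, per_page=MAX_LINKS_PER_PAGE):
    """Split a flat list of URLs into consecutive sitemap pages."""
    return [urls[i:i + per_page] for i in range(0, len(urls), per_page)]


def pages_needed(total_links, per_page=MAX_LINKS_PER_PAGE):
    """Number of sitemap pages required, rounding any partial page up."""
    return math.ceil(total_links / per_page)


urls = [f"/page-{n}.html" for n in range(5000)]  # hypothetical URLs
pages = paginate(urls)
print(len(pages), pages_needed(5000))  # 50 50
```

This flat split is exactly the "50 link lists" nobody wants to click through, which is why the structure needs to be reorganized rather than just chopped up.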

In fact, since visitors often turn to HTML sitemaps to avoid using cranky on-site navigation, it’s best (for complex HTML sitemap sections) to use an entirely different hierarchical structure for your HTML sitemap. Throw your Web designer a bone and tell him to go fix the front page or something while you implement a classic tree structure for your 50 HTML sitemap pages. Divide the pages into categories and create a root page that lists all the categories. Allow each of your 50 HTML sitemap pages to use its own internal navigation system so that visitors can get back to the root or to any category quickly.

How many categories should you have? That depends on how your content is organized and what it’s about. I would want no fewer than 5 HTML sitemap pages in any category. Every HTML sitemap page should link to its category sibling pages, its category parent, the other category parent pages, and the HTML sitemap root page.
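The linking rule in the previous paragraph can be expressed as a small function. This is a sketch only: the `/sitemap/<category>/<n>.html` URL scheme and the function names are hypothetical, chosen just to make the rule concrete.

```python
def build_tree(categories, pages_per_category):
    """Map each category to its sitemap page URLs (hypothetical /sitemap/... scheme)."""
    return {
        c: [f"/sitemap/{c}/{n}.html" for n in range(1, pages_per_category + 1)]
        for c in categories
    }


def nav_links(tree, category, page):
    """Navigation targets for one sitemap page: the root, its category parent,
    the other category parents, and its sibling pages in the same category."""
    own_parent = f"/sitemap/{category}/"
    return {
        "root": "/sitemap/",
        "parent": own_parent,
        "other_parents": [f"/sitemap/{c}/" for c in tree if c != category],
        "siblings": [p for p in tree[category] if p != page],
    }


tree = build_tree(["products", "support", "news"], 5)
links = nav_links(tree, "support", "/sitemap/support/3.html")
print(len(links["siblings"]), len(links["other_parents"]))  # 4 2
```

Every page gets the same small, predictable navigation block, which is what lets a visitor jump back to the root or across categories in one click.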

If we assume you have 10 categories with 5 HTML sitemap pages in each category, you’ll need 10 category parent pages and a root page for a total of 61 HTML sitemap pages. Do you HAVE to do it that way? Absolutely not. You can make the first page in each category the root for its category (you should adjust the link distribution so your pages have about the same number of links overall). And you can find other compromises to suit your aesthetic or volume-limiting needs.
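The 61-page total is leaf pages plus one parent per category plus the root. The sketch below also counts the compromise of letting each category's first page act as its parent; reading that as "no separate parent pages" is my interpretation, and the function name is hypothetical.

```python
def total_sitemap_pages(categories, pages_per_category, separate_parent_pages=True):
    """Count the pages in a root -> category -> leaf sitemap tree."""
    leaf_pages = categories * pages_per_category
    # One parent page per category, unless each category's first leaf doubles as its parent.
    parent_pages = categories if separate_parent_pages else 0
    root_page = 1
    return leaf_pages + parent_pages + root_page


print(total_sitemap_pages(10, 5))                               # 61
print(total_sitemap_pages(10, 5, separate_parent_pages=False))  # 51
```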

The point of this outline is to underscore the need for a tight, easily crawlable structure. And in this context I use “crawlable” for both humans and machines. People need to be able to navigate through your HTML sitemap quickly. On some sites you’ll want to organize your categories by dates, numbers, or alphabet. On other sites you’ll want to organize your categories by topic, author, site theme (for mega sites that host multiple sites), etc. You MUST organize the categories according to how your content is arranged, not according to some theoretical “best performance structure”.
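One way to keep the category scheme tied to how your content is actually arranged is to make the grouping rule a parameter. In this Python sketch the page records, their fields (`url`, `title`, `date`, `author`) and the function name are all hypothetical; swap the key function for alphabet, year, author, or topic as your site demands.

```python
from collections import defaultdict


def group_into_categories(pages, key):
    """Group page records into sitemap categories using a caller-supplied key function."""
    groups = defaultdict(list)
    for page in pages:
        groups[key(page)].append(page)
    # Sort categories by their key so the sitemap root lists them in a predictable order.
    return dict(sorted(groups.items(), key=lambda kv: kv[0]))


pages = [
    {"url": "/a.html", "title": "Anchor text basics", "date": "2008-06-10", "author": "alice"},
    {"url": "/b.html", "title": "Blog archives", "date": "2007-03-02", "author": "bob"},
    {"url": "/c.html", "title": "anchor text for brands", "date": "2007-11-20", "author": "alice"},
]
by_letter = group_into_categories(pages, lambda p: p["title"][:1].upper())
by_year = group_into_categories(pages, lambda p: p["date"][:4])
by_author = group_into_categories(pages, lambda p: p["author"])
```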

Now, while I said that text gets in the way on HTML sitemap pages, the truth is that you need to leverage your text carefully. In fact, HTML sitemap pages allow you to step outside the constricting boundaries of your on-site navigation to use fully descriptive and informative — even compelling — anchor text for your internal links. Use the anchor text to tell people what each page is about. Or use it to reinforce the page titles (which could also be the page header tags). Think about what people need to know about a page so that they can pick it out of a list of links quickly.
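If your page titles are already descriptive, a reasonable starting point is to reuse them as sitemap anchor text. This Python sketch uses the standard library's `HTMLParser` to pull the first `<title>`, falling back to the first `<h1>` and then to a caller-supplied fallback; the class and function names are hypothetical, and the result is a draft you would still edit by hand for clarity.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> and the first <h1> in a document."""

    def __init__(self):
        super().__init__()
        self._current = None
        self._seen = set()
        self.title = ""
        self.h1 = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and tag not in self._seen:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._seen.add(tag)
            self._current = None

    def handle_data(self, data):
        # Data may arrive in chunks, so accumulate rather than overwrite.
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def anchor_text_for(html_source, fallback):
    """Prefer the page title, then the first header, then the fallback."""
    parser = TitleParser()
    parser.feed(html_source)
    parser.close()
    for candidate in (parser.title, parser.h1):
        cleaned = " ".join(candidate.split())  # collapse runs of whitespace
        if cleaned:
            return cleaned
    return fallback
```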

Branding anchor text is not good unless you have some very well-promoted, well-managed brands. General Foods can get away with putting brand names in its HTML sitemap anchor text if it wants. Most business sites cannot afford that luxury. That is, your internal anchor text needs to focus on value to the visitor, not value to the business (or brand).

If you have a site with 100,000 pages of content, the odds are pretty good you won’t find many people searching through your HTML sitemap page by page. Even if you double the number of internal links per page (to about 200), you’ll still end up with 500 HTML sitemap pages plus whatever overhead is required to organize them. This kind of HTML sitemap structure can be expanded with its own dedicated SITEMAP SEARCH tool (that only searches the content on indexed HTML sitemap pages) AND you can afford to include extra text with each listing to give the SITEMAP SEARCH tool something to chew on.
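A dedicated sitemap search can be as small as an inverted index over the listings themselves (anchor text plus the extra description). The Python sketch below is in-memory and uses AND semantics; the class name, the listing tuple shape and the sample data are hypothetical, and a real site of this size would more likely feed the same listings into an existing search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SitemapSearch:
    """Tiny inverted index over sitemap listings: (url, anchor_text, description)."""

    def __init__(self, listings):
        self.listings = listings
        self.index = defaultdict(set)
        for i, (_url, anchor, desc) in enumerate(listings):
            for term in tokenize(anchor + " " + desc):
                self.index[term].add(i)

    def search(self, query):
        """Return URLs of listings containing every query term, in listing order."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.listings[i][0] for i in sorted(hits)]


search = SitemapSearch([
    ("/w/blue.html", "Blue widgets", "Sizes, prices and care for blue widgets"),
    ("/w/red.html", "Red widgets", "Sizes and prices for red widgets"),
    ("/help/returns.html", "Returns policy", "How to return a widget"),
])
```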

An alternative to creating a massive HTML sitemap section for truly large sites is to create multiple small HTML sitemap sections. You can do this in a variety of ways regardless of how crawlable your overall content structure is. Huge blog sites, for example, that have large archives of posts from multiple authors, can benefit from alternative HTML sitemap systems that are broken out by year, category, author, or some other criteria. On a site like a blog, where duplicate content occurs naturally and often, you can use your HTML sitemap strategy to effectively promote your canonical pages, and that is sufficient reason alone for creating complex HTML sitemap structures on large blog sites with tons of duplicate content.
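Here is a sketch of the "several small sections, canonical URLs only" idea in Python. The canonicalization rule (drop the query string and fragment) is a hypothetical stand-in; your blog's real duplicates may come from tag pages, date archives or pagination, and would need their own rule. The post records and their fields are likewise assumed.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_url(url):
    """Hypothetical canonicalization rule: drop the query string and fragment."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


def build_sections(posts, canonical_of=canonical_url):
    """Build small per-year and per-author sitemap sections, listing each
    post once under its canonical URL."""
    seen = set()
    by_year, by_author = defaultdict(list), defaultdict(list)
    for post in posts:
        url = canonical_of(post["url"])
        if url in seen:
            continue  # duplicate URL variant of a post already listed
        seen.add(url)
        by_year[post["date"][:4]].append(url)
        by_author[post["author"]].append(url)
    return dict(by_year), dict(by_author)


posts = [
    {"url": "/2008/06/sitemaps/", "date": "2008-06-10", "author": "alice"},
    {"url": "/2008/06/sitemaps/?utm_source=feed", "date": "2008-06-10", "author": "alice"},
    {"url": "/2007/11/anchors/#comments", "date": "2007-11-20", "author": "bob"},
]
by_year, by_author = build_sections(posts)
```

Because only the canonical form of each URL ever appears in a section, every internal sitemap link points at the version of the page you want indexed.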

There’s plenty more to be said about HTML sitemaps, and I’ll come back to this topic again in the future.


Kaus 06.11.08 at 8:05 am

Thank you so much for this post. I posted elsewhere asking for help on this topic just yesterday. We are in the process of building a fairly decent sized site. Right now we are only hitting the 100-page mark, but with a few writers on hand, this will be expanding quickly, and I wanted a sitemap that wasn’t going to make the visitor go blind with a swarm of links.

Thanks Michael :)

Traffics Pain 10.10.08 at 11:26 am

Lately I tend to use a simple HTML sitemap that lists every page on the site, even if this does go over the 150ish links. If Google and co. don’t read that many links, I don’t really care, so long as the user can still navigate it correctly. I will often read forum posts like this one all the way through, even with 100 comments, so I know users will certainly continue to scan down a page so long as it follows their expectations, so to speak.

On top of this I will often also link to a ‘most popular pages’ sitemap that will be broken down into, well, obviously the most important/requested pages.