Pros and Cons of Google Advanced Query Operators
Posted by Michael Martinez on July 24, 2008 in Advanced SEO
I noticed a link to SEO Theory from this nice roundup on Google advanced query operators by Lorna Li. Without meaning to steal her thunder, I’d like to offer some comments on popular query operators in a concise roundup of my own. This is something I’ve been thinking about doing for a while.
Preamble - Query Operators Provide Limited Information
There are two principles you should always keep in mind when using advanced query operators for analysis.
- No search engine can tell you anything useful about what another search engine knows
- No query operator will reveal everything a search engine can tell you about its index
As an example, we’re presently doing an extended server log analysis for a client. The client’s site is being crawled more aggressively by one major search engine than others. During our most recent conversation, we discussed the feasibility of evaluating the link profile for one of the less aggressive search engines by looking at links in the more aggressive engine’s results. I had to point out that with the disparity in indexed pages, there was no way we’d learn anything useful about the less aggressive search engine.
When Yahoo! reports 1,000 backlinks and Google reports 10 backlinks, all you know is that Google is showing you a random sampling of the links it considers to be valid. Both Google and Yahoo! tell us they don’t allow all links to pass value, but neither tells us which links DO pass value in their indices, and neither engine can tell us which links they both know about.
When Yahoo! reports X number of indexed pages and Google reports Y number of indexed pages, it doesn’t matter which one knows about more pages than the other. The disparity alone ensures that neither engine is crawling the same backlink set. Hence, whatever one search engine knows about backlinks for a site is useless information for determining what the other search engine knows.
On those rare occasions when Google and Yahoo! seem to know about all the same pages on a site, there is a reasonable expectation that they have a higher than average percentage of overlap in their backlink sets as well. But this is not a sure thing. After all, the better designed your site architecture is, the easier it is for ALL search engines to crawl and index your site. Hence, with small sites, similar coverage in two or more search engines doesn’t provide enough support to conclude that they know about all the same backlinks (or credit them in the same way).
If a really large site has very similar indexing in two or more major search engines, it probably has a combination of strong influencing backlinks pointing to a variety of entry points on the large site and well-designed site architecture that facilitates easy and diverse crawl activity. That’s the goal we should be aiming for with all sites that we optimize.
Site: operator - Pros and Cons
Since Google appears to still be placing most Web content in its Supplemental Results Index, the site: operator provides limited value. Pages in the Supplemental Index are not fully indexed. Google has said it’s more likely to index rare words on a supplemental page than common words, although they have also indicated several times over the past two years that they are improving their supplemental indexing technology.
To date I have seen no evidence showing that links in the Supplemental Index pass anchor text. I don’t believe you get much if any PageRank from a link in the Supplemental Index, either. I feel strongly that you need links from the Main Web Index to move a page into the Main Web Index. It’s a rite of passage that, I believe, probably helps ensure a lot of spam and duplicate content stays buried in search results (and in some competitive queries this strategy does seem to work well for Google, but the spam still tends to outrank legitimate content that isn’t as competitive or comprehensive as the large commercial sites that buy thousands of links).
Some SEOs like to run tests where they put nonsense words on pages and search for the nonsense words. You may be able to find a nonsense term on a supplemental page or not. I’ve looked at supplemental pages that did and did not come up for nonsense keywords. Expecting these pages to pass anchor text and other value is naive.
The site: query operator is good for seeing how much Google thinks it knows about your site. If you’re seeing only a fraction of your content in Google’s index that’s an indication of crawl issues, which may be due to technical problems, poor backlink support, or weak site architecture.
You can also use the site: query operator to find out how often Google refreshes its cache for your pages. Matt Cutts has said (I believe) that Google won’t necessarily refresh your cache every time it crawls your page. That’s important to keep in mind because if you clock your cache refreshes you can learn a lot about how much Google values your Web site.
Generally speaking, the more often a page cache is refreshed (based on what I have observed to date), the more likely Google is to index all the words on the page. The less often a page cache is refreshed, the less likely Google is to index all the words on the page. So Google will show you supplemental pages in a site: query but if you search for expressions inside that query you may not find all the pages that contain them.
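Clocking cache refreshes comes down to writing down the cache date you see each time you check a page and measuring the gaps. The snippet below is only a rough sketch of that bookkeeping: the dates are made up, `refresh_intervals` is a hypothetical helper name, and nothing here queries Google. You record the dates by hand.

```python
from datetime import date
from statistics import mean


def refresh_intervals(cache_dates):
    """Return the gaps, in days, between consecutive distinct cache dates."""
    ordered = sorted(set(cache_dates))
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical observations: the cache date shown for one page on each check.
observed = [date(2008, 7, 1), date(2008, 7, 1), date(2008, 7, 8), date(2008, 7, 22)]

gaps = refresh_intervals(observed)
average_gap = mean(gaps) if gaps else None  # None until two distinct dates exist
print(gaps, average_gap)
```

A shrinking average gap would suggest the page is being recached more often. What that implies about word-level indexing is my observation above, not something the numbers prove.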
Site: queries have similar limitations when you’re trying to analyze which links are passing value internally.
Link: operator - pros and cons
People have lost too much faith in Google’s link: operator. It does only report a random sampling of links, but so far I have only observed what appear to be value-passing links in these reports. If all the links reported do pass value, don’t assume that they are the only value-passing links Google knows about. Google does change the results for the link: query operator from time to time. Links that I know are still out there on pages I feel are in the Main Web Index don’t always appear in the link: queries.
When you compare backlink profiles for two sites in Google, you’re getting better data about which site is stronger (in Google’s index) than when you compare backlink profiles in Yahoo! (which does tell you something about those sites’ strengths in Yahoo!). If a Google link: report for site A shows you 15 links and a report for site B shows you 500 links, which site do you think Google sees more value in? My money is on site B, every time.
The more parity you find between two sites in a Google backlink profile, the more likely the sites are to compete through on-page relevance. If you’re trying to decide whether to invade a query on Google, do a backlink check on Google for the top five sites. If any of them has 1,000 links or more in its Google link: report, you have a lot of work cut out for you in most queries.
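As a rough sketch of that check, assuming you have already copied down the link: counts for the top results by hand (the site names and counts below are invented, and `heavy_competitors` is a hypothetical helper):

```python
HEAVY_LINK_THRESHOLD = 1000  # from the text: 1,000 links or more in a link: report


def heavy_competitors(link_counts, threshold=HEAVY_LINK_THRESHOLD):
    """Return sites among the first five entries whose link count meets the threshold.

    link_counts is assumed to be a dict ordered by ranking position.
    """
    top_five = list(link_counts.items())[:5]
    return [site for site, count in top_five if count >= threshold]


# Hypothetical link: report counts, listed in ranking order.
top_results = {
    "example-a.test": 120,
    "example-b.test": 1500,
    "example-c.test": 90,
    "example-d.test": 300,
    "example-e.test": 45,
    "example-f.test": 5000,  # sixth result, outside the top five
}

print(heavy_competitors(top_results))
```

An empty list doesn’t mean the query is easy. It only means none of the top five crosses the 1,000-link mark in Google’s sampled report.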
Google does sometimes report more backlinks than Yahoo!. Remember that Yahoo! and Google do not share data and they do not crawl from the same seed sources. Nor do they crawl the same way. Google can index more pages than Yahoo!, and Yahoo! can index more pages than Google. There is no reliable method for using one search engine to do link research that applies to another search engine.
allintext: operator - pros and cons
Lorna Li says “It tends to give prominence to documents that contain the keyword at the beginning of the body text.” Rand Fishkin reportedly said it isn’t worth a crap (or something like that) at SMX Advanced 2008.
Ladies and Gentlemen, I give you one of the operators that indicates whether a page may or may not be in the Supplemental Results Index — allintext: and intext: (okay, that’s two operators). I use intext: frequently to analyze how well a page is indexed. I have never seen it give prominence to keywords at the beginning of body text, but with billions of pages in Google’s indexes I’m not surprised to see people observe different apparent behaviors.
If you can find a page using the site: query operator and it doesn’t appear for an intext: query where you use unique expressions, the page is clearly not fully indexed. That doesn’t absolutely mean it’s in the Supplemental Results Index. Google seems to partially index some pages in the Main Web Index and then add more data for them later. You have to observe how a page behaves in the queries over time. If after a month of testing your page is still not showing results for specific text queries, it’s probably Supplemental.
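The reasoning above can be written down as a small decision rule. This is a sketch under assumptions: the inputs are things you observe by hand, the 30-day cutoff stands in for "a month," and the labels are my own shorthand rather than anything Google reports.

```python
def index_status(found_by_site, found_by_intext, days_observed):
    """Classify a page from manual site: and intext: observations."""
    if not found_by_site:
        return "not found by site:"
    if found_by_intext:
        return "found for unique text"
    if days_observed >= 30:  # assumed stand-in for "a month of testing"
        return "probably Supplemental"
    return "not fully indexed; keep observing"


# Hypothetical page: listed by site:, missing for intext:, watched for 35 days.
print(index_status(True, False, 35))
```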
Cache: operator - pros and cons
The cache: operator has taken on increasing importance in search engine optimization since we lost the Supplemental Result label last year. Personally, I would not discuss how I use this operator or how I get around Google’s limitations, but more and more SEOs are talking about it. You need to understand your content’s cache behaviors, and there are many cache behaviors to understand.
You need to spend about as much time working with search engine cache data (Google is not the only search engine to provide this data) as with any other query operator — perhaps more.
What cache data won’t tell you is why a page behaves the way it does and why the search engine decides to update on that particular date. Nor will cache data tell you much if anything about a stealth site, since they may be cloaking the cache (yes, it’s possible to cloak for cloaking but I’m not going to explain how to do it). Some people simply use the “noarchive” robots meta directive to prevent competitors from looking at cache data but that practice is considered to be suspicious by some people.
Final thoughts
Search engine results analysis has to focus on the most important factors for optimization. You need to know:
- Which pages in your site are indexed
- How often your page data is recached
- How many of your pages pass value through their links
- How many of your pages are receiving value from other pages
In some cases you can (and should) combine query operators to refine your analysis. If rank-checks and backlink profiles are all you’re doing, you’re running slower than the leaders in the field and they are way out ahead of you. You can’t understand a Web site’s performance simply by looking at a handful of targeted queries and running backlink reports on Yahoo!.
The search engines look at hundreds of signals to determine how to crawl, index, and rank sites. You should be looking at hundreds of signals, too.
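Combining operators just means putting more than one in the same query string. Here is a minimal sketch that assembles such a string; `build_query` is a hypothetical helper, the domain and phrase are placeholders, and how any given engine treats the combination is something you have to verify yourself.

```python
def build_query(site=None, intext=None, terms=""):
    """Assemble a query string from optional site: and intext: parts."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if intext:
        parts.append(f'intext:"{intext}"')  # quote the phrase to keep it together
    if terms:
        parts.append(terms)
    return " ".join(parts)


# Placeholder domain and phrase, for illustration only.
print(build_query(site="example.com", intext="blue widget sprocket"))
```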
6 Comments on Pros and Cons of Google Advanced Query Operators
By kinetic on July 24, 2008 at 11:37 am
Why is there a reference to what Rand Fishkin is saying in the majority of your posts? If he is so hot, put him in your SEO legends list with your friend Vandemar. I respect the guy, but like you said, he is a newcomer in SEO.
Does he talk about you in his posts?
Don’t try to catch up. Stick to your opinions and beliefs. Set the bar high so he is the one trying to catch up with you.
By Michael Martinez on July 24, 2008 at 3:01 pm
A quick site search does indicate that I’ve mentioned Rand on about 10% of the posts. Well, he’s worth mentioning, in my opinion. I’m not out to trash him. It’s just that sometimes he says things I want to follow through on. The comment that various SMX blog reports attributed to Rand was relevant to the topic of this post.
And, actually, yes, Rand HAS talked about me in some posts. No harm done, in my opinion — certainly no harm intended from me.
I haven’t looked at the blogroll in quite a while. I know we took down some popular sites. At the time I was thinking of using the blogroll to promote some alternative blogs but that plan fell through. We’ll probably be making some changes in the next week or two anyway, so I’ll devote some time to figuring out what to do with the blogroll. I think it’s too long to be useful.
By lornali on July 27, 2008 at 3:13 pm
Hi there Michael,
No thunder stolen! In my research on the best, free, quick & dirty way to get a read on a website I thought I’d compile together the more relevant info on Google search operators. And I’m really glad I inspired you to finally kick out this post, because you have answered many of the questions that came up for me in my research process:
How “accurate” are these operators?
Are they telling me the information I truly need?
If not, what tools do I need to be using, must I pay for them, and if so, how much?
Yes, I agree, their usefulness is limited, and accuracy highly debatable. What do leaders in the field use to gather an accurate read on:
1. Which pages in your site are indexed
2. How often your page data is recached
3. How many of your pages pass value through their links
4. How many of your pages are receiving value from other pages
You say:
In some cases you can (and should) combine query operators to refine your analysis. If rank-checks and backlink profiles are all you’re doing, you’re running slower than the leaders in the field and they are way out ahead of you. You can’t understand a Web site’s performance simply by looking at a handful of targeted queries and running backlink reports on Yahoo!.
The search engines look at hundreds of signals to determine how to crawl, index, and rank sites. You should be looking at hundreds of signals, too.
There is so much SEO information on the web, the value of much of it is debatable, and as a relative newcomer to the industry, it’s hard for me to sort through the clutter.
If I had one day to devote to this and no budget, what are your most recommended tools (and methodology on using them) that a solo SEO can use for quick, meaningful site diagnostic?
Cheers,
Lorna
By Michael Martinez on July 28, 2008 at 8:33 am
If I had only one day and no budget, I would probably leave the free tools alone until last, using a combination of advanced query searches to winnow out the most prominent pages.
One way to do this is to start with a site: query across all the major search engines. The pages that appear most often are the ones with the best inbound linkage.
Then refine those site queries to rank the internal pages by site name, primary keyword, etc. If the same pages rank consistently high the optimization may be pretty solid.
Then I would start digging deeper into the content to see what was being reported as indexed but not returned for unique on-page text queries. That would tell you which pages are not well trusted, fully indexed, or likely to help much.
When you’re ready to look at backlink profiles, see if there is a correlation between number of reported links and number of reported indexed pages. For example, suppose you have a 1,000 page site and you get the following data:
Ask indexes 400 pages.
Google indexes 600 pages.
Live indexes 300 pages.
Yahoo! indexes 750 pages.
Ask does not report backlinks, but you may be able to look for references to a specific site within Ask’s index.
Google reports a randomly selected 25 links for the site.
Live reports 50 links.
Yahoo! reports 200 links.
If you compare the site to other sites of similar size, design, content, and age, how closely do their backlink profiles resemble your own site’s backlink profile?
Do you get something different (like Google reporting hundreds of links and Yahoo! reporting only a few)?
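To make the example numbers easier to compare, here is a small sketch that turns them into two ratios per engine: the share of the 1,000 pages indexed, and reported links per indexed page. The ratios are my own framing, not a standard metric, and Google’s figure is a random sample, so read its ratio loosely.

```python
SITE_PAGES = 1000  # the example site size from the comment above

# Figures copied from the example above.
indexed = {"Ask": 400, "Google": 600, "Live": 300, "Yahoo!": 750}
reported_links = {"Google": 25, "Live": 50, "Yahoo!": 200}  # Ask reports no backlinks

# Share of the site each engine reports as indexed.
coverage = {engine: pages / SITE_PAGES for engine, pages in indexed.items()}

# Reported links divided by indexed pages, for engines that report links.
links_per_indexed_page = {
    engine: links / indexed[engine] for engine, links in reported_links.items()
}

for engine in indexed:
    ratio = links_per_indexed_page.get(engine)
    ratio_text = f"{ratio:.3f}" if ratio is not None else "n/a"
    print(f"{engine}: coverage {coverage[engine]:.0%}, links per indexed page {ratio_text}")
```

Running the same calculation for comparable sites gives you something concrete to set beside your own profile.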
Where you find links, how much overlap can you document between the major search engines’ link profiles? The more often a linking source and its destination (on your site) appear across the major search engines, the more likely it’s a very trusted linking source that passes value.
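One way to document that overlap, sketched with invented data: collect the (linking source, destination) pairs each engine reports and count how many engines show each pair. The URLs and the `engines_reporting` helper are hypothetical, and the pairs would have to be gathered by hand.

```python
from collections import Counter


def engines_reporting(link_profiles):
    """Count, for each (source, destination) pair, how many engines report it."""
    counts = Counter()
    for pairs in link_profiles.values():
        counts.update(set(pairs))  # set() so one engine counts a pair only once
    return counts


# Hypothetical link reports, one list of (source, destination) pairs per engine.
blog = ("http://blog.example.test/post", "/widgets")
directory = ("http://dir.example.test/", "/")
forum = ("http://forum.example.test/t/1", "/widgets")

profiles = {
    "Google": [blog, directory],
    "Yahoo!": [blog, forum, directory],
    "Live": [blog],
}

overlap = engines_reporting(profiles)
for pair, engine_count in overlap.most_common():
    print(engine_count, pair)
```

Pairs at the top of the list are the ones that show up across the most engines.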
There are almost always linking sources that one search engine indexes but which are not indexed in another. An old SEO technique involves creating crawl pages for the links from search engine A (where they are indexed) to help search engine B (which does not index them) find and trust those links.
It may not be as simple as just creating crawl pages. You need to show search engine B that your crawl pages have content its algorithm should index (so by “crawl pages” I’m not talking about relatively empty pages with lists of links).
You can identify a lot of work that needs to be done, given one day, no money, and only access to the major search engines’ advanced query operators.
By lornali on July 28, 2008 at 11:39 am
Thanks! This helps!
One more question: If one had a budget (strong, but not extravagant), which software packages, tools, or subscriptions would you most recommend for SEOs?
Cheers,
Lorna
By Michael Martinez on July 28, 2008 at 12:31 pm
I’m not in a position to make public endorsements of other SEO products. While we don’t sell tools or subscriptions, neither do we endorse them (beyond an occasional rare review or comment about free tools).
Generally speaking, if I were going to spend money, I would want to spend it on something that stores data and permits me to delve into the depths of that data through multiple reporting functions. Historical data is more valuable than any other type of data an SEO tool might collect or provide.