<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Google Throttles Blog Search Indexing</title>
	<atom:link href="http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/</link>
	<description>Algorithm analysis, Web community relationship analysis, SEO practices and techniques, industry news, etc.</description>
	<lastBuildDate>Mon, 15 Mar 2010 10:48:35 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Michael Martinez</title>
		<link>http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/comment-page-1/#comment-1449</link>
		<dc:creator>Michael Martinez</dc:creator>
		<pubDate>Fri, 13 Feb 2009 17:47:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.seo-theory.com/?p=1409#comment-1449</guid>
		<description>Jeremy, thanks for the timely reply and the explanations.  I do see some improvement although I have not checked out all the queries I normally monitor.  I watch many blogs in Blogsearch and the overall trend has been for delays in seeing their content appear in Blogsearch.

I&#039;m still not happy with the change in the default behavior, but I realize I&#039;m just one user out of millions.  I appreciate your responsiveness to the issues I&#039;ve been able to document.</description>
		<content:encoded><![CDATA[<p>Jeremy, thanks for the timely reply and the explanations.  I do see some improvement although I have not checked out all the queries I normally monitor.  I watch many blogs in Blogsearch and the overall trend has been for delays in seeing their content appear in Blogsearch.</p>
<p>I&#8217;m still not happy with the change in the default behavior, but I realize I&#8217;m just one user out of millions.  I appreciate your responsiveness to the issues I&#8217;ve been able to document.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jhylton</title>
		<link>http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/comment-page-1/#comment-1448</link>
		<dc:creator>jhylton</dc:creator>
		<pubDate>Fri, 13 Feb 2009 16:29:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.seo-theory.com/?p=1409#comment-1448</guid>
		<description>Michael,

Thanks for your feedback on blogsearch.

I read your post about two hours after you published it, and it was
already in our index.  (I checked our backend systems.  It looks
like it took almost eight minutes for us to index it after receiving
your ping.  I&#039;m not sure why it took so long to get indexed.)

I tried the same queries you did and saw the same problems.  There
were a few different causes for these problems.  One simple problem is
that searchenginejournal.com was classified as a news site, so we
weren&#039;t including it.  That&#039;s just a mistake, and I&#039;ll make sure it
gets fixed today.

The problem with the other queries is a little more subtle.  We have
some algorithms that attempt to eliminate very similar results, which
can be fairly aggressive at times.  If you get to the last page of
results, you&#039;ll sometimes see a message that says, &quot;We have omitted
some entries very similar to the NNN already displayed.&quot;  There is a link
following the message that repeats the query with an extra filter=0
CGI param, which shows all the results.

For [site:seo-theory.com] and [site:seroundtable.com], we omitted some
results because we thought they were too similar.  The results are
filtered on a per-query basis, so posts will still show up for other
queries.  You can see the difference by adding the filter=0 param
yourself:

&lt;a href=&quot;http://blogsearch.google.com/blogsearch?q=site:seroundtable.com&quot; rel=&quot;nofollow&quot;&gt;[site:seroundtable.com]&lt;/a&gt;

&lt;a href=&quot;http://blogsearch.google.com/blogsearch?q=site:seroundtable.com&amp;filter=0&quot; rel=&quot;nofollow&quot;&gt;[site:seroundtable.com] with filter=0&lt;/a&gt;

We were already planning to disable the duplicate suppression
algorithm when you restrict your query to a single site or blog.  I
think that change will be live in a week or two.

I didn&#039;t get a chance to look at the Sphinn query yesterday, but it
looks okay today.  I don&#039;t know what might have gone wrong when you
ran your query yesterday.  A variety of transient problems might have
caused us to briefly serve stale results, but I think they are rare.
It&#039;s not unusual for a particular blog to be slow, either, and then
Googlebot may take longer than normal to crawl the new posts.

Jeremy Hylton
Google</description>
		<content:encoded><![CDATA[<p>Michael,</p>
<p>Thanks for your feedback on blogsearch.</p>
<p>I read your post about two hours after you published it, and it was<br />
already in our index.  (I checked our backend systems.  It looks<br />
like it took almost eight minutes for us to index it after receiving<br />
your ping.  I&#8217;m not sure why it took so long to get indexed.)</p>
<p>I tried the same queries you did and saw the same problems.  There<br />
were a few different causes for these problems.  One simple problem is<br />
that searchenginejournal.com was classified as a news site, so we<br />
weren&#8217;t including it.  That&#8217;s just a mistake, and I&#8217;ll make sure it<br />
gets fixed today.</p>
<p>The problem with the other queries is a little more subtle.  We have<br />
some algorithms that attempt to eliminate very similar results, which<br />
can be fairly aggressive at times.  If you get to the last page of<br />
results, you&#8217;ll sometimes see a message that says, &#8220;We have omitted<br />
some entries very similar to the NNN already displayed.&#8221;  There is a link<br />
following the message that repeats the query with an extra filter=0<br />
CGI param, which shows all the results.</p>
<p>For [site:seo-theory.com] and [site:seroundtable.com], we omitted some<br />
results because we thought they were too similar.  The results are<br />
filtered on a per-query basis, so posts will still show up for other<br />
queries.  You can see the difference by adding the filter=0 param<br />
yourself:</p>
<p><a href="http://blogsearch.google.com/blogsearch?q=site:seroundtable.com" rel="nofollow">[site:seroundtable.com]</a></p>
<p><a href="http://blogsearch.google.com/blogsearch?q=site:seroundtable.com&amp;filter=0" rel="nofollow">[site:seroundtable.com] with filter=0</a></p>
<p>We were already planning to disable the duplicate suppression<br />
algorithm when you restrict your query to a single site or blog.  I<br />
think that change will be live in a week or two.</p>
<p>I didn&#8217;t get a chance to look at the Sphinn query yesterday, but it<br />
looks okay today.  I don&#8217;t know what might have gone wrong when you<br />
ran your query yesterday.  A variety of transient problems might have<br />
caused us to briefly serve stale results, but I think they are rare.<br />
It&#8217;s not unusual for a particular blog to be slow, either, and then<br />
Googlebot may take longer than normal to crawl the new posts.</p>
<p>Jeremy Hylton<br />
Google</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Martinez</title>
		<link>http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/comment-page-1/#comment-1447</link>
		<dc:creator>Michael Martinez</dc:creator>
		<pubDate>Fri, 13 Feb 2009 05:54:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.seo-theory.com/?p=1409#comment-1447</guid>
		<description>I think it will help some people having crawl issues.  I think the SEO community will also find ways to make up nonsense about the tag.  And maybe there is a small chance someone will discover an unforeseen use for it.</description>
		<content:encoded><![CDATA[<p>I think it will help some people having crawl issues.  I think the SEO community will also find ways to make up nonsense about the tag.  And maybe there is a small chance someone will discover an unforeseen use for it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: olmei</title>
		<link>http://www.seo-theory.com/2009/02/12/google-throttles-blog-search-indexing/comment-page-1/#comment-1446</link>
		<dc:creator>olmei</dc:creator>
		<pubDate>Fri, 13 Feb 2009 03:29:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.seo-theory.com/?p=1409#comment-1446</guid>
		<description>Michael,
Not exactly related to your article but I&#039;m currently in a heavier mode of research and data accumulation hell than usual and am itching to know your initial take on what effect the forthcoming &#039;Canonical Concert&#039; of msn, google and yahoo might have.

Feel free to expand heavily with respect to the content repetition theme (which I also utilize and enjoy your take on),  in copy, urls, etc...

GOOD ARTICLE by the way. 
Though next time consider expanding your screen shots so I can better view your programs-of-choice icons.

Mike</description>
		<content:encoded><![CDATA[<p>Michael,<br />
Not exactly related to your article but I&#8217;m currently in a heavier mode of research and data accumulation hell than usual and am itching to know your initial take on what effect the forthcoming &#8216;Canonical Concert&#8217; of msn, google and yahoo might have.</p>
<p>Feel free to expand heavily with respect to the content repetition theme (which I also utilize and enjoy your take on),  in copy, urls, etc&#8230;</p>
<p>GOOD ARTICLE by the way.<br />
Though next time consider expanding your screen shots so I can better view your programs-of-choice icons.</p>
<p>Mike</p>
]]></content:encoded>
	</item>
</channel>
</rss>
