Google shares new information about PageRank

by Michael Martinez on July 25, 2008

Just when I thought it was safe to go on vacation, all Hell breaks loose across the Web. We’re having registration issues at SF-Fandom, so registrations have been temporarily deactivated until someone with more savvy than me can look at the problem. In the meantime, people who want to register at SF-Fandom should contact forum admin Stripe. She’ll sign you in manually.

Meanwhile, back in the SERPs, Google seems to have started rolling out almost-fresh data again after a freeze of 1-3 weeks. (The length depends on various factors I’m not really qualified to talk about; Google updates information for some sites immediately and for others hourly, daily, weekly, monthly, …)

Then Danny Sullivan got his hackles up because Google has proudly announced they have found 1,000,000,000,000+ unique URLs that they don’t feel like indexing (that is, they only index part of that content). Okaaay…then Danny referred us to this post about the 1 trillion URLs in which they said (and I quote):

To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google’s index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.

Emphasis is mine.

Did they just say they calculate PageRank several times a day? Yeah, that Toolbar PR data that Matt Cutts just announced is not going to do you PR-lovers much good. But have fun pretending you have new data for a useful metric.
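For anyone who wants a feel for what "re-processing the web-link graph" means, here is a minimal sketch of the textbook PageRank power iteration from the original published description. It is not Google's production system. The `pagerank` function name, the 0.85 damping factor, the fixed iteration count, and the three-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outbound links.

    Illustrative only: the damping factor and iteration count are assumptions,
    not values taken from the quoted Google post.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outbound links): spread its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

On the toy graph, page c ends up on top because it collects links from both a and b. Now imagine running that loop over a graph with a trillion nodes several times a day.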

Meanwhile, scattered throughout Matt’s interesting post are several even more interesting comments (and I shall quote them for you):

  1. “…I’m expecting that also in the next few days that we’ll be expiring some older penalties on websites.”
  2. “…normally we do a new push of toolbar PageRanks every 3-4 months.”
  3. “…the new toolbar PageRanks shouldn’t have anything to do with whether you had a page that wasn’t crawled as much for a short time.”
  4. “… no update to the way that new penalties are assessed; just the expiration of some older ones.”
  5. “…our internal PageRank computations have many more degrees of resolution than the 0-10 values shown in the toolbar.”
  6. “… The last time I checked, many many more users turned on the PageRank display than there are site owners. The PageRank display is actually a popular feature, as it turns out.”
  7. Regarding Webmaster Tools’ Low, Medium, High…PageRank valuations: “…At Google you’ve got full access to the raw values, so I rarely look at the truncated histograms of stuff.”
  8. On increasing the Toolbar PR scale from 10 to 100: “…I think folks here are happy with the 0..10 scale.”
  9. “…the toolbar PageRank display is not linear. It’s more than twice as hard to get a PR6 compared to a PR3, for example.”
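To make the "not linear" remark in the last quote concrete, here is a rough sketch that assumes the toolbar value is a logarithmic bucketing of some internal score. Google has not published the real mapping. The function names `toolbar_value` and `min_raw_score`, the base of 5, and the idea of a single "raw score" are all hypothetical.

```python
def min_raw_score(toolbar_value, base=5.0):
    """Smallest hypothetical raw score that would display as toolbar_value.

    The base is an assumption; Google has never published the real scale.
    """
    return base ** toolbar_value


def toolbar_value(raw_score, base=5.0, max_value=10):
    """Bucket a hypothetical raw score into a 0..max_value toolbar number."""
    value = 0
    threshold = base
    # Each extra toolbar point requires multiplying the raw score by `base`.
    while value < max_value and raw_score >= threshold:
        value += 1
        threshold *= base
    return value


# How much more raw score a PR6 would need than a PR3 under this assumption.
ratio = min_raw_score(6) / min_raw_score(3)
print(f"PR6 needs {ratio:.0f}x the raw score of PR3 (assumed base 5)")
```

Under any scale like this, each step up costs a multiple of the previous one rather than a fixed increment, which fits with "more than twice as hard" for a three-point jump. The actual base, if there even is a single base, is anyone's guess.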

Okay, so questions the SEO world would be better off asking include:

  1. Do all Toolbar PR data updates coincide with penalty expirations?
  2. Is (internal) PageRank really recalculated for the entire (indexed) Web every day?
  3. How does Google assess what constitutes a “site” owner? Do sub-domain site/host account holders count as “site” owners or are we just talking about domain owners?
  4. Was all this PageRank discussion in any way inspired by my recently published History of PageRank (technically, it was titled a brief history)?

Don’t forget to check out Gary Lee’s PageRank Timeline based on my BRIEF history of PageRank.

Okay, I have to get to bed. I’m speaking at a small conference in the morning. Have a great weekend!


Justin-Goldberg 09.02.08 at 11:00 am

I wish they’d convert large pdf’s to html, and not just the first two pages, sheesh.