Google launches a new spam industry

by Michael Martinez on December 7, 2009

2010 SEMMY Nominee

I am not a big fan of Twitter. I have only recently reluctantly gotten involved with Twitter because it seemed necessary to create activity for several accounts that were set up to protect name spaces. For work I’ve been maintaining http://twitter.com/seo_theory. I’ve also been updating a couple of personal accounts.

Having little time to blog, I hardly have time to microblog. Although microblogging may take little time, I have no desire to tell people what I eat, where I am, or what I am doing. My personal time is my time. My work time is supposed to be dedicated to working.

So, I asked myself, I said, “Self — if you’re going to become a Twitt and Tweet, how can you do that usefully and productively?”

And my self answered (yes, that is a rare grammatically correct use of “my self” on the Web) by saying, “Autoposting”.

I don’t know who invented autoposting, but I will assume for grace’s sake he had a good heart and good intentions. Autoposting is, of course, the bane of Web search. You crank up your autoposting engine, add in a few feeds, and let it go. You don’t have to do any more work.

Many a made-for-advertising spam site has been operating this way. Twitts have been doing it for at least a year. I even noticed where a very well-known name in the search marketing industry recently beta tested yet another autoposting tool designed specifically for Twitter and its clones (of which I have read there are about 200).

For my autoposting I decided to include the feeds from SEO Theory and Best SEO Blog. But I also added feeds from a few other truly useful resources such as Bill Slawski’s “SEO By The Sea” blog (he mostly analyzes search patents), Barry Schwartz’s Search Engine Roundtable, and several search engine blogs.

I also decided to follow just a small number of search engine Twitter accounts and did not actively try to gather followers. Nonetheless, if I had to guess, I would say maybe 1/3 to 1/2 of my Twitter followers are autofollowers. Maybe more. I started out blocking some of the followers but soon realized I cannot be sure of who is real and who is just there to spam my Home page with blurby feeds from software.

The self-promoting schmucks at least spare us the details of what they (do not) have for lunch. Maybe they actually type that stuff in by hand. I don’t know. I have not followed many people because whom you follow is in a sense promoting that person (ack — it’s Google’s “links are votes” mantra all over again!).

Do people care about whom you follow? Probably not. But since I don’t have time to follow everyone who tweets or sneezes in my direction, I’m not going to randomly follow people in an effort to build up a huge list of followers. Ashton Kutcher, Oprah, CNN, and Google are welcome to have 1,000,000 followers if that makes them feel good.

I wanted the SEO Theory Twitter account to be productive and useful, not some shiny trophy on a virtual shelf. When I log in to Twitter I can see what the search engines (and a very small number of people who have interacted with me on Twitter) are tweeting about.

Search engine tweets are self-promotional too, but they’re not retweeting themselves endlessly. Self-promotional schmucks, unlike search engines, really have little to say, so they keep saying it over and over again. Ick.

New autoposting software offers you thousands of things to say on a random basis. Oboy. I can’t wait until the first 1,000 purchasers flood the real-time Web with useless quotes and retweets.

But let’s talk about the useless quotes and retweets. They will become very useful very shortly, I think. Google has just launched the Age of TwitterBotNet Spam. I mean, today Google announced its Real-Time Search is going live (following in the footsteps of Cuil, who launched real-time search in September).

When asked about PageRank and Real-Time Search, Googlers conceded that some PageRank-like data is being used to determine which real-time content is most authoritative.

Some people, no matter how wealthy they become, just don’t learn. PageRank is a joke because it has always been easy to game. That’s probably why Google launched mandatory personalized search last week. They can’t filter out the spam so they’ll let their users do it for them. Thanks for handing me the job, Google.

But I digress. Looking at the number of retweets and similar modes of citation, Google will attempt to decipher which real-time content is “authoritative”. So they have pretty much just handed a huge chunk of the real-time Web to autopilot marketers who can buy software to launch thousands of innocuous, natural language text snippets into social media sites that just happen to include links to the sites they want to promote.

In an afternoon you can set up a few dozen Twitter accounts. In a week you can set up hundreds.
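To see why raw citation counts are so easy to game, here is a toy sketch in Python. It assumes, purely for illustration, that "authority" is nothing more than the number of distinct accounts retweeting a link. Google has not published how it actually weights retweets, and the function name, account names, and URLs below are all made up.

```python
from collections import defaultdict

def authority_scores(retweets):
    """Score each URL by the number of distinct accounts that retweeted it.

    `retweets` is an iterable of (account, url) pairs. This naive count is
    an assumption for illustration, not Google's actual ranking method.
    """
    accounts_by_url = defaultdict(set)
    for account, url in retweets:
        accounts_by_url[url].add(account)
    return {url: len(accounts) for url, accounts in accounts_by_url.items()}

# Twelve genuine readers retweet a real news story (hypothetical data).
organic = [(f"reader{i}", "news.example/story") for i in range(12)]

# One marketer controls 200 automated accounts that all push the same link.
botnet = [(f"bot{i}", "spam.example/offer") for i in range(200)]

scores = authority_scores(organic + botnet)
print(scores)
```

Under this naive model the promoted link outscores the real story simply because one person owns more accounts. Any real system will be more sophisticated than this, but the counting step is where the pressure gets applied.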

You’ll soon be seeing artificially managed TweetNetworks that offer to put anything into real-time search results. The spam technologies have already been developed. The text databases have already been constructed. Once again, the search spam industry has anticipated where the search engines will go and laid the foundations for a new war of wills.

Amit Singhal boasted that Matt Cutts is already ahead of the spammers. Matt and his team are very smart, capable people. I am sure they filter out a huge amount of spam. But I am quite convinced — from having slogged through all the crap that gets past the spam filters — that they miss a huge amount of spam.

Google is now crawling about 1,000,000,000 pages per day. That’s an impressive number, but if you do some quick math you’ll realize that something is amiss. 1,000,000,000 pages a day doesn’t cut the mustard if you’re trying to keep the Web fresh. Clearly, Web Apartheid is still alive and well, and anyone who wonders why their Google cache dates are 2-3 months old shouldn’t need a degree in rocket science to figure out on which side of the fence their house sits.
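The quick math can be sketched in Python. The only figure taken from the text is the crawl rate of 1,000,000,000 pages per day. The index sizes are hypothetical round numbers chosen to show the scale, and the model assumes every page is crawled equally often, which is exactly what does not happen in practice.

```python
CRAWL_RATE = 1_000_000_000  # pages per day, the figure quoted in the text

def days_to_recrawl(total_pages, pages_per_day=CRAWL_RATE):
    """Days needed to visit every page once at a constant, uniform rate."""
    return total_pages / pages_per_day

def pages_covered(days, pages_per_day=CRAWL_RATE):
    """Distinct pages reachable in `days` if no page is ever visited twice."""
    return days * pages_per_day

# A cache date 60 to 90 days old, under uniform crawling, would correspond
# to an index of roughly this many pages.
for age in (60, 90):
    print(age, "days ->", f"{pages_covered(age):,}", "pages")

# Hypothetical index sizes, not published figures.
for size in (10_000_000_000, 100_000_000_000, 1_000_000_000_000):
    print(f"{size:,} pages -> {days_to_recrawl(size):,.0f} days per full pass")
```

Every daily revisit to a well-linked site uses up part of that budget, so the pages on the wrong side of the fence wait longer than the uniform figure suggests.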

Some sites (including many of my own) are well-crawled, revisited on a daily basis. I should not be complaining.

But some sites — lacking sufficient value-passing links — are not recrawled and recached very often. Google recently launched another round of massive link devaluations in its ongoing War on Links. The fact that people feel they need links in order to please Google hasn’t stopped Google from trying to stop people from obtaining links.

People are going to feel and act the same way about real-time search. It will be necessary to be “seen” as often as possible. If you cannot generate the news, you can at least buzz about the news, and if you cannot be in 200 places at one time, no problem. Someone has invented software to help you show up in 200 places at one time, looking like a different person (thanks to all the DIGG spamming technology), saying different things (thanks to all the Twitter spamming technology), speaking at random (thanks to all the Blog spamming technology), and linking endlessly in a multitude of ways (thanks to all the link masking technology).

Is real-time search going to be worthwhile? Sure it will. We all want to know what is happening now. But today’s real-time search will quickly become spammed and polluted because in spite of the fact that it incorporates hundreds of factors to determine what is relevant (relevance is good as far as it goes), it will still use the flawed model of citation-based weighting to determine authority.

In mathematics and computer science, any proof (argument) fails at the first flaw. Google’s argument has at least one flaw: they continue to use PageRank-style weighting in all areas of search. People will continue to see very low-quality search results from Google precisely because it is so easy to game PageRank models.

As history has shown us repeatedly, Google isn’t forging new trailways; it’s walking down the clearly blazed paths created by Web marketers who could not wait for the search engines to organize new technologies. The spammers got there first and they will probably, as in the past, have a huge impact on the user search experience.

I don’t fault Google for having to be reactive to spam. But it would probably help them tremendously if they stopped putting “Kick Me” signs on their backs by making bold, unsubstantiated claims. PageRank was supposed to be the trick that would stop search engine spam — instead, it only accelerated the evolution and production of the spam technologies. “rel=’nofollow’” was supposed to stop blog comment spam. Instead, it drove the spammers to find blogs that don’t use nofollow and hammer them.

Real-time search isn’t going to stop spam, either. It’s just going to make spam more profitable. At least in the short-term. I’m sure the Web Quality team is already working on ways to deal with real-time spam, but their track record is, frankly, not very reassuring.

No one can stay ahead of the spam community. All we can hope to do is catch up quickly and beat down the garbage for a while.

So don’t expect me to spend a whole lot of time on Twitter. I don’t see that its value is going to improve much in the wake of real-time spam. I’ll keep the autoposting turned on because I think I’m aggregating good information that is beneficial to the SEO community. It’s information I myself feel is important.

I’m actually supporting Google’s citation-based philosophy by exercising editorial control. But there are two problems with citation-based weighting. First, time rewrites all scientific opinion. Real-time can be more forgiving, so Google probably is safe in that respect. Second, PageRank will always be gameable. They cannot stop the spammers from faking “authority”.

Editorially-selective guys like me are few and far between. I have nothing to hawk. I’m not going to flood the Twitter accounts I control with endless retweets of the same “Buy my schlocky system/software” posts. But I’m still autoposting.

When all else is said and done, at the end of the day the bottom line — in the mixed metaphor world of aphoristically-analyzed Web search — is that you have to wonder just how many of Google’s 1,000,000,000 pages-per-day are actually useful to anyone other than the people trying to make money off them.

If a blog says something useful on the Web, will any real people be there to read it, or are we all now just autoreading, too?

4 comments

Dave 12.07.09 at 4:20 pm

Bravo brother… U said a mouthful. We’ve already had some fun spamming the stream to see what we could do (not something I do often, just when a new RTS service comes out) and found it was about as easy as Bing’s/OneRiot’s before them.

I know SEOs working at companies where they control some 3k Twitter accounts for a variety of purposes; the hard core spammers will have many more… Once more, echoing your thoughts, the whole ‘vote’ approach (aka PageRank) is still flawed.

It’s early days for this one, but I have a suspicion it will do nothing more than flood more spam into the web, and possibly cripple Twitter in the process….

We shall see…

Happy holidays to you and yours btw!

timware 12.07.09 at 9:57 pm

Whew, pretty cynical! My small company helps businesses use social media to build an interested group of fans/followers/subscribers, and we do it in an authentic non-spammy way. We steer them away from amassing followers/fans/subscribers just to build up numbers and, instead, encourage them to build their audience slowly and methodically, and primarily by generating compelling and useful content. And it’s working great.

I thought you presented some good info here, but I guess the tone of the whole thing (and the other commenter who simply agreed with you right down the line) was depressing. Just a lot of complaining while admitting that you’re part of what you’re complaining about.

You don’t seem to see good in anything. For example, I think the “nofollow” tag is an excellent counter to blog spammers and it’s definitely made my life easier. Sure the blogs that don’t use it will get hammered, and they’ll get hammered if they allow unmoderated commenting or don’t use Akismet, but then that’s their fault, right? I mean the outcome should be that everyone uses the nofollow or accepts the consequences of not doing so.

I would encourage you to tweet when you have something to tweet or don’t tweet at all, likewise with blogging, or any sort of communication in which you indulge. Autoposting just makes you part of the problem.

But, yeah, it’ll be interesting to see what happens with RT search, for sure. It does seem that Sidewiki has sort of faded away.

Cheers … Tim

Michael Martinez 12.08.09 at 12:56 am

Tim: “…Just a lot of complaining while admitting that you’re part of what you’re complaining about.”

Michael: You’re right, of course. I AM complaining about a problem of which I am a part. I also complain about global warming (which I believe we have accelerated despite the smear campaign being conducted against global warming science right now) — and yet I drive a car, I heat my home liberally, I burn logs in the fireplace, etc.

I have often said that the only difference between many types of Web spam and “ethical” SEO is excess. There is nothing wrong with visiting a random blog and leaving a comment with a link (in your name) to your own site, even if you tell people something about your site.

But when the SEO community does this out of habit, it becomes abusive. It’s no longer just visiting a random Web site — it’s “link building”, and that is excessive.

Tim: “I mean the outcome should be that everyone uses the nofollow or accepts the consequences of not doing so.”

Michael: In a fair debate where all the parties are equally informed, the community has taken the collective decision to abide by a majority choice, and the majority has chosen to follow the way of “rel=’nofollow’”, there is moral justification for that point of view.

But in reality it was a minority point of view that prevailed, and that point of view was pitched on the basis of a false promise. Despite that flaw in the evolution of the Nofollow Web I have long since adopted the use of “rel=’nofollow’” in blog comment links.

And yet the SEO community could not restrain itself. Some people went chasing so-called “DoFollow” blogs and other people corrupted nofollow into a tool for chasing the Fools’ Hope of sculpting PageRank — wreaking such havoc that Google felt compelled to change the way it handles PageRank for documents with nofollowed links.

Again, excess spoiled what good we were able to achieve.

As far as autoposting goes I freely admit my guilt if guilt there is — I just wanted to be clear about what I was doing so no one could come back later and accuse me of a hypocrisy I had not committed. I’m more than happy to admit to the hypocrisies I enact.

But I’ve always opposed the other excesses that the SEO community has inflicted upon the Web. Much though I blame Google’s policies for many of the problems we face on the Web today, the SEO community shares a lot of responsibility for those problems, too.

So, yeah, I’m a bit jaded.

Also haven’t been feeling well for the past week or so. That usually sours my mood and makes my blog posts more colorful. Or so Rand Fishkin used to say. :)

fantomaster 12.08.09 at 2:06 pm

One ironic aspect about Twitter is that the concept of “spam” is actually getting “outsourced” in a way precisely because the likes of Google are trying to get ahold of a piece of the pie: unless you actually follow someone on Twitter, they can’t really pester you with unwanted content at all, which kind of beats just about any popular definition of “spam” per se. Autofollowing tweeps is another matter, of course, but then you’re certainly not forced to do that nor is it implemented by Twitter as a default.

Yes, there’s plenty of bots around, including tons of spammers trying to pump traffic to their phishing or porn sites etc. But apart from hitting the utter newbs, I don’t really see them being too successful overall. The Net is just another environment that calls for its own precautionary measures, let’s not forget, just like jungle regions, deserts, city traffic, or movie queues…

Also, regarding auto posting: What on earth should be wrong with that if used properly? Ok, so it’s anything but interactive, but neither are RSS feeds etc. If you don’t like someone posting stuff non-manually (as if you’d always be able to tell the difference – trust me on this one: you aren’t!), simply don’t follow them anymore, simple as that.

Personally, I do a lot of scheduled auto posting to accommodate people’s varying timezones (and, yes, their attention spans to some extent as well). And much as I dislike that faddish “attention economy” meme, there’s no way around the fact that most tweets will simply get lost unread in the vast data stream that is Twitter.

Our traffic stats are quite telling on that score and will validate this take just about anytime: post a tweet once, it’ll generate X uniques, repeat it twice across say 16 to 20 hours, and what you’ll get is a 250-300% increase in traffic. (Expectably, making intelligent use of hashtags will boost this even further.)

And I sure agree with you, Michael, that tons of content is merely being hammered out to grease the linklove pumps. Stuff that would never ever be written/crafted or published otherwise. And yes, as SEOs most if not all of us are a major part of the problem, true. But then again it wasn’t we who set up the rules of this game, was it?

One particularly striking thing about Twitter is that every man and his dog seem to have a fairly rigid set of opinions (well, prejudices, maybe) of “what it is all supposed to be about”. The persistent call for “interactivity” is a good case in point. So is interactivity by definition always a good thing? Maybe, if you’re a lot into “conversations”. But I don’t really want e.g. a newsfeed or a weather forecast stream to be particularly interactive, do I? Many ways to skin a cat, and Twitter is certainly many different things to many different people. Which may arguably be a major element of its success.

But what actually makes “spam” of this all is Google etc. wanting in: that way, even the most inane or spammy tweet streams stand a humungous chance of getting a lot more attention outside of Twitter than within. And yes again, this is so easy to game, it’s pathetic. So they’re actually lending a platform of prime exposure to stuff that doesn’t ever really manage to make it on Twitter itself, heh.

Indeed, I couldn’t agree more with your blog posting’s title – it’s actually Google that’s giving Twitter spam the clout it’s so desperately vying for, not Twitter itself because it’s far more user-controlled than even the most ridiculously “optimized” Personalized Search on Google will ever be able to make it.