How to fight Web spam on your own turf

by Michael Martinez on January 14, 2009

Matt Cutts took the War on Webspam to a couple of conferences and subsequently put together one of the most informative and helpful videos I have seen in a long time, from a Webmastering perspective. He goes into some detail about how spammers can exploit your Web sites but doesn’t really cover enough turf, in my opinion.

Now, normally I leave it to the search engines to fight it out with the Web spammers for control of the search results because, frankly, it takes too much time to sort through all the moral rights and wrongs. Google is hardly a saintly service. It’s just a very popular search engine.

Web spam, on the other hand, isn’t paying me enough to keep quiet. Okay, I’m not on the take at all, but there isn’t a whole lot of incentive for me to take on the Web spam community, some of whom can be rather vicious psychotics, in my opinion. If you get their attention, they can do some mean things to you.

Well, I may not be mean but at least one Web spam team now has my attention. They’ve been “outed” before and they’ll probably be outed again. But today it’s my turn to draw attention to their nasty little registration spam floods. I run a forum, SF-Fandom. For a while I was pretty much running the Spider-Food forums last year, too.

Defending two Web forums from Web spam intrusions is a full-time job — more than a full-time job. Matt Cutts did not exaggerate the value of pre-emptively making it challenging for Web spammers to infect your Web sites with their trash. The script-kiddies, wannabe spammers who buy outdated software, and other second-rate “black hats” have nothing on the real professionals in the industry.

You can block Web spammers in a variety of ways, but you cannot keep the oozing slimeballs completely out of your system. They chew through domain names and IP addresses like there is no tomorrow, burning many ISPs and future trademark owners in the process. If you think there is a real-time blacklist out there that will help you defeat Web spammers, think again.

They use widely distributed networks to run their scripts against a growing list of targeted Web forums, blogs, and social media sites. They’ll engage proxy services, buy up domain names in bulk (and there are registrars that make it easy and less expensive to buy domain names in bulk), and generally do whatever they can to hide their tracks.

One of the most annoying of these little players is InternetServiceTeam (as in internetserviceteam.com). Some of you have run into these guys already — I know because I’ve seen your discussions about them. If IST has a fan club, it doesn’t appear to be active in the open Web mastering and promotion community. Based on the stunts I’ve seen them pull I get the feeling they sell their services to the highest bidder, but maybe it’s all one operation just pulling for themselves. I don’t know.

InternetServiceTeam uses IP addresses based on servers all over the world. You’ll find some of their IP addresses in this list:

24.175.157.23
58.62.198.23
61.19.252.237
66.128.38.41
66.159.18.9
66.197.134.103
75.37.214.224
78.110.173.252
78.159.112.189
78.159.112.191
82.208.46.25
83.137.228.66
83.222.23.242
87.118.94.222
91.103.24.13
91.151.119.76
91.184.39.147
94.76.213.205
98.197.249.24
98.208.46.176
122.240.90.253
195.80.2.7
199.195.109.19
200.93.16.34
203.110.240.22
206.174.231.210
212.193.239.250
212.227.114.93
212.227.114.95
212.235.92.142
212.235.92.143
212.235.92.172
212.235.92.173
218.0.205.225

Seem like a big list? Hey, I’ve got plenty more where those came from. Some of those IP addresses may actually be used by other spammers, but a fair chunk of them came from InternetServiceTeam.

Web spammers like to register with your Web sites. They’ll do this for a variety of reasons, but a lot of them are looking for profile pages. If you let just anyone create profile pages you’re asking for a lot of dirty company. These profile pages may be used to house links to other sites, or they may be used to send people directly to other sites. Matt explained the process in his video.

What he didn’t explain was just how much effort you have to go to in the moderation process to clean up these messes. You have to delete the accounts, you have to block the registration domains, and you have to block the IP addresses. Now, depending on your server and application configuration you may have to implement blocks at different levels.

If your Web server will honor something like a hosts.deny file, you should keep it up to date with spam domains and IP addresses. Any time a connection comes from one of those domains or addresses, the server will check hosts.deny and refuse it. Of course, not everyone can block domains and IP addresses at the server level.

If you’re running a Web forum, its administrative panel may allow you to block domains. At the present time, InternetServiceTeam is registering a large number of Chinese domain names to use for autogenerating email addresses. With these automated email addresses they set up bogus accounts on your system. Not all blog systems let you block domains, but WordPress and b2evolution (as well as others, I am sure) let you report Web spam registrations to a central blacklisting service.

Here is a list of actions you CAN take if you want to combat Web spam on your own site (forums and/or blogs):

  1. Turn off registrations
  2. Require email confirmations for registrations
  3. Disable member profile pages
  4. Restrict member profile pages to confirmed users or users who have been approved or posted a minimum amount
  5. Create special user groups where you set privileges and move people into those groups based on their behaviors
  6. Arbitrarily block all domains outside the native English-speaking countries (especially Russian, Polish, Ukrainian, Indian, and Chinese top-level domains)
  7. Record the IP addresses and block them in .htaccess, hosts.deny, the application admin screen, or all three
  8. Record the domain names and block them in .htaccess, hosts.deny, the application admin screen, or all three
  9. Ask well-behaved robots like Slurp, Google, and MSNBot to NOT crawl member profile pages
  10. Disable HTML code and Javascript in all posts
  11. Disable HTML code and Javascript in all signatures
  12. Disable full feed in your RSS feeds

There are a few other tools in the shed but I’m going to keep them secret for now. There are limitations or tradeoffs for each of these options.

Turn off registrations – To prevent random script kiddies from abusing our blogs and forums most savvy people now require free registration in order to post. However, if you’re just sharing your thoughts with the world you don’t have to allow people to post. Although conventional wisdom says the more you engage with your readers in the comments of your blog, the more readers you’ll get, if you make your blog posts informative enough people WILL link to them and read them.

Turning off registrations in a forum is the kiss of death unless you set up an alternative system. My admins do occasionally create accounts by hand when people have trouble registering with our automated system. So it’s possible to set up an alternative system, but it may not be easy or worthwhile.

Require email confirmations for registrations – While this does not actually stop spammers from autoregistering in bulk, it stops many of them dead in their tracks. Some scripts will register without email confirmation and then post 3 or 4 random messages in various fora. The messages look like real people posted them, but they always appear in suspicious patterns. Most scripts cannot yet confirm emails (in fact, the domains they use tend to be throwaway domain names, so the spammers may not even have email services turned on).
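To make the confirmation step concrete, here is a minimal Python sketch of issuing and checking a confirmation token. This is not how any particular forum package does it. The names SECRET_KEY, make_confirmation_token, and confirm are hypothetical, and the choice of an HMAC over the address is an assumption made for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real site would load a fixed value
# from its configuration instead of generating one at start-up.
SECRET_KEY = secrets.token_bytes(32)


def make_confirmation_token(email: str) -> str:
    """Return a token tied to the e-mail address (HMAC-SHA256, hex encoded)."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def confirm(email: str, token: str) -> bool:
    """True only if the token matches the one issued for this address."""
    expected = make_confirmation_token(email)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, token)
```

The token would be mailed to the address given at registration. A script that registers with a throwaway domain and never receives mail has no way to present it, so the account stays unconfirmed.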

Disable member profile pages – You know, I’m not sure how many people actually look at these things, but our member profile pages do still get a fair amount of traffic. I think most of it is now real traffic, but I cannot be absolutely sure. There are just too many IP addresses to check. So I am reluctant to take this privilege away from my community. On the other hand, if you’re just starting out, you may want to give it some thought.

The newest version of VBulletin (and perhaps other forum software) now allows you to give your members free blogs and image galleries. Think through how you want to manage those privileges before just turning them on and forgetting about them.

Restrict member profile pages … – That is, don’t let new members have profile pages until they cross some arbitrary threshold. For example, in VBulletin, if you require email confirmation, an account’s profile page will not be turned on. All the Web spammer gets for his trouble is a ban, and all he contributes is an inflated membership count. SF-Fandom has to date deleted around 24,000 spam accounts (give or take).

Create special user groups … – People have complained about Google’s GMail being used for spam registrations. Googlers have publicly stated they delete these accounts as they are identified but many of them are created by hand by people in India and Pakistan (or other nations in that region). You can block profile page and signature privileges for all registered users and only extend them to users who are included in custom groups. If you base the privileged group access on a minimum number of posts, go with no fewer than 10 (maybe 25 would be better). You should be able to do bulk membership edits by group.
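The group idea can be sketched in a few lines of Python. This is an illustration under stated assumptions and not any forum package's actual API: the Member class, the group names "new" and "trusted", and the function names are all hypothetical. The threshold of 25 comes from the suggestion above.

```python
from dataclasses import dataclass

# The post suggests no fewer than 10 posts, and that 25 may be better.
MIN_POSTS_FOR_PRIVILEGES = 25


@dataclass
class Member:
    name: str
    post_count: int
    email_confirmed: bool
    group: str = "new"  # hypothetical group names: "new" and "trusted"


def promote_eligible(members: list[Member]) -> list[Member]:
    """Move confirmed members past the post threshold into the trusted group."""
    promoted = []
    for member in members:
        if (
            member.group == "new"
            and member.email_confirmed
            and member.post_count >= MIN_POSTS_FOR_PRIVILEGES
        ):
            member.group = "trusted"
            promoted.append(member)
    return promoted


def may_have_profile_page(member: Member) -> bool:
    """Profile pages and signatures are reserved for the trusted group."""
    return member.group == "trusted"
```

Running promote_eligible over the whole member list on a schedule is one way to get the bulk edit by group. Everyone else stays in the default group with no profile page or signature.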

Arbitrarily block all domains outside the native English-speaking countries – Regrettably, I’m unable to implement this option easily as many of our real members come from over 100 countries around the globe. But if you’re just setting up a forum and you don’t plan to work with people outside your language group, just restrict registration to a manageable number of top-level domains. You won’t regret that decision.
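A registration form could enforce that restriction with a check along these lines. It is a rough Python sketch only. The ALLOWED_TLDS set is an example I made up, not a recommendation, and tld_allowed is a hypothetical helper name.

```python
# Example allowlist only; choose the top-level domains that fit your community.
ALLOWED_TLDS = {"com", "net", "org", "us", "uk", "ca", "au", "nz", "ie"}


def tld_allowed(email: str, allowed=ALLOWED_TLDS) -> bool:
    """Return True if the address's top-level domain is on the allowlist."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or "." not in domain:
        return False  # not a usable address at all
    tld = domain.rsplit(".", 1)[1]
    return tld in allowed
```

Note that this only looks at the last label of the domain, so a spammer using a .com address passes. It narrows the flood; it does not replace the other measures.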

Block IP addresses and registration domains – Gmail.com and several variations (which may or may not be owned by Google) are blocked from registering at SF-Fandom. These blocks have certainly inconvenienced a few people but the blocks will remain in place because they stopped a flood of spam registrations. Hotmail is also blocked, along with many other domains and IP addresses. And, no, I won’t share all the domains and IP addresses I’ve blocked.
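If you cannot block at the server level, the same idea can live in the registration handler. Below is a minimal Python sketch, assuming the application can see the visitor's IP address and the email address being registered. The two addresses come from the list earlier in this post and the two domains are the ones named above. Widening 212.235.92.x to a whole /24 is my assumption for illustration, and registration_blocked is a hypothetical name.

```python
import ipaddress

# Entries taken from the list earlier in the post. Treating 212.235.92.x as a
# full /24 block is an assumption made for this example.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("212.235.92.0/24"),
    ipaddress.ip_network("78.159.112.189"),  # a single address (/32)
]

# Domains the post says are blocked from registering at SF-Fandom.
BLOCKED_EMAIL_DOMAINS = {"gmail.com", "hotmail.com"}


def registration_blocked(ip: str, email: str) -> bool:
    """Return True if the IP address or the email domain is on a blocklist."""
    addr = ipaddress.ip_address(ip)
    if any(addr in network for network in BLOCKED_NETWORKS):
        return True
    domain = email.strip().lower().rpartition("@")[2]
    return domain in BLOCKED_EMAIL_DOMAINS
```

Checking only at registration time keeps the cost off ordinary page views, which matters once the blocklist runs to thousands of entries.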

Block good robots from indexing profile pages and signatures – Why do this? Because small-time spammers (they call themselves “link builders”) get past your algorithmic filters and still create their idiotic drop-link pages. You can at least waste these idiot wannabe SEOs’ time by asking the search engines to NOT index your profile pages and post signatures. Those of you who love PageRank sculpting will appreciate just how much value you don’t pass to other sites. Of course, you won’t be too popular with amateur SEOs who practice low-budget search reputation management by spamming their name searches with empty profile pages.
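For profile pages, the usual way to ask is a robots.txt rule, and it is worth testing that the rule matches the URLs you think it does. The Python sketch below uses the standard library's urllib.robotparser for that check. The paths /member.php and /members/ and the forum.example host are assumptions for illustration, so substitute whatever your forum software really uses. Signatures appear inside thread pages, so this does not cover them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the profile paths are assumed, not taken from
# any specific forum installation.
ROBOTS_TXT = """\
User-agent: *
Disallow: /member.php
Disallow: /members/
"""


def profile_pages_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a well-behaved crawler would skip the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

For example, profile_pages_blocked(ROBOTS_TXT, "http://forum.example/member.php?u=42") reports that the profile URL is off limits, while an ordinary thread URL is still crawlable. Only robots that honor robots.txt will pay any attention, of course.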

Disable HTML and Javascript – Yes, Matt covered this in his video. But a lot of forums allow this functionality. Some blogs, too. It’s scary. VBulletin and many of its competitors allow you to create custom BB Codes where you can control the HTML and Javascript. It’s a good idea to test out any custom BB codes very, very carefully. VBulletin users share ideas, warnings, and experiences in a couple of forums so as to help minimize the risk of exploitation.
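If your software has no switch for this, the fallback is to escape markup before a post or signature is displayed. Here is a minimal Python sketch using the standard library's html.escape. The function name render_post_as_text is hypothetical, and this is not how VBulletin or any other package handles it internally.

```python
import html


def render_post_as_text(raw_post: str) -> str:
    """Escape HTML special characters so markup is shown, not executed."""
    # quote=True also escapes double and single quotes, which matters if the
    # text is ever placed inside an HTML attribute.
    return html.escape(raw_post, quote=True)
```

With this in place a pasted script tag or link shows up as literal text in the page and the browser never treats it as markup.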

Disable full RSS feeds – This has to be the single most unpopular decision I ever made concerning the SEO Theory blog. That still did not stop 1800+ people from subscribing to our RSS feeds. If someone is too lazy to visit the blog, they don’t need to read it anyway. That’s just my opinion.

And there are other ways to combat the Web spammers. To be honest, you would never have seen this post if InternetServiceTeam had just left me alone or gone away the first time I blocked their sorry spam.

All I ask is that the Web spam community leave me alone.

3 comments

vbplusme 05.11.09 at 4:59 am

Interesting, I got to your site on a search for “bbcode”… I run three vbulletin forum sites and do work for dozens more. The SPAM problem has gotten progressively worse over the last five years but really became impossible to manage just over 2 years ago.

I find it laughable that Matt Cutts would have an issue with it given that Google is the number one motivator and driving force for ALL the Forum Spam on the internet. It’s ALL about Google and Links; without the Google incentives, Forum Spam would not exist. Forum Spammers are into Links because that is what Google requires to get good search engine ratings.

I discovered one of my sites was / is getting inundated with Thousands of comment spams per day, 100 posts with at least 100 links in each post, an amazing feat that could only be accomplished with spambots.

NONE of this existed before Google and their bogus Page Rank was introduced a few years ago. Took the spammer badguys some time to “get it” but now that they do, millions of pieces of cyberjunk are floating around looking for a place to pollute.

Forum Spammers, at least the thousands that I am seeing, don’t really care about “posting” spam to forum threads now; they post a few links in their profiles but register thousands of profiles every day. It allows them to completely slip under the wire and still get the benefit of a huge number of backlinks that Google rewards them for in terms of huge PR ratings that the badboys sell links to and exploit in more ways than anyone can imagine. They use customized tools like XRumer to automate the whole process. Just last year XRumer took the whole Gmail system for a ride by generating thousands of email addresses that are being used even today to register bogus accounts on unsuspecting Forum sites.

The bottom line here is that Google made the mess, Google needs to FIX it. It is the bogus PR system that is at the heart of the matter and that is what needs to go. Maybe Google needs to go… they make Micro$oft look like choirboys on their best behavior…

vbplusme 05.11.09 at 5:03 am

BTW, I stopped adding IP addresses to my .htaccess files due to the fact that adding thousands of lookups per request is not a practical way to solve the problem. I wrote a script that does a lookup on my registration forms that checks for known spammer IP addresses and filters for certain footprints of spammers. I have caught more than 5000 spammers since the first of the year.

Michael Martinez 05.13.09 at 8:51 am

Brother, I feel your pain and applaud your successes!