Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Keeping comment spam off your site and away from users

Friday, September 26, 2008 at 2:26 PM

So, you've set up a forum on your site for the first time, or enabled comments on your blog. You carefully craft a post or two, click the submit button, and wait with bated breath for comments to come in.

And they do come in. Perhaps you get a friendly note from a fellow blogger, a pressing update from an MMORPG guild member, or a reminder from your Aunt Millie about dinner on Thursday. But then you get something else. Something... disturbing. Offers for deals that are too good to be true, bizarre logorrhean gibberish, and explicit images you certainly don't want Aunt Millie to see. You are now buried in a deluge of dreaded comment spam.

Comment spam is bad stuff all around. It's bad for you, because it adds to your workload. It's bad for your users, who want to find information on your site and certainly aren't interested in dodgy links and unrelated content. It's bad for the web as a whole, since it discourages people from opening up their sites for user-contributed content and joining conversations on existing forums.

So what can you, as a webmaster, do about it? 

A quick disclaimer: the list below is a good start, but not exhaustive. There are so many different blog, forum, and bulletin board systems out there that we can't possibly provide detailed instructions for each, so the points below are general enough to make sense on most systems.

Make sure your commenters are real people
  • Add a CAPTCHA. CAPTCHAs require users to read a bit of obfuscated text and type it back in to prove they're a human being and not an automated script. If your blog or forum system doesn't have CAPTCHAs built in, you may be able to find a plugin like reCAPTCHA, a project which also helps digitize old books. CAPTCHAs are not foolproof, but they make life a little more difficult for spammers. You can read more about the many different types of CAPTCHAs, but keep in mind that just adding a simple one can be fairly effective.

  • Block suspicious behavior. Many forums allow you to set time limits between posts, and you can often find plugins to look for excessive traffic from individual IP addresses or proxies and other activity more common to bots than human beings.
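
As a rough illustration of the "time limits between posts" idea, here is a minimal sketch in Python. The class name, the 30-second default, and the use of an IP address as the client key are all assumptions made for this example; your forum software or plugin will have its own settings for this.

```python
import time


class PostRateLimiter:
    """Rejects a post if the same client posted less than min_interval seconds ago."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval  # hypothetical default, tune for your site
        self.clock = clock                # injectable so the limiter can be tested
        self.last_post = {}               # client key (e.g. IP address) -> time of last accepted post

    def allow(self, client_key):
        now = self.clock()
        previous = self.last_post.get(client_key)
        if previous is not None and now - previous < self.min_interval:
            return False  # too soon after the last accepted post
        self.last_post[client_key] = now
        return True
```

A real deployment would likely also expire old entries and account for shared IP addresses, which this sketch does not attempt.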

Use automatic filtering systems
  • Block obviously inappropriate comments by adding words to a blacklist. Spammers obfuscate words in their comments so this isn't a very scalable solution, but it can keep blatant spam at bay.

  • Use built-in features or plugins that delete or mark comments as spam for you. Spammers use automated methods to besmirch your site, so why not use an automated system to defend yourself? Comprehensive systems like Akismet, which has plugins for many blog and forum systems, and TypePad AntiSpam, which is open-source and compatible with Akismet, are easy to install and do most of the work for you.

  • Try using Bayesian filtering options, if available. Training the system to recognize spam may require some effort on your part, but this technique has been used successfully to fight email spam.
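
To give a feel for what Bayesian filtering involves, here is a minimal naive Bayes sketch in Python. It is not how Akismet or any particular plugin works; the class, the tokenizer, and the "spam"/"ham" labels are assumptions for illustration, and a production filter would need far more training data and tuning.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one hand-labelled comment; label is "spam" or "ham"."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        # Class prior with add-one smoothing.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        spam_score = self._log_score(tokens, "spam", vocab_size)
        ham_score = self._log_score(tokens, "ham", vocab_size)
        return spam_score > ham_score
```

The "training" mentioned above corresponds to calling `train` on comments you have already marked by hand; the more examples of each kind, the better the guesses tend to be.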

Make your settings a bit stricter
  • Nofollow untrusted links. Many systems have a setting to add a rel="nofollow" attribute to the links in comments, or do so by default. This may discourage some types of spam, but it's definitely not the only measure you should take.

  • Consider requiring users to create accounts before they can post a comment. This adds steps to the user experience and may discourage some casual visitors from posting comments, but may keep the signal-to-noise ratio higher as well.

  • Change your settings so that comments need to be approved before they show up on your site. This is a great tactic if you want to hold comments to a high standard, don't expect a lot of comments, or have a small, personal site. You may be able to allow employees or trusted users to approve posts themselves, spreading the workload. 

  • Think about disabling some types of comments. For example, you may want to disable comments on very old posts that are unlikely to get legitimate comments. On blogs you can often disable trackbacks and pingbacks, which are very cool features but can be major avenues for automated spam.
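
If your system does not add rel="nofollow" for you, the idea can be sketched roughly as follows in Python. This assumes comments are stored as plain text and that only http/https URLs should become links; the function name and the URL pattern are illustrative, not taken from any particular blog platform.

```python
import html
import re

# Deliberately simple URL matcher for illustration; real-world URL detection is messier.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def render_comment(text):
    """Escape a plain-text comment and turn URLs into rel="nofollow" links."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the text between links so commenters cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping everything that is not a recognized URL is a conservative choice: it means any HTML a commenter types is shown as text rather than rendered.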

Keep your site up-to-date
  • Take the time to keep your software up-to-date and pay special attention to important security updates. Some spammers take advantage of security holes in older versions of blogs, bulletin boards, and other content management systems. Check the Quick Security Checklist for additional measures.

You may need to strike a balance on which tactics you choose to implement depending on your blog or bulletin board software, your user base, and your level of experience. Opening up a site for comments without any protection is a big risk, whether you have a small personal blog or a huge site with thousands of users. Also, if your forum has been completely filled with thousands of spam posts and doesn't even show up in Google searches, you may want to submit a reconsideration request after you clear out the bad content and take measures to prevent further spam.

As a long-time blogger and web developer myself, I can tell you that a little time spent setting up measures like these up front can save you a ton of time and effort later. I'm new to the Webmaster Central team, originally from Cleveland. I'm very excited to help fellow webmasters, and have a passion for usability and search quality (I've even done a bit of academic research on the topic). Please share your tips on preventing comment and forum spam in the comments below, and as always you're welcome to ask questions in our discussion group.

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

18 comments:

Chris said...

Interesting article, with some very useful suggestions. I've got a question for the author though. You say you're from Cleveland, is that Cleveland Ohio? If so, I think it's the first time I've seen an American use the word "dodgy". It's been in common use in the UK for decades but British slang doesn't travel across the Atlantic as quickly as US slang in the opposite direction. Is dodgy becoming mainstream in the US? PS This might not be the most relevant post but I promise it's not "comment spam"!

kkll2 said...

Spammers generally don't obfuscate keywords in comment spam, because they want these keywords to be readable by Google.

There's an open-source Akismet-like filter:
http://code.google.com/p/sblam/

Noodle said...

I've been using a JavaScript technique to stop comment spam, with an amazingly high success rate (I've had one spam get through since January).

I've posted an entry about it here:
http://neilang.com/entries/a-better-technique-to-stop-spam/

Gareth said...

Javascript methods have the disadvantage of potentially preventing non-javascript users (like those running NoScript) from being able to post. OK, so you might only have one spam get through, but how many did you block and how many legitimate comments got stopped too?

We monitor all the comments on our site with no captcha and no javascript validation, and the most successful prevention by far is a honeypot trap.

It's simply a field with a common name (e.g. 'comments') whose only purpose is for spambots to fill out. It's labelled with "Do not fill out this field" and hidden with CSS, but to the spambot it's just another junk receptacle. Naturally any form submission with anything in this field is put aside to be manually checked.

Potentially there are a couple of issues with this approach:

1) Any 'autofill' browser functionality could trigger this trap, however we've found that autofill rarely applies to textareas and not to 'comments' fields.

2) It's trivial for anyone specifically targeting your site to circumvent this measure. At that point you have to start thinking about CAPTCHAs and the like, but I can't imagine most sites on the internet are open to that level of attack.

However, we've had zero false positives in our case, and provably no legitimate comments have been blocked.
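
A minimal sketch of the server-side check for a honeypot like the one described above, in Python. The function name and the "accept"/"review" return values are assumptions for illustration; the trap field is named 'comments' as in the description, so the real comment text would need to live in a differently named field (here, hypothetically, 'message').

```python
def classify_submission(form, honeypot_field="comments"):
    """Return "review" if the hidden trap field was filled in, else "accept".

    `form` is a dict of submitted field names to string values.
    """
    trap_value = form.get(honeypot_field, "")
    if trap_value.strip():
        # A human never sees the field, so any content suggests a bot.
        # Set the submission aside for manual checking rather than deleting it.
        return "review"
    return "accept"
```

For example, `classify_submission({"name": "Millie", "message": "Dinner on Thursday?", "comments": ""})` is accepted, while the same form with anything typed into `comments` is held for review.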

Noodle said...

@gareth The honeypot and javascript methods are very similar. Both methods rely on the browser rendering the form differently for spambots than real users. I can see benefits and issues with both methods.

With my implementation the user can be notified that they need javascript enabled before their submission, as well as after a failed submission. If you are concerned with legitimate comments being lost, like in your method, you can log the failed submissions for manual checking later.

Another benefit from the javascript method is that it can be applied to any form on your site without having to alter the server-side form processing script(s).

The honeypot method relies on the spambot to submit a value for the obscured field. If spambots begin to randomly vary their submission values to specifically target honeypots, they could succeed in posting.

I don't think either method is 100% foolproof. An issue with the javascript method is that it is only effective while spambots don't process embedded javascript.

Perhaps for best security without using captchas, you could use a combination of both methods.

Kipper said...

I hate form spam. Something I've tried and am having success with at the moment is an Ajax-style form, whereby a user fills it in as normal, but rather than redirecting the whole page (and writing to the database in between), the submit button just triggers a background call to do the same work, and in doing so also adds another secret variable that's not available from within the form itself (like isPosted = 1), just to be sure that nobody is calling my script directly. It could be circumvented, but you'd need to be human to do that, or a very clever bot that can read and understand code!

Obviously this approach is going to deny those not using JavaScript, but I don't think its non-use is as widespread today as it used to be.

Yoli said...

Interesting article, and happy 10th anniversary to everyone at Google.

Megan said...

Another thing to keep an eye out for is posts that look legit but seem a little bit off on further examination. Spammers are sometimes adept at making their comments look legitimate.

For example, you might see comments like this:

"Hi, great post, thanks"

or

"Thanks for the great info"

Or similar, general comments that could apply to any blog post. These are often spam.

One thing I've noticed on my forum lately is people copying and pasting content from other sites and posting it as their own. If a post seems suspicious try doing a google search for the text that was posted (in quotes).

I think automated systems like Akismet are often the best solution for a blog. They work well and they don't inconvenience your regular users.

Geert said...

Use Mollom to filter the content of the comment post. Mollom responds 'ham' or 'spam' and depending on their response you either allow or disallow the content on your website. There also is an in-between option that allows the webmaster to moderate. Plugins for Drupal, Wordpress, PHP, Ruby, Java, .NET ...
Cool!!

Borneoguy.com said...

CAPTCHA is still the best method for me, but that won't stop manual spammers, I reckon, only bots ...

Joe Le Mort said...

You can also add a hidden field to your form and check whether it's filled in: if it is, it's a bot ;) . Works 99% of the time!!
The trick:
http://www.alexandreval.info/cv/ne-plus-etre-spamme-sur-son-site-blog/

The said...

Is anyone gonna update us on the use of the word dodgy in the U.S? You'd think at least one American could reply to this question.

Rich Motivation said...

Yes, I've heard American guys use "dodgy," particularly guys from the Northeastern part of the US.

Mary

Chaoley said...

Here's an ingenious solution that uses a points system to filter out spam submitted via comment forms. Note that the author also turns off comments on a post after a certain amount of time.

http://snook.ca/archives/other/effective_blog_comment_spam_blocker/

David Burns said...

In the past I have heard 'dodgy' quite a bit. I guess it's not as 'in' as it used to be. Now it seems it may be an 'in' slang for your country.

ITSL Technologies said...

I am a little confused after reading your post. Is this suggestion for URL redirecting problems? If yes, I have a site whose design and look and feel will change in a few days. Can you help me make sure I don't run into this type of problem, or that URL redirecting won't affect my site?

Bloggercito said...

Come on, Google is the one primarily responsible for all spam comments, because it pushes people to get backlinks.
I know you can give me a 4-hour explanation and theory, but maybe you should use those 4 hours to open your eyes and see the REAL world.
Ask yourself why people are spending so much time trying to get backlinks to their sites.
Google has turned the internet into a link-trading market.
Most links on the web are not real, organic ones.
Even the big sites now use nofollow EVEN for REAL AND ORGANIC links.
As I said, open your eyes and look at the real thing.

Richard said...

It still seems to me, at least in niches like mine, that you can get organic backlinks with a phone call.

My niche: technology and churches