Fighting the Comment Spam

As some frequent visitors may have noticed, over the last week I've been dealing with a pretty bad spam problem on this blog. Since I don't use WordPress or any other blogging framework, there wasn't a quick fix for this issue: no plug-in to download and drop in. I had to come up with something from scratch.

Over the years I've leaned on the 'honeypot' system. There are four main input fields to add a comment (name, email, website, and comment), three of which are required. There is an extra field, a hidden input, which users don't see. Automated scripts will attempt to add a value to all possible fields, even the hidden ones, so I just check that hidden one. If there's a value there, I ignore the comment. Human users can add comments just fine; bots are blocked. Simple, (mostly) effective, and it doesn't require a legitimate user to perform any extra steps (like CAPTCHAs).
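For the curious, here's roughly what that check looks like, sketched in Python. This isn't the code running on this site. The field names are made up for illustration, and I'm assuming the three required fields are name, email, and comment.

```python
# Hypothetical field names, chosen only for this sketch.
REQUIRED_FIELDS = ("name", "email", "comment")  # assumption: website is optional
HONEYPOT_FIELD = "url_confirm"  # hidden input, so real people leave it blank


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden honeypot field came back with a value."""
    return bool((form.get(HONEYPOT_FIELD) or "").strip())


def missing_required(form: dict) -> list:
    """List the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not (form.get(f) or "").strip()]


def accept_comment(form: dict) -> bool:
    """Accept only submissions that skip the honeypot and fill the required fields."""
    if is_probable_bot(form):
        return False  # quietly ignore the comment
    return not missing_required(form)
```

A person's submission leaves `url_confirm` empty and passes. A script that stuffs every input it finds trips the check and gets dropped.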

Honeypot worked great for a long time. Every once in a while a bad comment would soak through, usually from human users looking to build up backlinks for SEO purposes, but the bots were held at bay. That changed a week ago. The blog was hit from a number of unique IPs sending various combinations of the comment fields, and many got through. They figured out how to get past honeypot.

To give you an idea of the issue… In the last 8 days alone my blog has been hit with 689 spammy comment requests. Of those, 63 spam comments got through the honeypot. So honeypot was blocking 91% of the spam, which isn't horrible; it just isn't good enough.
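If you want to check my math, the 91% comes straight from those two numbers:

```python
total_spam_requests = 689  # spammy comment requests over 8 days
got_through = 63           # spam comments that beat the honeypot

blocked = total_spam_requests - got_through
block_rate = blocked / total_spam_requests

print(f"{blocked} blocked, {block_rate:.0%} block rate")
# prints "626 blocked, 91% block rate"
```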

How to get past the issue? I had a few options.

Disable Comments

I could just disable comments altogether. Many blogs I follow have started to do this, worried about both spammy comments and unhelpful discussion threads. I didn't want to do this. The comment threads on this site are pretty awesome, greatly adding to the content on here, and I didn't want to block them. So that was out.

Require Extra Information

This was something I've played around with in the past: forcing users to create authenticated accounts (which could be Facebook) before posting information. Raising the bar also means making it more difficult to contribute, though, and I ruled this out too. Whether it was forcing user account creation or adding extra fields (like a CAPTCHA), I didn't want to make contributors work harder.

Trust Repeat Commenters

What if I trusted repeat commenters? Like, if you've commented before, you're good to go. If you're a new commenter, then I need to approve your comment first. No additional steps on the user's part and a simple approval process for me to go through. This could hold some potential.

One thing that helped the decision was that I had always planned to help repeat commenters. If a commenter leaves multiple comments within the span of several hours or weeks there's no real need to force them to re-add their name and email. Basically, after someone adds a comment they are creating a relationship with the site, a relationship that is verified by their name/email combination. Any future comments would automatically fill with this information to help solidify that relationship.

So what if someone new comes along? I can't just trust them, can't just display their comment on the live site. They could be a spammer, adding more spammy links and spammy text and spamming up the discussion.

When a new relationship is created, a new name/email combination, that relationship is still there. So, after they submit their comment, I could show their comment only to them with a pending flag. Then they know that they aren't ignored, that their comment will show up as long as it's legit. Notifications won't get sent out until the relationship is verified. Once they are verified, though, it's open season. They're part of the cool kids club.

To lay things out a different way…

  1. USER goes to add a new comment
  2. if (USER has added a comment before)
  3.     autofill comment form with USER data
  4. USER submits a comment (and all required fields check out)
  5. USER relationship is created with site
  6. if (USER is a trusted commenter)
  7.     comment is added and displayed for all to see
  8.     notifications are sent out to appropriate people
  9. else
  10.     comment is added but only displayed to USER
  11.     no notifications are sent out
  12.     JACOB needs to go and approve comment before public
  13. ???
  14. PROFIT :)
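The steps above can be sketched in Python, too. Again, this is an illustration and not what's actually running here. The class and method names are invented, everything lives in memory, and I'm assuming "trusted" means "this name/email combination already has at least one approved comment." A real site would figure out who the viewer is from a cookie or session, not by passing their name and email around.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    name: str
    email: str
    body: str
    approved: bool = False


@dataclass
class CommentStore:
    """Hypothetical in-memory stand-in for the blog's comment table."""
    comments: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def _key(self, name, email):
        # Assumption: the relationship is the name/email pair, case-insensitive.
        return (name.strip().lower(), email.strip().lower())

    def is_trusted(self, name, email):
        """Trusted means the same name/email already has an approved comment."""
        key = self._key(name, email)
        return any(c.approved and self._key(c.name, c.email) == key
                   for c in self.comments)

    def submit(self, name, email, body):
        """Steps 4-12: publish for trusted commenters, hold everyone else."""
        comment = Comment(name, email, body, approved=self.is_trusted(name, email))
        self.comments.append(comment)
        if comment.approved:
            self.notifications.append(comment)  # step 8
        return comment

    def approve(self, comment):
        """Step 12: the manual approval that makes a comment public."""
        comment.approved = True

    def visible_to(self, name, email):
        """Approved comments, plus the viewer's own pending ones (step 10)."""
        key = self._key(name, email)
        return [c for c in self.comments
                if c.approved or self._key(c.name, c.email) == key]
```

A first-time commenter sees their own comment right away while nobody else does. Once that comment is approved, their next one goes straight to the public list and triggers notifications.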

This fix isn't complete yet. I still need to add a better approval process. Ideally the new commenter would be notified when/if their comment is approved so that they can feel warm and fuzzy inside. Also, notifications should be sent out to previous commenters when new comments are approved. Future enhancements.

Oh, and I didn't remove honeypot. It's still blocking a huge majority of the spam. Fewer comments for me to go through and reject.

So what does this mean for commenters? Some people may see a delay in their comment showing up. Otherwise, nothing. Trusted commenters, which account for a huge majority of the comments on the blog, will see no change outside of less spam. Not too shabby of a fix at all.