Wednesday, June 03rd, 2009 | Author: AHSauge

This time I’m going to make a really bold statement. I believe it’s currently possible to fight spam on the web without pictures, JavaScript or anything else that might compromise usability. Sounds quite impossible and too good to be true, right? Well, it might not be as far-fetched as it seems. Please read on and I’ll explain why.

We’ve all seen it: CAPTCHAs, or pictures, that protect a lot of web forms against those nasty spam-bots while at the same time blissfully destroying the usability of the web (10 “good” examples). While a lot of attempts have been made to make them usable for more people (pictures aren’t exactly user-friendly for blind people …), it still boils down to some requirement either on the browser (e.g. JavaScript) or on the user (typing into an input field). While requiring the browser to support something isn’t that bad, requiring the user to do something definitely isn’t usability at its best. Wouldn’t it be nice to just instantly see whether the current request is made by a bot or a browser? Well, you might be able to …

For the last two weeks I’ve been running a personal pet project: logging spam attempts in some scripts running at Ascended Development. After more than 300 attempts from more than 100 unique spam-bots (or IPs, anyway), the results shock me … a lot. The spam-bots that have visited the site are incredibly stupid. In simplest terms, they don’t even seem to be able to handle cookies (and thereby sessions in PHP), much less the pictures or JavaScript. The most ridiculous thing is that they actually do send data in the antispam input field. The problem is that it’s 2 to 4 times longer than the number of characters in the picture, and yes, they didn’t supply the session cookie, meaning the submission wouldn’t succeed even if the wild guess was right.
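A minimal sketch of what this kind of logging could look like. The function names, the `antispam` field name, the `PHPSESSID` cookie name and the log layout are all assumptions made for illustration; they are not taken from the scripts actually running on the site.

```php
<?php
// Hypothetical helper: builds one log record for a suspicious submission.
// $server, $post and $cookies are expected to look like $_SERVER, $_POST
// and $_COOKIE; passing them in keeps the function easy to test.
function build_spam_log_entry(
    array $server,
    array $post,
    array $cookies,
    string $sessionCookie = 'PHPSESSID' // assumed default PHP session cookie name
): array {
    $answer = $post['antispam'] ?? null; // 'antispam' is an assumed field name

    return [
        'ip'              => $server['REMOTE_ADDR'] ?? 'unknown',
        // Did the client send the session cookie back at all?
        'has_session'     => isset($cookies[$sessionCookie]),
        // Length of whatever was sent in the antispam field (null if absent).
        'antispam_length' => is_string($answer) ? strlen($answer) : null,
    ];
}

// Counts distinct IPs in a list of log records.
function count_unique_ips(array $entries): int
{
    return count(array_unique(array_column($entries, 'ip')));
}
```

Comparing `antispam_length` with the number of characters in the picture, and checking `has_session`, is enough to spot the two patterns described above.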

So what can we do without requiring user input or JavaScript? Certainly not add a hidden input field to the form and hope that the bot adds something to it. All bots (yeah, actually every single one of them) either provided the default value or didn’t provide the field at all. Out of the 300 attempts only 50 didn’t provide the field, and a massive 250 had the default value. None tried any other value, and it seems that those that didn’t provide the field tried again with the default value. Simply put, a hidden field doesn’t work, at least not on the bots visiting our site at the logged locations.
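To make the three outcomes concrete, here is a small sketch that sorts submissions by what they did with the hidden field. The field name `website` and its empty default are hypothetical; the post does not say what the real form used.

```php
<?php
// Assumed names for illustration only.
const HIDDEN_FIELD   = 'website';
const HIDDEN_DEFAULT = '';

// Classify what a single submission ($post looks like $_POST) did with the field.
function classify_hidden_field(array $post): string
{
    if (!array_key_exists(HIDDEN_FIELD, $post)) {
        return 'missing';  // field left out of the request entirely
    }
    if ($post[HIDDEN_FIELD] === HIDDEN_DEFAULT) {
        return 'default';  // looks exactly like a real browser
    }
    return 'altered';      // the only case a hidden-field trap can catch
}

// Tally the outcomes over a list of logged submissions.
function tally_hidden_field(array $submissions): array
{
    $counts = ['missing' => 0, 'default' => 0, 'altered' => 0];
    foreach ($submissions as $post) {
        $counts[classify_hidden_field($post)]++;
    }
    return $counts;
}
```

A hidden field only helps when the `altered` count is above zero, which is exactly what never happened in the log.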

So here’s the trick: simply read the HTTP headers. Too good to be true? Well, the log is equally clear on this matter too. None of the bots provided the fields Accept-Language and Accept-Encoding, both of which quite frankly any decent browser sends these days (Opera, Firefox, Konqueror, Chrome, Safari and even IE). Even lynx, a text browser, sends these headers, and I tested it with a two-year-old release. It does make sense if you think about it. Browsers add Accept-Language so pages can be correctly localized, and Accept-Encoding so that compression can be used. Both things are beneficial to the user, and are therefore present despite both being optional. The spam-bots, on the other hand, seem to be using libraries like libcurl to build their HTTP client, and by default these libraries don’t seem to add Accept-Language or Accept-Encoding. Add the fact that few, if any, sort spam-bots from browsers this way, and we can see that it’s not really such a surprise after all.

This isn’t without its flaws though. I’ve only encountered simple, general spam-bots and not the ones attacking widespread software like phpBB or vBulletin. Also, the site isn’t subject to targeted attacks. That being said, I wouldn’t be surprised if they too fail to provide these HTTP headers, and for the time being I’m quite confident that this method is about as effective as, or better than, the current widespread method of using pictures. It should also continue to be that way until this type of checking is more widely used. So until then, I believe this is a good way to avoid spam in your web applications:

<?php
// Browsers send both headers; none of the logged spam-bots sent either.
if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) === false ||
    isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) === false)
{
    echo 'Spambot';
}
else
{
    echo 'Browser';
}
?>
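In a real form handler the check would probably sit at the top of the script that receives the submission, so that ordinary page views are never judged. The following is only a sketch under that assumption: the function names are hypothetical, and it assumes the form is submitted with POST.

```php
<?php
// Returns true when a request lacks either header that browsers normally send.
// $server is expected to look like $_SERVER; passing it in keeps this testable.
function looks_like_spambot(array $server): bool
{
    return !isset($server['HTTP_ACCEPT_ENCODING'])
        || !isset($server['HTTP_ACCEPT_LANGUAGE']);
}

// Only judge form submissions (assumed to be POST requests);
// plain page views pass through untouched.
function should_reject_submission(array $server): bool
{
    $isPost = ($server['REQUEST_METHOD'] ?? 'GET') === 'POST';
    return $isPost && looks_like_spambot($server);
}

// Possible usage at the top of the script that processes the form:
// if (should_reject_submission($_SERVER)) {
//     http_response_code(403);
//     exit('Submission rejected.');
// }
```

Limiting the check to submissions means clients that only read pages are unaffected, even if they omit these headers.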

If only this worked against email spam too …

Category: Programming

10 Responses

  1. Navigator
    Wednesday, 3. June 2009

    o_O! If this works you’re a spam-saviour (until the mot#erf#ckers discover the trick)

    Thanks for sharing it! :)

  2. Navigator
    Wednesday, 3. June 2009

    Wait a second… (sorry to post again, but I tend to second-guess everything), what about crawling bots like Yahoo or Google? Are they blocked too?

  3. Yeah, but as long as you only check it on submit it doesn’t matter. Crawlers don’t submit new content, they just index what’s already there.

  4. Shows some promise, I’ll definitely have to try this out. Thx for the tip.

  5. very simple, and very useful.
    you are great

  6. This is a great idea, but be aware of a few caveats. For one, there are some programs such as Norton Internet Security that “tinker” with the headers and may cause issues. Another is the use of translation sites such as translate.google.com, which seem to blank certain header fields. Although the majority of visitors to a site will be okay, it never hurts to keep these types of issues in mind.

  7. Is this code placed in the header section of the page containing the email form? If not, where?

  8. It is a great idea and so simple! Thanks for sharing ;)

  9. I found 2 problems with your approach, even if I’m a KISS fan:

    1) it works (if used as the sole antispam method) only if very few people use it and it doesn’t show up on the spammers’ radar. If, for instance, wordpress.com started using your method, within 2 hours spammers would get their botnets updated and your blog would be flooded
    2) it doesn’t protect against pin-pointed attacks. Captchas are not used only in blogs but in a wider range of services. If you’re running a site where captchas help you against fraud, your method will be broken again in a few hours. OK, in this last case you should implement an anti-bruteforce method in the backend as well and not rely only on captchas.

  1. [...] How to stop spam without pictures or javascript | var/log [...]
