Unobtrusive Antispam: Using JavaScript and Cookies Efficiently To Thwart Spammers and Hackers

Spam sucks, period. But how do you stop spam effectively without putting your users through any kind of inconvenience (e.g. CAPTCHAs or lame questions)? In this piece, I’ll explore methods of using JavaScript, cookies and a server-side language together to prevent automated (i.e. non-human) spam via HTML forms and HTTP POST handling.

Hiding The Form With DOM Code

When spambots crawl the web, downloading thousands of pages per day, they are looking for common HTML forms and elements: website, comment text, email input, etc.

So the first step in keeping spambots from abusing your form is pretty simple: hide it.

Unless your site’s form absolutely needs to work without JavaScript (in text browsers such as Lynx, for example, which some blind users rely on), building your form using JavaScript DOM code will prevent bots that do not execute JavaScript from seeing the form input fields directly, thus eliminating simple automated injection attacks (we’ll discuss direct HTTP POST attacks later).

The preferred way to do this is to place an empty div element with an id attribute on the page, and have an external JavaScript function build the actual form inside it via DOM-modifying functions. Add a noscript tag alerting users to enable JavaScript, for those who run NoScript or otherwise have JavaScript disabled.

This prevents most if not all Internet-wide spambots from seeing your form and its fields.
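To make the idea concrete, here is a minimal sketch of such a form-building function. The container id "comment-form", the action URL "/post-comment.php" and the field names are all made up for illustration, and labels are left out to keep it short. The document object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Sketch: build the comment form with DOM calls instead of static HTML.
// "comment-form", "/post-comment.php" and the field names are hypothetical.
function buildCommentForm(doc, containerId) {
  const container = doc.getElementById(containerId);
  if (!container) return null; // placeholder div is not on this page

  const form = doc.createElement("form");
  form.setAttribute("method", "post");
  form.setAttribute("action", "/post-comment.php");

  // [element, name, type] for each field; a textarea has no type attribute.
  const fields = [
    ["input", "author", "text"],
    ["input", "email", "email"],
    ["textarea", "comment", null],
  ];
  for (const [tag, name, type] of fields) {
    const field = doc.createElement(tag);
    field.setAttribute("name", name);
    if (type) field.setAttribute("type", type);
    form.appendChild(field);
  }

  const submit = doc.createElement("input");
  submit.setAttribute("type", "submit");
  submit.setAttribute("value", "Post");
  form.appendChild(submit);

  container.appendChild(form);
  return form;
}

// In a browser, run it once the placeholder div exists:
// document.addEventListener("DOMContentLoaded", () =>
//   buildCommentForm(document, "comment-form"));
```

The page itself then only needs the empty div and the noscript notice, so a bot that reads the raw HTML finds no input fields at all.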

Now, a problem arises when one wishes to preserve form data in case of a user error. If you must verify form data within the POST handler in your server-side script, but wish to preserve input so the user doesn’t get frustrated after inputting all 20 other perfectly-valid form entries, there are some hacks for doing this:

My favorite, which involves my favorite server-side scripting language PHP (but is easily adaptable to ASP and friends), is to pass the valid form data back to the page and script within the query string, or to store the valid form entries in session variables. Then, after renaming your external script to script.js.php and setting the header’s Content-Type to text/javascript, you can have the script fill the form’s DOM elements with the values that were previously entered.

This will save the user frustration should they have to enter the same form several times over due to a faulty parameter.
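Here is a rough sketch of what the server-generated part of that script could look like. It is written in JavaScript rather than PHP purely for illustration, and the function name renderPrefillScript is hypothetical. The returned string stands in for what script.js.php would print, and it assumes the script is served as an external file rather than inlined into the HTML.

```javascript
// Sketch: emit the JavaScript that refills previously valid fields.
// The returned string plays the role of script.js.php's output, served
// with Content-Type: text/javascript. All names here are hypothetical.
function renderPrefillScript(validValues) {
  // JSON.stringify produces a literal that current JavaScript engines parse
  // as-is, so quotes and newlines in user input cannot break out of the code.
  const json = JSON.stringify(validValues);
  return (
    "var previous = " + json + ";\n" +
    "Object.keys(previous).forEach(function (name) {\n" +
    "  var field = document.getElementsByName(name)[0];\n" +
    "  if (field) field.value = previous[name];\n" +
    "});\n"
  );
}
```

The generated code has to run after the form has been built, since it looks the fields up by name. Only pass along the entries that validated, so the faulty one comes back empty.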

Preventing HTTP POST-based attacks

Even if you use JavaScript DOM code to present the form to the user, your form’s server-side target may still be susceptible to HTTP POST attacks due to human intervention in the spambot’s programming.

A malicious user may look at your script by hand to determine the form input fields, then simply write a program that injects values via HTTP POST directly (bypassing the DOM-placed “smart form” altogether).

With this in mind, it’s a good idea to protect the server script from direct attack as well. The preferred method of doing this is session management, which boils down to cookie usage.

Whatever your scripting language uses to preserve session variables across requests, it usually entails a cookie on the client. Typically the cookie carries only a long, hard-to-guess session identifier while the values themselves stay on the server, which keeps an attacker from manually forging the session data along with the POST data.

So, using PHP once again as my server-side example, start a session on the form page and set a boolean server-side session variable. This will indicate that the user actually visited the form page in the first place. When later processing the form, before doing any intensive work (e.g. database code), verify that the session variable was set, indicating that the client actually requested the form before sending POST data to the server script.

This in conjunction with the DOM-placed form will ensure that hand-crafted bots can’t simply send B.S. POST data directly to your script, as the client must have visited the form page for POST data to be processed.
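The logic is small enough to sketch. In PHP this would be session_start() plus a flag in $_SESSION; the version below is JavaScript with a plain Map standing in for the session storage, which is my substitution for illustration only. The function names and the sawForm flag are hypothetical.

```javascript
const crypto = require("node:crypto");

// In-memory stand-in for PHP's $_SESSION storage. A real site would rely on
// its language's or framework's session handling instead of this Map.
const sessions = new Map();

// Called when the form page is served: create a session and mark the visit.
// The returned id is what would be sent to the client in the session cookie.
function startFormSession() {
  const sessionId = crypto.randomBytes(16).toString("hex");
  sessions.set(sessionId, { sawForm: true });
  return sessionId;
}

// Called first thing in the POST handler, before any database work.
function visitedForm(sessionIdFromCookie) {
  const session = sessions.get(sessionIdFromCookie);
  return Boolean(session && session.sawForm);
}
```

A POST that arrives with no cookie, or with a session id the server never issued, fails the check and can be dropped before it costs anything.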

As a precaution, however, be sure to manually set the session’s expiration time to something reasonable via either the PHP script or via the .ini file. Otherwise, a malicious user could simply retrieve a cookie at a specified interval and then use said cookie to send many instances of bogus POST data.
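One way to enforce the limit from within the script is to record when the session was created and refuse anything older than a chosen age. The sketch below is in JavaScript for illustration; the createdAt property and the 15-minute limit are arbitrary examples of mine, not recommended values. On the .ini side, PHP's session.gc_maxlifetime is the setting I would assume is meant.

```javascript
// Sketch of a session lifetime check; 15 minutes is an arbitrary example.
const SESSION_TTL_MS = 15 * 60 * 1000;

// session.createdAt is a hypothetical timestamp (milliseconds since the
// epoch) stored server-side when the form page was first served.
function isSessionFresh(session, now = Date.now()) {
  if (!session || typeof session.createdAt !== "number") return false;
  return now - session.createdAt <= SESSION_TTL_MS;
}
```

Checking the age explicitly in the POST handler means a harvested cookie stops working after the limit even if the server has not yet cleaned up the old session data.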

Form randomization

Now, we return to the form again: a malicious user could still download a unique cookie from your site before sending forged POST data, so now we make the form unique for every submission. For this, either another server-side session variable or server-side storage is needed.

Insert a hidden form input field within your JavaScript DOM code. Within the server-preprocessed JavaScript code, give the field a unique name, a unique value, or both every time the script is sent to the client, preferably using an unpredictable random integer as the value.

Store the value in either another session variable, or even better in a database using the session id as the reference for finding it later when validating the POST data. Other methods of storing the value for retrieval are possible, as long as you can associate the random value with the session id later to ensure the value was generated (evidence that the form was used by a browser).
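As a sketch of the issue-and-verify cycle, again in JavaScript for illustration with a Map standing in for the session variable or database table: the function names are hypothetical, and I use 16 random bytes instead of a random integer, since a longer value is harder to guess.

```javascript
const crypto = require("node:crypto");

// Server-side storage keyed by session id. A Map stands in for the session
// variable or database table; all names here are hypothetical.
const formTokens = new Map();

// Called while generating the form script: returns the value to place in
// the hidden field, and remembers it for this session.
function issueFormToken(sessionId) {
  const token = crypto.randomBytes(16).toString("hex");
  formTokens.set(sessionId, token);
  return token;
}

// Called in the POST handler: true only if the posted value matches the one
// issued for this session. The stored token is deleted on every attempt, so
// each issued value is good for at most one submission.
function consumeFormToken(sessionId, postedToken) {
  const expected = formTokens.get(sessionId);
  formTokens.delete(sessionId);
  if (typeof expected !== "string" || typeof postedToken !== "string") return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(postedToken);
  // timingSafeEqual requires equal lengths, so check that first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Deleting the token on use forces a client to fetch and run a fresh copy of the form script for every single post, which is the point of the randomization.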

Storing the value in a client-side cookie isn’t recommended, since cookies can (again) be forged, which is exactly what this solution is attempting to prevent. Kept on the server, the random value provides further evidence that the form was used by a browser.

Spam Rule of Thumb

Remember: while this will block probably 99% of your form-solicited spam, it will not stop all spam indefinitely. A user who wishes badly enough to attack your site could still manually download and parse a form via a premade script.

It is important to realize, however, that while only 99% effective (the 1% being enough to compromise a site) this will discourage and defend against most pre-made attacks, as well as tailored attempts at compromising your site.

Using all of these methods together will defeat most of your site’s spam as well as low-profile attacks. Continuing to block known IP ranges and to follow other anti-spam and security practices is still recommended in addition to these methods.

Your users won’t notice a thing: it spares them CAPTCHAs (or those lame “what’s 1 + 2” questions), and it allows them to do what they need to while you worry less about security.

And remember to security-audit your website by hand every so often, to protect against XSS and SQL-injection attacks, and to use secure (HTTPS/TLS) encryption for sensitive form data.

About Anthony:



Anthony Cargile is the founder and former editor-in-chief of The Coffee Desk. He is currently employed by a private company as an e-commerce web designer, and has extensive experience in many programming languages, networking technologies and operating system theory and design. He currently develops for several open source projects in his free time from school and work.

Comments (3)

 

  1. nomalab says:

    Hi again, had to come back to this. Found something interesting: Honeypot CAPTCHA based on empty form field. Will need to test it but the concept is brilliantly simple.

    http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx

  2. Anthony says:

    @nomalab that’s very true. One simplistic way of circumventing this is to discreetly block the default cURL useragent (which is modifiable via the command line) and/or establishing a set amount of time between the request of the form and the processing on the server, to catch and thwart automation in general.

    But nice catch, although the form randomization would thwart this unless a wget/parser combination is used in addition with the cURL method.

    Thanks for the comment!

  3. nomalab says:

    About the cookie method: it’s a no brainer to use cURL to get around this: first make a request to the form page to get the session ID in the cookie (CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR), then post past the form directly to the form handler using that cookie. So, requiring cookies will keep the script kiddies away, but won’t stop people who know what they are doing.
