How to Stop Spam Harvesting With Email Obfuscation

The day I discovered the “mailto:” link was glorious. I could publish my address on a web page and anyone could email me with a single click. This was in the more innocent days of the web – before the spam harvesters took over. Use a “mailto:” today and your first Viagra message will appear 30 seconds later. So how can you publish an email address without attracting unwanted attention from spammers?

The most obvious solution is to use a machine-unreadable email in your HTML, e.g. “bob (at) bobsdomain dot com”. Whilst this makes it difficult for spammers, it also makes it difficult for your users.

Another option is to generate the email address using JavaScript, perhaps with a little string concatenation or encoding e.g.


<p>contact 
<script type="text/javascript">
document.write('<a href="mai'+"lto:"+"bob"+'@'+'bobsdomain.com">bob@'+"bobsdomain.com</a>");
</script>
</p>

This will stop most spammers, but anyone with JavaScript disabled will not see your address. (I would not recommend using document.write either.)
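If you want the same idea without document.write, the link can be built with standard DOM methods instead. The following is only a rough sketch: the helper names buildAddress and insertEmailLink, and the "contact-email" element id, are made up for illustration, and it assumes the placeholder element already exists when the script runs.

```javascript
// Assemble the address from fragments so it never appears whole in the page source.
function buildAddress(user, domain) {
  return user + "@" + domain;
}

// Hypothetical helper: append a real mailto link to a placeholder element.
function insertEmailLink(placeholder, user, domain) {
  var address = buildAddress(user, domain);
  var a = document.createElement("a");
  a.href = "mailto:" + address;
  a.appendChild(document.createTextNode(address));
  placeholder.appendChild(a);
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  // "contact-email" is an assumed id, e.g. <span id="contact-email"></span>
  var target = document.getElementById("contact-email");
  if (target) {
    insertEmailLink(target, "bob", "bobsdomain.com");
  }
}
```

Users without JavaScript still see nothing, so this shares the main drawback of the document.write version.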

A better solution is to use a combination of techniques to thwart spammers without causing user difficulties. The first step is to use a human-readable but harvester-proof email address in our HTML. We will also make this a link to a contact page, e.g.


<p>Contact <a href="contact.html" class="email">bob (at) bobsdomain dot com</a></p>

Note that we have included a class of “email” so our link can be identified. The next step is to write a JavaScript function which searches your page for obfuscated emails and transforms them into real “mailto:” links. We will create an ‘email.js’ file and include it in our HTML:


<script type="text/javascript" src="email.js"></script>

The required code is short, so we do not need a JavaScript library:

Content of email.js:


function EmailUnobsfuscate() {
	
	// find all links in HTML
	var link = document.getElementsByTagName && document.getElementsByTagName("a");
	var email, e;
	
	// examine all links
	for (e = 0; link && e < link.length; e++) {
	
		// does the link use a class named "email"?
		if ((" "+link[e].className+" ").indexOf(" email ") >= 0) {
		
			// get the obfuscated email address
			email = ((link[e].firstChild && link[e].firstChild.nodeValue) || "").toLowerCase();
			
			// transform into real email address
			email = email.replace(/dot/ig, ".");
			email = email.replace(/\(at\)/ig, "@");
			email = email.replace(/\s/g, "");
			
			// is email valid?
			if (/^[^@]+@[a-z0-9]+([_.-]{0,1}[a-z0-9]+)*([.]{1}[a-z0-9]+)+$/.test(email)) {
			
				// change into a real mailto link
				link[e].href = "mailto:" + email;
				link[e].firstChild.nodeValue = email;
		
			}
		}
	}
}

An explanation of the code:

  1. Line 4 fetches every <a> link in our HTML page and line 8 loops through them.
  2. Line 11 checks the link for a class of “email”.
  3. Line 14 grabs the obfuscated email from the text content of the node.
  4. Lines 17 to 19 transform it to a real email address using regular expressions: “dot” is changed to a “.”, “(at)” is changed to “@”, and all spaces are removed.
  5. Line 22 checks the resulting email address is valid.
  6. Lines 25 and 26 then modify the DOM node and make it into a real “mailto:” link.
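The transformation and validation steps can also be pulled out into a function that takes a string and returns a string, which makes them easy to try out without a page. This is a sketch rather than the code above: the name deobfuscate is hypothetical, and unlike the original it only replaces “dot” when it is surrounded by whitespace, so a name that happens to contain the letters “dot” is left alone.

```javascript
// Convert "bob (at) bobsdomain dot com" into "bob@bobsdomain.com".
// Returns null when the result does not look like an email address.
function deobfuscate(text) {
  var email = String(text || "").toLowerCase();

  // " dot " -> "." (assumption: the marker is a separate word)
  email = email.replace(/\s+dot\s+/g, ".");
  // "(at)" -> "@", swallowing any whitespace around it
  email = email.replace(/\s*\(at\)\s*/g, "@");
  // remove any remaining whitespace
  email = email.replace(/\s/g, "");

  // same style of check as the article's validation regular expression
  var valid = /^[^@]+@[a-z0-9]+([_.-]?[a-z0-9]+)*(\.[a-z0-9]+)+$/;
  return valid.test(email) ? email : null;
}
```

Calling deobfuscate("bob (at) bobsdomain dot com") gives "bob@bobsdomain.com", while text with no “(at)” marker gives null, so the link would be left untouched.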

Finally, we need to ensure the function runs on page load by adding a line to the bottom of email.js:


window.onload = EmailUnobsfuscate;

The result:

  • Our original HTML page contains no “mailto:” links and cannot be easily harvested by spammers.
  • The majority of users (those with JavaScript enabled) will see a standard email address and “mailto:” link.
  • Anyone not running JavaScript will see the readable “bob (at) bobsdomain dot com” address.

The intention of this article is to show the concept rather than real code. Although the example works, I suggest you:

  • Use your own obfuscated email format, e.g. “bob {@} bobsdomain -dot- com”. Spammers can read this article and transform encoded emails just as easily as you!
  • Use a different link identifier class – “email” is a little obvious!
  • Use a JavaScript library, such as jQuery, to make the function shorter. You should also ensure it copes with whitespace or other DOM nodes around the email address text (not handled in the code above).
  • Replace the window.onload with a more robust event handler.
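As a rough illustration of those suggestions, here is one way they might fit together. Everything named here is an assumption made up for the example: the “contact-x” class, the “{@}” and “-dot-” markers, and the decodeAddress and upgradeLinks functions. It uses the browser's built-in querySelectorAll, textContent and DOMContentLoaded rather than jQuery, and the validity check is deliberately looser than the one in email.js.

```javascript
// Hypothetical settings: choose your own class name and markers.
var EMAIL_CLASS = "contact-x";
var AT_MARK = /\s*\{@\}\s*/g;
var DOT_MARK = /\s*-dot-\s*/g;

// "bob {@} bobsdomain -dot- com" -> "bob@bobsdomain.com"
function decodeAddress(text) {
  return String(text || "")
    .toLowerCase()
    .replace(AT_MARK, "@")
    .replace(DOT_MARK, ".")
    .replace(/\s/g, "");
}

// Turn every matching link under root into a real mailto link.
function upgradeLinks(root) {
  var links = root.querySelectorAll("a." + EMAIL_CLASS);
  for (var i = 0; i < links.length; i++) {
    // textContent gathers text from nested nodes, whitespace included
    var email = decodeAddress(links[i].textContent);
    if (/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
      links[i].href = "mailto:" + email;
      links[i].textContent = email;
    }
  }
}

// Run once the DOM is parsed, without overwriting window.onload.
if (typeof document !== "undefined") {
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", function () {
      upgradeLinks(document);
    });
  } else {
    upgradeLinks(document);
  }
}
```

Because addEventListener adds a handler instead of replacing one, this should coexist with other scripts that also need to run at page load.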

Best of luck.

  • http://thenetgen.com agentforte

    Re-captcha allows you to protect your email with a security image.

    A contact form that sends to an address is also a good idea. You can put a filter so that the email only accepts emails from the contact form. Even though there are spam “bots” that fill forms, this is less common than regular email spam.

  • sitehatchery

    Good one

  • sitehatchery

    I thought this was a very nice way to display your email address on the website rather than hiding the email address with the use of a contact form.

    One way to use a contact form without a CAPTCHA is to check the referring URL on the processing page. If the referring URL is null or another domain, then there’s reason to believe that someone is using cURL, phishing, etc and you can throw an error.

    You could also generate some random string and assign it to a session variable and to a form variable. When you process the form variable, match it to the session variable. If it’s a match, then you’re good… well, unless you’re running 32 bit Firefox (for the plugins) on a 64 bit machine. In such case, it won’t work, because Firefox (on 64 bit machine) will assign a second session variable after it assigns the form variable and it screws everything up… and the same happens with CAPTCHAs too! Which means that if there’s a CAPTCHA, I have to switch to IE or Firefox for 64 bit. So annoying.

  • http://www.openedgewebdesign.com Timbothecat

    Great article, I’ll use this later in the year for web sites that I have to build as part of my course.

    The form on my website gets spammed about 2-3 times per week. Either way you do it though (php form or JavaScript), it certainly cuts down on the amount of spam you receive.

  • heggaton

    This is a nice one. I think I’ll keep it on file for future reference.

    Previously, I’ve used a custom JavaScript obfuscator and directed people to my feedback form in the noscript tags with a message like “If you wish to contact us, please use our feedback form” <- linking to the feedback form page.

  • glenngould

    This is a good solution. I like it, thanks.

  • http://www.visual28.com visual28

    By using (at) and dot in the email, the only thing you have done is make it a tiny bit harder for the email harvesters to get at your email. If I were harvesting emails off the web, I would look for this same syntax and then run the emails against my own de-obfuscation scripts to correct them. Sure, you take a small dent out of the really (really) lazy programmers, but not likely.

    In an article I wrote last year (sorry for the shameless plug) I discuss this same approach and how hacker sites refer to this as “Obfuscation for humans but not for robots.” It just doesn’t work.

    @agentforte, the problem with using images for email is that you make it difficult to make your site accessible to unsighted users. If it’s your personal site that is one thing, but obviously a business stands to lose a lot more by adopting such practices.

    I personally like @heggaton’s approach in using the noscript tag to link to a contact page. At least this way the contact method is not removed completely. I love it so much, I wish I had thought of it beforehand. Great suggestion, @heggaton.

    So unless I am missing something from the article that was vitally important, I would suggest avoiding this method as it does not appear to be a good solution.

  • tekkie

    For Macheads there’s a Dashboard widget that can be used for email obfuscation: obfuscatr.

  • http://www.patricksamphire.com/ PatrickSamphire

    heggaton’s approach is a good one, although you might do better avoiding the noscript tag and using javascript to replace the link to the contact form with obfuscated email. (Just seems neater, somehow.)

  • http://www.cemerson.co.uk Stormrider

    I’m pretty sure all these attempts at obfuscation are useless – spam bots are programmed to recognise all sorts of patterns, and can probably decode the javascript ones as well – email addresses are worth a lot of money to spammers, so they are likely to put a lot of effort into cracking these techniques.

    Unfortunately, hiding behind a contact form is the only way to keep the email address safe, and I don’t really see a problem with using them.

  • Stevie D

    The method I always used to use was to replace the @ with @ – the character reference for @ – which the browser would decode correctly but which harvesters didn’t generally pick up. Does anyone know if that method is still reasonably secure?

    But I do now use a contact form instead – it’s good practice, although it does have its drawbacks.

  • Stevie D

    D’oh – what shows up in the preview apparently isn’t what comes up in the published comment – stupid wretched thing.

    What I typed was “replace the @ with ampersand-hash-064-semicolon”

  • heggaton

    @Patrick, my solution *is to use* what you suggested ;) The noscript tag is for users without JavaScript. I’ve never had one of my emails hacked – in the 8 years I’ve been using it.

    @stormrider, I used to suspect the same thing but in the 8 years i’ve been using my JavaScript solution, I’ve never had one of my emails hacked.

    Probably due to 1) my custom code and 2) it’s possibly too resource intensive to have spam bots interpret (potentially) thousands of lines of code. Dunno, but it works ;)

    Cheers

  • http://www.cemerson.co.uk Stormrider

    It won’t last though. It would be trivial to create, e.g., a Firefox extension to harvest emails (many already do), thus using the browser’s JavaScript engine to do all the work for you. Either that or the same technology a screen reader uses – use the browser for its rendering, output and JavaScript interpretation, and simply take the output of that.

    It won’t be long before javascript techniques are useless as well.

  • http://www.patricksamphire.com/ PatrickSamphire

    @heggaton, I was simply suggesting approaching it from the other direction to come up with the same solution. So, rather than having noscript for those without javascript, have the link to the contact form included in the html as default for everyone (but not inside noscript), then use javascript to remove that link and replace it with the obfuscated email.

    That way, you don’t need a noscript tag at all.

    The result will be exactly the same, and there’s no particular reason that anyone should go this more complicated route, but I just like things to look neat, and noscript always seemed a little ugly to me. :)

  • heggaton

    @Patrick, I completely understand now and yeah, your way is much better than using the noscript tag :)

    I think I’ll stick to my original way of obfuscating and directing people to the feedback form on JavaScript failover but use your method of replacing HTML rather than the noscript.

    Cheers

  • justseth

    using a contact form is the best of solutions for many reasons:
    1) the developer can control the conversation, ensuring the required information is sent
    2) forms are trackable / measurable
    3) auto-responses can be delivered based upon an array of conditions.
    4) contact forms are routable. meaning, depending upon subject, will determine where the email is sent.

  • roger

    Of course, unfortunately, you have also provided in this article the source code for the spammer to look at currently obfuscated email addresses and harvest them.

    … Hey dude, just look for anything that is class email and then run it through this code …

    Maybe calling the class ‘honeypot’ would be more appropriate.

    great code btw, but those spammers have more time on their hands than we have to defeat them.

    Of course by suggesting it in this comment I am also aiding them … Where’s the delete key?

  • soundjet

    Nice one! guaranteed? ;)

  • Stomme poes

    [quote]Re-captcha allows you to protect your email with a security image.[/quote]

    More and more sites, I cannot contact the owners because of captchas. I can’t read most of them, and I’m not blind.

    I’ve been using numerical character entities for the whole email address, the mailto and the text in the anchor, and sometimes also with the word “email”. I also don’t know how well this works, but I’ll keep doing it so long as I know there are still many bots who can’t read them.

  • http://www.calcResult.co.uk omnicity

    There is plenty of evidence to suggest that the spam-bots are already using regExp patterns to look for even partial email addresses.
    Where only a domain name is found, they then launch a dictionary attack with that, though domain names can just as easily be found via DNS.

  • http://www.visual28.com visual28

    I agree with omnicity, the spammers are far too smart and the email is easily hacked using common replacement techniques. I use a modified algorithm of the Project Honeypot system found here:
    http://www.projecthoneypot.org/how_to_avoid_spambots_3.php

    Even it’s not perfect, and will surely be defeated soon enough if not already. The only decent way to protect your email is through the use of a contact form.

  • jj

    I tried it but it’s not working. I put in the email.js file to the server, linked to it in the header, added the ‘window.onload’ line at the bottom, and used “class” in the email link (actually I copied/pasted and just changed the email address), but all I’m seeing is the obfuscated address. And I’m almost 100% sure that I have javascript enabled.

  • http://www.optimalworks.net/ Craig Buckler

    @jj

    …all I’m seeing is the obfuscated address

    The code is either failing or not being started. Run it in Firefox with the error console open (Tools > Error Console) – that should tell you if there are any obvious errors. Otherwise, post a link to your page URL and I’ll take a look.