No cookies or JavaScript? No worries. You can be tracked anyway.

I’m not quite sure how to feel about this one. The web developer in me is saying “Whoa! That’s so cool!”, while the web surfer in me is saying “Ew… I’m leaking data everywhere… How gross…”

The digital civil rights group Electronic Frontier Foundation, fresh from recent net neutrality and Facebook privacy battles, has released an eye-opening online tool called Panopticlick, designed to demonstrate exactly how uniquely identifiable you are, even if you’re diligent enough to take some of the most commonly prescribed privacy measures.

Conventional wisdom holds that if you disable scripting and refuse to accept cookies, you’ll be denying web sites the tools they need to recognize you when you return, thus maintaining your anonymity.

However, as Panopticlick shows us, there is still a lot of seemingly benign data available to any web server inquisitive enough to ask — items such as user agent, browser plugin details, local timezone, screen size, screen color depth and system fonts.

As with the bases in our DNA, none of the individual pieces is likely to be unique on its own, but taken as a whole they very likely combine to produce a unique fingerprint.

*Head slap* It was so obvious.
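To see how harmless-looking attributes add up to an identifier, here is a minimal Python sketch of a server folding them into a single hash. The attribute names and values are invented for illustration. This is not Panopticlick's actual code, and a real site would need client-side probes to gather some of these values (fonts, plugins, screen size) rather than reading them from plain request headers.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a dict of browser attributes into a stable hex identifier.

    Illustrative sketch only: the keys are hypothetical, not the EFF's schema.
    """
    # Sort list values (fonts, plugins) so enumeration order doesn't matter.
    normalised = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in attributes.items()
    }
    # sort_keys=True yields the same JSON text regardless of dict insertion order.
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical visitor: every value below is made up for the example.
visitor = {
    "user_agent": "Mozilla/5.0 (Windows NT 5.1) Chrome/4.0",
    "plugins": ["Shockwave Flash", "Java"],
    "timezone_offset_minutes": 0,
    "screen": "1280x1024",
    "color_depth": 32,
    "fonts": ["Arial", "Georgia", "Verdana"],
}

print(fingerprint(visitor))
```

Because keys and list values are sorted first, the same browser produces the same hash on every visit, with no cookie required. Installing a single extra font produces a completely different hash.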

For instance, it told me that my specific Win XP/Chrome setup turned up in 1 in 50 of the browsers it had seen. It goes without saying that font libraries and plugin lists have much greater scope for individual variation.
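One way to put numbers on this is the standard information-theory measure: a trait shared by 1 in N browsers carries log2(N) bits of identifying information. The short Python sketch below converts “1 in N” figures into bits and adds them up. Only the 1-in-50 figure comes from the test above; the “three such traits” example is hypothetical. Adding bits assumes the traits are statistically independent, which is a simplification (your OS and your font list are clearly correlated), so treat the total as a rough estimate.

```python
import math


def surprisal_bits(one_in: float) -> float:
    """Bits of identifying information in a trait shared by 1 in `one_in` browsers."""
    return math.log2(one_in)


def combined_bits(one_in_values: list[float]) -> float:
    """Total bits for several traits, ASSUMING they are statistically independent.

    Real browser traits are correlated, so this is only a rough estimate.
    """
    return sum(surprisal_bits(n) for n in one_in_values)


def one_in(bits: float) -> float:
    """Convert a number of bits back into a '1 in N' figure."""
    return 2 ** bits


print(round(surprisal_bits(50), 2))                # the Win XP/Chrome figure above
print(round(one_in(combined_bits([50, 50, 50]))))  # three such (hypothetical) traits
```

A 1-in-50 trait is worth about 5.64 bits. Under the independence assumption, three traits that are each that common would already single out one browser in 125,000, which is how individually unremarkable details combine into a unique fingerprint.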

So, what does this mean in practical terms?

1). Welcome John Smith. Not.: Obviously, when we talk about ‘identifying a user’ we’re not talking about knowing their name, address and phone number. As the EFF says: ‘All of the data for the project will be collected in an anonymized form which ensures that it is not Personally Identifiable Information, nor otherwise likely to lead to the identification or tracking of any web users.’

It does mean a site can record your behaviour and then use that information the next time you return.

It also means that, in theory, advertisers might be able to use this data to track you across multiple domains.

2). Performance anxiety: Panopticlick took around 6-8 seconds to run its tests on my system, so I suspect most web site owners would think twice before willingly adding that sort of overhead to a first page load.

Still, it shouldn’t be necessary to query every user.

3). Stealth mode: Clearly, the more you customize your browsing environment, the more identifiable you are. As such, a less ‘moddable’ device is by default more anonymous. Does that make Safari for iPhone the new stealth browser? No Flash, no Java, a standardized font set. Hmm…
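Point 1 above, recognising you when you return, needs nothing more than a lookup keyed by the fingerprint instead of a cookie. The Python class below is a hypothetical, in-memory sketch of that idea. The name `VisitLog` and its methods are invented for illustration. A real site would use a database and would have to accept that two different browsers can occasionally share a fingerprint.

```python
from collections import defaultdict
from datetime import datetime


class VisitLog:
    """Hypothetical in-memory store of page views keyed by browser fingerprint."""

    def __init__(self) -> None:
        self._visits: dict[str, list[tuple[datetime, str]]] = defaultdict(list)

    def record(self, fp: str, path: str, when: datetime) -> bool:
        """Log a page view and return True if this fingerprint was seen before."""
        returning = fp in self._visits
        self._visits[fp].append((when, path))
        return returning

    def history(self, fp: str) -> list[tuple[datetime, str]]:
        """Return all page views for a fingerprint (empty list if unknown)."""
        # .get() avoids creating an entry, which would make `fp` look "seen".
        return list(self._visits.get(fp, []))


log = VisitLog()
log.record("3f2a", "/articles/privacy", datetime(2010, 1, 28, 9, 0))
if log.record("3f2a", "/forums", datetime(2010, 2, 3, 14, 30)):
    print("Welcome back; on your first visit you read:", log.history("3f2a")[0][1])
```

If several sites reported to the same log, the same key would link your visits across domains, which is the advertiser scenario described above.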

Regardless, it will be interesting to see whether anyone puts this to practical use.

Would we know anyway? Probably not.


  • http://www.dangrossman.info Dan Grossman

    There are lots of methods for identifying and tracking users without cookies beyond those used by the EFF in this project.

    Browser cache is one vector, involving embedding iframes with unique URLs and checking whether the browser requests these URLs on subsequent visits.

    Browser history is another vector. You can scan browser history through JavaScript by dumping links into the DOM and checking their color. Visited links will read as purple and non-visited links as blue. Using some unique URLs allows you to fingerprint and track repeat visits by the link colors.

    Combine these, and even if the user isn’t identifiable by fingerprinting browser info (as the EFF is doing), they can be identified by cookies, by cache and by browser history. To be anonymous you have to not only use a very common system setup, but also clear your cookies AND your cache AND your browsing history every time you go online.

  • http://www.optimalworks.net/ Craig Buckler

    This is quite an interesting technique and I recently undertook a similar project where I needed to assess the probability of an individual user accessing a system by user agent and IP address alone. It was fairly easy.

    You can hide some of your identifying attributes by disabling JavaScript. Also, people in organisations with standardised PC configurations will be more difficult to identify.

    But a combination of the user agent and accept headers (plus your IP address) still makes it relatively simple to track activity. In my case, it was 1 in 44,223 users!

  • http://www.starsites.co.za Jacotheron

    On each browser I tested with this, it displayed “Your browser fingerprint appears to be unique”. In other words, I can be tracked using this information. Another thing I noticed is that the number of tested browsers is skyrocketing (with every test, it is almost 50 browsers higher).

  • http://www.rwtconsultants.com israelisassi

    It’s like an episode of Caprica… well, maybe not quite.

  • Anonymous

    It’s a cute idea, but I would be more worried about Flash cookies than this. BetterPrivacy and NoScript are more than enough for the average user.

  • Dominik

    I suppose you meant the ‘bases’ in our DNA? There are proteins coating the DNA, but using them for identification seems far-fetched compared to the well-known genetic fingerprint.
    But apart from that, interesting article. And quite unsettling.

  • http://www.sitepoint.com AlexW

    @Dominik Sir, I thank you. Correction made. ;)

  • TheWickedFlea

    Alex, your title is misleading.

    I’ve used Panopticlick to learn how to protect myself from this, and the truest answer is that you must restrict cookies and JavaScript. You can always be tracked, but without JavaScript and cookies it’s harder to identify you. I went from 1 of 191k to 1 of ~50 just by activating NoScript and disabling cookies (using CookieMonster for site-specific exceptions).

    Your title shows that you can be tracked, but not how hard it is to identify you. Hulu hasn’t a clue who I am because I clear the LSOs it creates and block the Flash scripts it uses as trackers. A well-informed person has many ways to make themselves harder to track.

  • http://www.clearwind.nl peach

    How do you request what system fonts someone has installed? This is very interesting.
    Do they use the technique where they actually render some text and overlay it with an image, to see if it’s different? I’ve read about that, but I’m hoping there is a more efficient way…

  • richthegeek

    It’s an interesting practical example of set theory, and as a Firefox/Ubuntu user with custom fonts installed, I was unique on 2 of the tests… but it still doesn’t tell me why the heck I should care.

    What difference does it *actually* make to me if a website knows where I am? More targeted ads (this is good!), content tailored towards me. It’s not big brother, it’s private companies whose *only* aim is to make money from knowing more about me. This is capitalism, not 1984 (Thatcher joke).

  • http://www.brothercake.com/ brothercake

    I’m with richthegeek on this one – does it really matter? Is it really any significant breach of my privacy that my surfing habits can be tracked?

    Even if they know who I am, they still don’t know anything about me, over and above which websites I visit – which is hardly a representative data set, even if it’s complete.

    It’s ironic that many of those (in general, not here) who complain about such privacy breaches are the very same people who will happily put their holiday snaps on Facebook and talk about their personal habits on Twitter. (Or who are worried about the security of e-commerce, yet will happily read out their CC number over the phone!)

    You can do more to protect your privacy just by being careful about what you put online than by anything you can do to obfuscate your browsing profile.

  • http://www.sitepoint.com AlexW

    @peach I’m not 100% sure how they’re doing it, but I would guess Flash is the easiest way to query system fonts.

    @richthegeek From a user perspective, I mostly agree. From a developer’s point of view, it has some interesting possibilities. Issues such as users creating multiple user accounts for nefarious purposes (auctions, forums, etc) won’t be permanently solved this way, but it’s a pretty useful extra layer of ID data.

  • davidcroda

    I bet comparing this browser setup information against the region of your geolocated IP address (http://php.net/manual/en/book.geoip.php) would significantly raise the level of “uniqueness” of the tracking as well.

  • http://www.virvo.com Web-Development

    Yeah, adding geo data would make it unique in a lot more cases. This is an interesting way to identify a user, but its lack of accuracy is exactly what makes it useless for reliable identification. In other words, it would never stand up in court.