I'm currently spending a significant amount of my time on fraud investigations at work. I've written some code which collates logs and transactional data then mashes it up to find patterns. It is accurately predicting the majority of the fraud. (I keep telling my boss I want to work on commission but so far I'm stuck with a salary). Although this is a huge leap forward from the position before I was involved, I'd like to reduce the remaining losses further.
My program relies heavily on IP addresses to identify individual client devices while the groups carrying out the fraud are mostly using mobile dongles or proxying via vulnerable webservers in an effort to obscure their identity. So I've been looking at alternative methods for identifying whom is at the far end of the wire.
(I should point out that in order to carry out a transaction on our system, the users must authenticate themselves therefore anonymity is not an issue for our legitimate users).
While evercookie looks to be ideal for our purposes, it's high profile means that our attackers may be specifically on the lookout for this - as well as the risk that it may be detected as malware by legitimate users with the right software. And despite the fact that our legitimate users have a thouroughly verified identity, I think undermining the security of their computers to be a step too far. The methods described in the Panopticlick project seem to be a more appropriate so I've been looking at these in some detail.
Which User Agent?
The only people publishing stats regarding faked user agents are, not surprisingly, people developing code in this area - and their sites are more likely to be visitied by technically sophisticated users, deliberately trying to test out the detection. I think it's reasonable to surmise that faking of user agents in the wild is relatively rare - so if I can reliably detect a faked user agent, knowing what the real user agent is does not help significantly with generating a unique fingerprint. A further consideration, is that even were this possible, sending the real user-agent back serverside increases the visibility of the fingerprinting process to an attacker.
It's worth noting that the navigator object has other properties / methods indicating the identity of the browser. Notably
With the user-agent switcher on Safari, navigator.userAgent and navigator.appVersion matches the selected user agent in the switcher, and this is what is sent in the request. appName and appCodeName are always Netscape and Mozilla resp. However navigator.platform always reports win32, regardless.
With a Firefox user agent switcher, all the properties were changed.
Robert Accettura has a nice writeup of the detection implemented in jquery, Prototype and
Yui which parse the user agent string while Mootols uses feature detection. The Mootools implementation only differentiates between different suppliers of browsers - not between versions of browser from the same supplier.
Again it differentiates between parsing engines families rather than individual versions. Using a test script I got the following values:
115 - Safari 3
116 - Firefox 10
187 - Chrome 5, MSIE 6
have some more details on feature detection.(3)(4)(5)
Several of the published documents / code make reference to fonts supplied as a good indicator of variability. Most use Flash or Java to get a list of the fonts. Interestingly both seem to provide an unsorted list - so the ordering is determined by where they appear on the disk - adding more unique behaviour. Not having a development toolkit for either, meant I had limited scope for testing this myself - but I did come across some code written by Lalit Patel . Lalit renders a fixed string using a degrading font-selector using different degraded fonts - if the size of the rendered string is the same, then the system must have the prefered font available - neato!
Taking this one step further, if I have a list of fonts to check for, then I can build up a list of what's available. Of course if I look for, say, Arial on a MS Windows platform, Helvetica on MacOS or Vera on Linux it's not going to tell me very much - but on www.codestyle.org I found lists showing the less common fonts. While looking for the most common fonts doesn't add a lot of variability, looking for very rare fonts adds code with little yield - so I created a list of the fonts lying in between these extremes for my code.
On Firefox and webkit based browsers, navigator.plugins provides names and versions of the browser plugins (Adobe Acrobat, Flash, Java etc). Iterating through this is simple. Although MSIE has a plugins property in navigator it is not populated. In order to get information about a plugin you need to create an instance of it. And there is no standard API in ActiveX objects to get version information.
On the builtfromsource blog (author does not seem to provide any identity information) there are examples of how to detect/get version information from some of the more common ActiveX plugins.
Josh Fraser offers a rewritten script for detecting both the timezone offset and whether DST applies for the current TZ .
Does it work?
I've mentioned I did some testing: I wrote a script using feature detection, arguments.callee.toString().length, font detection and plugin detection (i.e. specifically not using the user Agent string) and ran it on some computers at work.
Where I work the computers are all installed from standard builds. So far, I've got 23 fingerprints from 24 (supposedly identical) machines - i.e. I've only got 2 machines returning the same fingerprint.
Where's my code? Sorry, don't want to make it too easy for the bad guys to see what I'm doing. If you follow the links I've provided you'll get the same functionality with just a little cutting and pasting.