Communication mechanism Updated +Created
There are four main types of communication mechanisms found:
  • There is also one known instance where a .zip extension was used! web.archive.org/web/20131101104829*/http://plugged-into-news.net/weatherbug.zip as:
    <applet codebase="/web/20101229222144oe_/http://plugged-into-news.net/" archive="/web/20101229222144oe_/http://plugged-into-news.net/weatherbug.zip"
    JAR is the most common comms, and one of the most distinctive, making it a great fingerprint.
    Several of the JAR files are named something like either:
    • meter.jar
    • bandwidth.jar
    • speed.jar
    as if to pose as Internet speed testing tools? The wonderful subtleties of the late 2000s Internet are a bit over our heads.
    All JARs are directly under root, not in subdirectories, and the basename usually consist of one word, though sometimes two camel cased.
  • JavaScript file. There are two subtypes:
    • JavaScript with SHAs. Rare. Likely older. Way more fingerprintable.
    • JavaScript without SHAs. They have all been obfuscated slightly different and compressed. But the file sizes are all very similar from 8kB to 10kB, and they all look similar, so visually it is very easy to detect a match with good likelyhood.
  • Adobe Flash swf file. In all instances found so far, the name of the SWF matches the name of the second level domain exactly, e.g.:
    http://tee-shot.net/tee-shot.swf
    While this is somewhat of a fingerprint, it is worth noting that is was a relatively commonly used pattern. But it is also the rarest of the mechanisms. This is a at a dissonance with the rest of the web, which circa 2010 already had way more SWF than JAR apparently.
  • CGI comms
These have short single word names with some meaning linked to their website.
Because the communication mechanisms are so crucial, they tend to be less varied, and serve as very good fingerprints. It is not ludicrous, e.g. identical files, but one look at a few and you will know the others.
Reverse engineering Updated +Created
In this section we document the outcomes of more detailed inspection of both the communication mechanisms (JavaScript, JAR, swf) and HTML that might help to better fingerprint the websites.
The Reuters websites Updated +Created
The Reuters article directly reported only two domains in writing:
But by looking at the URLs of the screenshots they provided from other websites we can easily uncover all others that had screenshots, except for the Johnny Carson one, which is just generically named. E.g. the image for the Chinese one is www.reuters.com/investigates/special-report/assets/usa-spies-iran/screencap-activegaminginfo.com.jpg?v=192516290922 which leads us to domain activegaminginfo.com.
Also none of those extra ones have any Google hits except for huge domain dumps such has Expired domain trackers, so maybe this counts as little bit of novel public research.
The full list of domains from screenshots is:
This brings up to 8 known domain names with Wayback Machine archives, plus the yet unidentified Johnny Carlson one, see also: Section "Searching for Carson", which is also almost certainly is on Wayback Machine somewhere given that they have a screenshot of it.