From the Reuters websites and others we've found, we can see some clear stylistic trends across the websites, which would allow us to find other likely candidates upon inspection. The most notable dissonance from the rest of the web is that there are no commercial-looking company websites, presumably because it was felt that it would be possible to verify the existence of such companies. The trends include:
- natural sounding, sometimes long-ish, domain names generally with 2 or 3 full words. Most in English language, but a few in Spanish, and very few in other languages like French.
- shallow websites with a few tabs, many external links, sometimes many images, and few internal pages
- common themes include:
- .com and .net top-level domains, plus a few other very rare non .com .net TLDs, notably .info and .org
- each one has one "communication mechanism file" (see: communication mechanisms)
- narrow page width like in the days of old, lots of images
- split header images
- some common patterns they follow in their news lists:
  - ul.rss-items > li.rss-item, e.g.: web.archive.org/web/20110202092126/http://beamingnews.com/
  - links with class a.newslink and a.newslinkalt, e.g.: web.archive.org/web/20110128181622/http://profile-news.com/
Most domains are the only domain on their IP, i.e. the websites are mostly privately hosted. However, we later found many exceptions to this general indicator, so it should not be used as a strong exclusion rule.
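The class-name patterns from the list above can be searched for mechanically once the pages are saved locally. The following is only a rough sketch: the function name list_newslist_hits is made up for illustration, the files are assumed to be laid out as one index.html per domain directory, and the regular expression only catches double-quoted class attributes, so it will miss some variants.

```shell
# Hypothetical helper: print the name of each given HTML file that contains
# one of the news-list class markers (rss-items, rss-item, newslink,
# newslinkalt). The match is loose: it also accepts longer class names that
# merely contain one of these strings.
list_newslist_hits() {
  # -a: treat binary-looking files as text
  # -l: print only the names of matching files
  # -E: extended regular expressions
  grep -alE 'class="[^"]*(rss-items?|newslink(alt)?)[^"]*"' "$@"
}
```

It could be run as: list_newslist_hits */index.html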
It would be fun to reverse image search our way back to the original images at one of their stock image providers. Ones we've found:
Maybe it is some kind of outdated web design thing, which they took much further in time than the average website, like the JAR.
Their websites do appear to follow common style guidelines from earlier eras, notably around the early 2000s. Some legit sites that look a lot like hits:
An example:
Looking at the source code of web.archive.org/web/20130828122833/http://euronewsonline.net/euro_bus.php we noticed an interesting comment:
<!-- ImageReady Slices (enewsweather.psd) -->
which presumably refers to Adobe ImageReady:
Adobe ImageReady was a bitmap graphics editor that was shipped with Adobe Photoshop for six years. It was available for Windows, Classic Mac OS and Mac OS X from 1998 to 2007. ImageReady was designed for web development and closely interacted with Photoshop.
A sample tutorial: people.goshen.edu/~paulmr/physix/326/imageready/slicendice.php
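To see how widespread such comments are, one could pull the PSD file names out of every "ImageReady Slices" comment in a set of saved pages. This is a minimal sketch, not something we have run over the whole dataset: list_imageready_psds is a hypothetical name, and it assumes the comment always has the form seen above, with the file name in parentheses.

```shell
# Hypothetical helper: print the unique file names found inside
# "ImageReady Slices (...)" comments in the given HTML files.
list_imageready_psds() {
  # -a: treat files as text, -h: no file name prefix,
  # -o: print only the matching part, -E: extended regular expressions
  grep -ahoE 'ImageReady Slices \([^)]*\)' "$@" |
    # keep only what is between the parentheses
    sed -E 's/^ImageReady Slices \((.*)\)$/\1/' |
    sort -u
}
```

It could be run as: list_imageready_psds */index.html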
Some of the websites use CSS background images to populate the images, e.g. ingenuitytrendz.com has HTML:
ingenuitytrendz.com/20110201170354/index.html: <li><a id="banner1"> </a></li>
ingenuitytrendz.com/20110201170354/index.html: <li><a id="banner2"> </a></li>
ingenuitytrendz.com/20110201170354/index.html: <li><a id="banner3"> </a></li>
and then the CSS engineering.css does:
#banner1 { background: url(/web/20110201170405im_/http://ingenuitytrendz.com/images/banner_01.jpg) no-repeat center; }
#banner2 { background: url(/web/20110201170405im_/http://ingenuitytrendz.com/images/banner_02.jpg) no-repeat center; }
#banner3 { background: url(/web/20110201170405im_/http://ingenuitytrendz.com/images/banner_03.jpg) no-repeat center; }
The HTML of the index pages from the Wayback Machine was:
- dumped at: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/html
- downloaded with: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/download-html.sh. Note that there were many spurious errors, notably:
  OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to web.archive.org:443
  We just ran it multiple times until all errors were gone.
The best way to analyse the HTML is to grab our dumps from: github.com/cirosantilli/cia-2010-websites-dump.
Some possibly interesting searches include:
Some of the HTML files contain conditional comments e.g. web.archive.org/web/20091023041107/http://aquaswimming.com/ contains:
<!--[if IE 6]> <link href="swimstyleie6.css" rel="stylesheet" type="text/css"> <![endif]-->
Various of the non-English websites seem to have comments translating the content, e.g.:
./noticiasmusica.net/20101230165001/index.html:<h2>Alguns dos Melhores Sites Nacionais</h2><!--some of the best national sites (in music)-->
This feels like it could be the translators helping the technical webdev team know what is what.
Many of the RSS frame pages use:
<base target="_blank" />
which is a weird HTML tag that makes all links open in new tabs, e.g. web.archive.org/web/20110202124411/http://thecricketfan.com/home.html.
Various websites have pages with .php extension. It feels likely that all websites were written in PHP.
Some sites use a feeds.php for the feeds, e.g. http://www.absolutebearing.net//absolutebearing_feeds/feeds.php?src=http%3A%2F%2Ffeeds2.feedburner.com%2FOceanyachtsinfo&desc=1
Some URLs existed both with .html and .php extensions, or were converted at some point:
allworldstatistics.com/20110207151941/comprehensivesources.html
allworldstatistics.com/20130818155225/comprehensivesources.php
A few of the PHP URLs have weird IDs in them like omktf, juqwt and qlaqft:
./middle-east-newstoday.com/20100829004127/omktf/uirl.php?ok=461128
./newsandsportscentral.com/20100327130237/juqwt/eubcek.php?pe=747155
./pondernews.net/20100826031745/lldwg/qlaqft.php?fc=281298
We wonder what they mean.
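To eyeball these IDs side by side, the paths can be split into their parts. The sketch below is only a convenience for inspection and assumes every path ends like the three samples above (a directory, a .php script, then a single name=number query). The function name split_php_ids is hypothetical.

```shell
# Hypothetical helper: read paths on stdin and print, for each one that ends
# in <dir>/<script>.php?<param>=<digits>, the four parts separated by spaces.
# Paths that do not have this shape are silently skipped.
split_php_ids() {
  sed -nE 's#^.*/([^/]+)/([^/]+)\.php\?([^=]+)=([0-9]+)$#\1 \2 \3 \4#p'
}
```

For example, feeding it the first sample path prints the directory, script name, parameter name and number as four columns: omktf uirl ok 461128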
As per:
grep . */index.html | grep 'binary file matches'
a few of the HTMLs are interpreted by grep as being binary:
grep: china-destinations.org/index.html: binary file matches
grep: classicalmusicboxonline.com/index.html: binary file matches
grep: driversinternationalgolf.com/index.html: binary file matches
grep: familyhealthonline.net/index.html: binary file matches
grep: grubbersworldrugbynews.com/index.html: binary file matches
grep: hai-pow.com/index.html: binary file matches
grep: hi-tech-today.com/index.html: binary file matches
grep: networkofnews.com/index.html: binary file matches
grep: nigeriastar.net/index.html: binary file matches
grep: noticias-caracas.com/index.html: binary file matches
grep: theentertainbiz.com/index.html: binary file matches
grep: thefilmcentre.com/index.html: binary file matches
grep: theinternationalgoal.com/index.html: binary file matches
grep: wildbirds-seasia.com/index.html: binary file matches
grep: worldedgenews.com/index.html: binary file matches
The discovery of a possible information leak in the <title> of webofcheer.com, which is cryptically set as: pg1c, motivated us to download all the HTML and have a grep.
We started grepping with:
grep -ai '<title>' */index.html
and to just get the titles alone for visual inspection:
grep -ahi '<title>' */index.html | sed -r 's/^\s*<title>//;s/<\/title>.*//'
Some mildly interesting facts include:
- opensourcenewstoday.com is titled just as "Title"
  opensourcenewstoday.com/index.html:<title>Title</title>
- a few sites are titled "Untitled Document". This may have been the default title in Adobe Dreamweaver. E.g.:
  media-coverage-now.com/index.html:<title>Untitled Document</title>
  newsandsportscentral.com/index.html: <title>Untitled Document</title>
  newsincirculation.com/index.html:<title>Untitled Document</title>
  newsworldsite.com/index.html:<title>Untitled Document</title>
  primetimemovies.net/index.html:<title>Untitled Document</title>
  unganadormundial.com/index.html:<title>Untitled Document</title>
- some others have an empty title:
  aeronet-news.com/index.html:<title></title>
  al-rashidrealestate.com/index.html: <title></title>
  arabicnewsunfiltered.com/index.html:<title></title>
  dailynewsandsports.com/index.html:<title></title>
  electronictechreviews.com/index.html:<title></title>
  indirectfreekick.com/index.html:<title></title>
  iran-newslink-today.com/index.html:<title></title>
  iraniangoals.com/index.html:<title></title>
  kickitnews.com/index.html:<title></title>
  mediocampodefutbol.com/index.html:<title></title>
  middle-east-newstoday.com/index.html: <title></title>
  mygadgettech.com/index.html:<title></title>
  sayaara-auto.com/index.html:<title></title>
  techwatchtoday.com/index.html:<title></title>
  the-open-book-online.com/index.html:<title></title>
  thenewsofpakistan.com/index.html:<title></title>
  theworld-news.net/index.html:<title></title>
  todaysengineering.com/index.html:<title></title>
  todaysnewsreports.net/index.html:<title></title>
  worldnewsandent.com/index.html:<title></title>
- some others are titled just "index" or a variant of it:
  all-sport-headlines.com/index.html:<title>index</title>
  europeannewsflash.com/index.html:<title>Index</title>
  fgnl.net/index.html:<title>Index Page</title>
  iraniangoalkicks.com/index.html:<title>index</title>
  just-the-news.com/index.html:<title>index</title>
  mide-news.com/index.html:<title>index</title>
  mytravelopian.com/index.html:<title>Index</title>
  noticiasdelmundolatino.com/index.html:<title>index</title>
  pakcricketgrd.com/index.html: <title>index</title>
  pangawana.com/index.html:<title>index</title>
  sportsnewsfinder.com/index.html:<title>index</title>
  thenewseditor.com/index.html:<title>index</title>
  turkishnewslinks.com/index.html:<title>index2</title>
  wahidfutbol.com/index.html:<title>index</title>
  webscooper.com/index.html:<title>index</title>
  webworldsports.com/index.html:<title>index</title>
- a few don't have <title> at all:
  b2bworldglobal.com/index.html
  bailandstump.com/index.html
  businessexchangetoday.com/index.html
  commercialspacedesign.com/index.html
  court-masters.com/index.html
  flyingtimeline.com/index.html
  marketflows.net/index.html
  nouvellesetdesrapports.com/index.html
  senderosdemontana.com/index.html
  sixty2media.com/index.htm
It is impossible to tell if these were oversights, or intentional to simulate common web development quirks. But they are cute in any case.
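Groupings like the ones above can be obtained by tallying the extracted titles instead of reading them one by one. The following is a sketch along the lines of the grep and sed commands already used, not the exact commands we ran. The function names tally_titles and list_untitled are made up, and only a lowercase <title> tag at the start of its line (after optional whitespace) gets stripped cleanly.

```shell
# Hypothetical helper: count how many of the given files share each title,
# most common first. An empty title shows up as a count with nothing after it.
tally_titles() {
  grep -ahi '<title>' "$@" |
    # drop the opening tag and everything from the closing tag onwards
    sed -E 's/^[[:space:]]*<title>//; s/<\/title>.*//' |
    sort | uniq -c | sort -rn
}

# Hypothetical helper: print the given files that contain no <title tag at
# all. -L lists the files without any match.
list_untitled() {
  grep -aLi '<title' "$@"
}
```

They could be run as: tally_titles */index.html and list_untitled */index.html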