The HTML of the index pages from the Wayback Machine was:
- dumped at: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/html
- downloaded with: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/download-html.sh. Note that there were many spurious errors, notably:
OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to web.archive.org:443
We just ran it multiple times until all errors were gone.
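The "run it again until the errors are gone" approach can also be automated with a small retry wrapper. The following is only a sketch and is not taken from download-html.sh: the `retry` function, the `RETRY_SLEEP` variable, and the URL and output path in the usage comment are all hypothetical names chosen for illustration.

```shell
# Hypothetical helper, not part of download-html.sh:
# run a command repeatedly until it succeeds or max_tries is reached.
retry() {
  local max_tries="$1"
  shift
  local try=1
  until "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      echo "giving up after $try tries: $*" >&2
      return 1
    fi
    try=$((try + 1))
    # Wait a bit before the next attempt (assumed default: 1 second).
    sleep "${RETRY_SLEEP:-1}"
  done
}

# Example usage (the URL and output path are placeholders):
# retry 5 curl --fail --silent --show-error \
#   --output example.com/index.html \
#   'https://web.archive.org/web/2010/http://example.com/'
```

Capping the number of tries avoids looping forever on a page that is genuinely missing from the archive, as opposed to one that merely hit a transient connection error.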
As per:
grep . */index.html | grep 'binary file matches'
a few of the HTML files are interpreted by grep as being binary:
grep: china-destinations.org/index.html: binary file matches
grep: classicalmusicboxonline.com/index.html: binary file matches
grep: driversinternationalgolf.com/index.html: binary file matches
grep: familyhealthonline.net/index.html: binary file matches
grep: grubbersworldrugbynews.com/index.html: binary file matches
grep: hai-pow.com/index.html: binary file matches
grep: hi-tech-today.com/index.html: binary file matches
grep: networkofnews.com/index.html: binary file matches
grep: nigeriastar.net/index.html: binary file matches
grep: noticias-caracas.com/index.html: binary file matches
grep: theentertainbiz.com/index.html: binary file matches
grep: thefilmcentre.com/index.html: binary file matches
grep: theinternationalgoal.com/index.html: binary file matches
grep: wildbirds-seasia.com/index.html: binary file matches
grep: worldedgenews.com/index.html: binary file matches
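GNU grep reports "binary file matches" when a file contains NUL bytes or bytes that do not decode in the current locale. As a rough way to see which of the two might apply to a given page, the sketch below counts NUL bytes and non-ASCII bytes. The function names are hypothetical, and a non-zero non-ASCII count only hints at an encoding mismatch, since valid UTF-8 also uses such bytes.

```shell
# Hypothetical helper: count NUL bytes in a file.
count_nul_bytes() {
  # Delete every byte except NUL, then count what is left.
  tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: count bytes with the high bit set (0x80-0xFF).
count_non_ascii_bytes() {
  LC_ALL=C tr -cd '\200-\377' < "$1" | wc -c | tr -d ' '
}

# Example usage over the downloaded pages:
# for f in */index.html; do
#   printf '%s nul=%s non_ascii=%s\n' "$f" \
#     "$(count_nul_bytes "$f")" "$(count_non_ascii_bytes "$f")"
# done
```

Either way, passing `-a` to grep makes it treat such files as text.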
The discovery of a possible information leak in the HTML <title> of webofcheer.com, which is cryptically set to "pg1c", motivated us to download all the HTML and grep through it.
We started grepping with:
grep -ai '<title>' */index.html
and to just get the titles alone for visual inspection:
grep -ahi '<title>' */index.html | sed -r 's/^\s*<title>//;s/<\/title>.*//'
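To make repeated titles stand out during visual inspection, the extracted titles can also be tallied. This is a minimal sketch built on the same grep and sed pipeline; the `title_counts` function name is hypothetical, and like the command above it only strips a lowercase `<title>` that starts the line.

```shell
# Hypothetical helper: take index.html paths as arguments and print each
# distinct <title> text with its number of occurrences, most common first.
title_counts() {
  grep -ahi '<title>' "$@" |
    sed -E 's/^[[:space:]]*<title>//; s/<\/title>.*//' |
    sort | uniq -c | sort -rn
}

# Example usage:
# title_counts */index.html | head -n 20
```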
Some mildly interesting facts are listed below. It is impossible to tell if these were oversights or intentional attempts to simulate common web development quirks, but they are cute in any case:
- opensourcenewstoday.com is titled just "Title":
opensourcenewstoday.com/index.html:<title>Title</title>
- a few sites are titled "Untitled Document", e.g.:
media-coverage-now.com/index.html:<title>Untitled Document</title>
newsandsportscentral.com/index.html: <title>Untitled Document</title>
newsincirculation.com/index.html:<title>Untitled Document</title>
newsworldsite.com/index.html:<title>Untitled Document</title>
primetimemovies.net/index.html:<title>Untitled Document</title>
unganadormundial.com/index.html:<title>Untitled Document</title>
This may have been the default title in Adobe Dreamweaver.
- some others have an empty title:
aeronet-news.com/index.html:<title></title>
al-rashidrealestate.com/index.html: <title></title>
arabicnewsunfiltered.com/index.html:<title></title>
dailynewsandsports.com/index.html:<title></title>
electronictechreviews.com/index.html:<title></title>
indirectfreekick.com/index.html:<title></title>
iran-newslink-today.com/index.html:<title></title>
iraniangoals.com/index.html:<title></title>
kickitnews.com/index.html:<title></title>
mediocampodefutbol.com/index.html:<title></title>
middle-east-newstoday.com/index.html: <title></title>
mygadgettech.com/index.html:<title></title>
sayaara-auto.com/index.html:<title></title>
techwatchtoday.com/index.html:<title></title>
the-open-book-online.com/index.html:<title></title>
thenewsofpakistan.com/index.html:<title></title>
theworld-news.net/index.html:<title></title>
todaysengineering.com/index.html:<title></title>
todaysnewsreports.net/index.html:<title></title>
worldnewsandent.com/index.html:<title></title>
- some others are titled just "index" or a variant of it:
all-sport-headlines.com/index.html:<title>index</title>
europeannewsflash.com/index.html:<title>Index</title>
fgnl.net/index.html:<title>Index Page</title>
iraniangoalkicks.com/index.html:<title>index</title>
just-the-news.com/index.html:<title>index</title>
mide-news.com/index.html:<title>index</title>
mytravelopian.com/index.html:<title>Index</title>
noticiasdelmundolatino.com/index.html:<title>index</title>
pakcricketgrd.com/index.html: <title>index</title>
pangawana.com/index.html:<title>index</title>
sportsnewsfinder.com/index.html:<title>index</title>
thenewseditor.com/index.html:<title>index</title>
turkishnewslinks.com/index.html:<title>index2</title>
wahidfutbol.com/index.html:<title>index</title>
webscooper.com/index.html:<title>index</title>
webworldsports.com/index.html:<title>index</title>
- a few don't have <title> at all:
b2bworldglobal.com/index.html
bailandstump.com/index.html
businessexchangetoday.com/index.html
commercialspacedesign.com/index.html
court-masters.com/index.html
flyingtimeline.com/index.html
marketflows.net/index.html
nouvellesetdesrapports.com/index.html
senderosdemontana.com/index.html
sixty2media.com/index.htm
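A list of pages with no title at all can be produced with grep's `-L` option, which prints only the names of files that contain no match. The source does not say how the list above was obtained, so this is just one plausible sketch; the `files_without_title` function name is hypothetical.

```shell
# Hypothetical helper: list the given files that contain no <title> tag.
# -L prints names of files without a match, -a treats binary files as
# text, -i ignores case.
files_without_title() {
  grep -Lai '<title>' "$@"
}

# Example usage:
# files_without_title */index.html
```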