We've noticed that often when there is a hit range:
  • there is only one IP for each domain
  • there is a range of about 20-30 of those
and that this does not seem to be that common. Let's see if that is a reasonable fingerprint or not.
Note that although this is the most common case, we have found multiple hits that viewdns.info maps to the same IP.
First we create a table u (unique) that only have domains which are the only domain for an IP, let's see by how much that lowers the 191 M total unique domains:
time sqlite3 u.sqlite 'create table t (d text, i text)'
time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where d not like '%.%.%' group by i having count(distinct d) = 1"
The not like '%.%.%' removes subdomains from the counts so that CGI comms are still included, and distinct in count(distinct is because we have multiple entries at different timestamps for some of the hits.
Let's start with the 208 subset to see how it goes:
time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where i glob '208.*' and d not like '%.%.%' and (d like '%.com' or d like '%.net') group by i having count(distinct d) = 1"
OK, after we fixed bugs with the above we are down to 4 million lines with unique domain/IP pairs and which contains all of the original hits! Almost certainly more are to be found!
This data is so valuable that we've decided to upload it to: archive.org/details/2013-dns-census-a-novirt.csv Format:
8,chrisjmcgregor.com
11,80end.com
28,fine5.net
38,bestarabictv.com
49,xy005.com
50,cmsasoccer.com
80,museemontpellier.net
100,newtiger.com
108,lps-promptservice.com
111,bridesmaiddressesshow.com
The numbers of the first column are the IPs as a 32-bit integer representation, which is more useful to search for ranges in.
To make a histogram with the distribution of the single hostname IPs:
#!/usr/bin/env bash
bin=$((2**24))
sqlite3 2013-dns-census-a-novirt.sqlite -cmd '.mode csv' >2013-dns-census-a-novirt-hist.csv <<EOF
select i, sum(cnt) from (
  select floor(i/${bin}) as i,
         count(*) as cnt
    from t
    group by 1
  union
  select *, 0 as cnt from generate_series(0, 255)
)
group by i
EOF
gnuplot \
  -e 'set terminal svg size 1200, 800' \
  -e 'set output "2013-dns-census-a-novirt-hist.svg"' \
  -e 'set datafile separator ","' \
  -e 'set tics scale 0' \
  -e 'unset key' \
  -e 'set xrange[0:255]' \
  -e 'set title "Counts of IPs with a single hostname"' \
  -e 'set xlabel "IPv4 first byte"' \
  -e 'set ylabel "count"' \
  -e 'plot "2013-dns-census-a-novirt-hist.csv" using 1:2:1 with labels' \
;
Which gives the following useless noise, there is basically no pattern:
https://raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/2013-dns-census-a-novirt-hist.svg
There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e global
iran + football:
  • iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" (news in Portuguese or Spanish"), uncovering two brand new ip ranges:
  • 66.45.179.205 noticiasporjanua.com
  • 66.237.236.247 comunidaddenoticias.com
  • 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:
  • 216.97.231.56 nouvelles-d-aujourdhuis.com
news + world:
  • 210.80.75.55 philippinenewsonline.net
news + global:
  • 204.176.39.115 globalprovincesnews.com
  • 212.209.74.105 globalbaseballnews.com
  • 212.209.79.40: hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scanning of news... Searching for .JAR or https.*cgi-bin.*\.cgi are killers, particularly the .jar hits, here's what came out:
  • 62.22.60.49 telecom-headlines.com
  • 62.22.61.206 worldnewsnetworking.com
  • 64.16.204.55 holein1news.com
  • 66.104.169.184 bcenews.com
  • 69.84.156.90 stickshiftnews.com
  • 74.116.72.236 techtopnews.com
  • 74.254.12.168 non-stop-news.net
  • 193.203.49.212 inews-today.com
  • 199.85.212.118 just-kidding-news.com
  • 207.210.250.132 aeronet-news.com
  • 212.4.18.129 sightseeingnews.com
  • 212.209.90.84 thenewseditor.com
  • 216.105.98.152 modernarabicnews.com
Wayback Machine CDX scanning of "world":
  • 66.104.173.186 myworldlymusic.com
"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.
"today": only 3.5k matches in 2013-dns-census-a-novirt.csv and 12 hits out of 269 hits, TODO how many on those on 2013-dns-census-a-novirt? No new hits.
"world", "global", "international", and spanish/portuguese/French versions like "mondo", "mundo", "mondi": 15k matches in 2013-dns-census-a-novirt.csv. No new hits.