Source: /cirosantilli/cia-2010-covert-communication-websites/2013-dns-census-ns-records

= 2013 DNS census NS records

ns.csv is 57 GB, which makes it a pain to work with directly.

We can cut the data down a lot by keeping only one row per domain with https://stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540\#76605540[] and by applying <Non .com .net TLDs>[tld filtering]:
``
awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csv
``
This brings us down to a much more manageable 3.0 GB, 83 M rows.
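To see what that one-liner keeps and drops, here is a minimal sketch on a made-up sample. The three-column layout (domain, timestamp, nameserver) is inferred from the rows quoted on this page, and the sample rows and the `sample_ns` and `sample_nsu` names are hypothetical. Note that the `last` trick only removes adjacent duplicates, so it assumes that rows for the same domain are grouped together in ns.csv.

```shell
# Made-up sample in the assumed ns.csv layout: domain,timestamp,nameserver.
# Rows for the same domain are adjacent, which the dedup relies on.
sample_ns='example.com,2012-01-01T00:00:00,ns1.example-dns.com
example.com,2013-01-01T00:00:00,ns2.example-dns.com
example.de,2012-06-01T00:00:00,ns1.example-dns.de
example.net,2012-03-01T00:00:00,ns1.example-dns.net'

# Same filter as above, reading from stdin instead of ns.csv:
# keep the first row of each domain, print domain and nameserver,
# then keep only the selected TLDs.
sample_nsu=$(printf '%s\n' "$sample_ns" |
  awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' |
  grep -E '\.(com|net|info|org|biz),')

printf '%s\n' "$sample_nsu"
```

Only the first nameserver seen for `example.com` survives, and `example.de` is dropped by the TLD filter.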

Let's just scan it once real quick to start with, since likely nothing will come of this avenue:
``
grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv
csvcut -c 2 nsu-hits.csv | awk -F. 'BEGIN{OFS="."} {print $(NF-1), $NF}' | sort | uniq -c | sort -k1 -n
``
With 267 known hits at the time of writing, we get:
``
      1 a2hosting.com
      1 amerinoc.com
      1 ayns.net
      1 dailyrazor.com
      1 domainingdepot.com
      1 easydns.com
      1 frienddns.ru
      1 hostgator.com
      1 kolmic.com
      1 name-services.com
      1 namecity.com
      1 netnames.net
      1 tonsmovies.net
      1 webmailer.de
      2 cashparking.com
     55 worldnic.com
     86 domaincontrol.com
``
So yeah, judging just by the names, most of those nameservers are likely going to be humongous.
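The second command above reduces each nameserver hostname to its last two labels before counting. A minimal sketch of just that step on made-up input, without the `csvcut` dependency: the hostnames and the `ns_sample` and `ns_counts` names are hypothetical, and the two-label heuristic would misgroup suffixes such as `co.uk`.

```shell
# Hypothetical nameserver hostnames, one per line.
ns_sample='ns1.domaincontrol.com
ns2.domaincontrol.com
ns51.worldnic.com
sk.s2.ns1.ns92.kolmic.com'

# Keep the last two dot-separated labels, count identical results,
# sort by count, and normalize the padding that uniq -c adds.
ns_counts=$(printf '%s\n' "$ns_sample" |
  awk -F. 'BEGIN{OFS="."} {print $(NF-1), $NF}' |
  sort | uniq -c | sort -k1 -n | awk '{print $1, $2}')

printf '%s\n' "$ns_counts"
```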

The smallest one by far out of the total is frienddns.ru with only 487 hits. All the others are either quite large or fake hits due to how the CSV is grepped. Did a quick <Wayback Machine CDX scanning> there but no luck alas.

Let's check the smaller ones:
``
inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com
``
Doubt anything will come out of this.
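The fake hit above comes from `grep -f` treating each pattern as an unanchored regular expression that can match anywhere in the line, so `e-commodities.net` also matches `source-commodities.net`. A minimal sketch of an exact match on the first column with awk instead, on made-up data: the file names `hits-sample.txt` and `nsu-sample.csv` are hypothetical, and the hit list is assumed to hold one domain per line.

```shell
tmp=$(mktemp -d)

# Hypothetical hit list, one domain per line.
printf '%s\n' 'e-commodities.net' 'inews-today.com' > "$tmp/hits-sample.txt"

# Hypothetical nsu.csv-style rows: domain,nameserver.
printf '%s\n' \
  'inews-today.com,ns1.frienddns.ru' \
  'source-commodities.net,ns1.namecity.com' \
  'unrelated.org,ns1.example-dns.org' > "$tmp/nsu-sample.csv"

# Regex matching as with grep -f: also returns source-commodities.net.
grep_hits=$(grep -f "$tmp/hits-sample.txt" "$tmp/nsu-sample.csv")

# Exact matching: while reading the first file (NR == FNR), store each hit
# as an array key, then print only rows whose first column is such a key.
exact_hits=$(awk -F, 'NR == FNR { hits[$1]; next } $1 in hits' \
  "$tmp/hits-sample.txt" "$tmp/nsu-sample.csv")

printf 'grep -f:\n%s\n\nexact:\n%s\n' "$grep_hits" "$exact_hits"
rm -r "$tmp"
```

On the real data, the first input would presumably be the domain column extracted from hits.csv, as in the scan command above.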

Let's do a bit of counting out of the total:
``
grep domaincontrol.com ns.csv | awk -F, '{print $1}' | uniq | wc
``
gives ~20M domains using `domaincontrol`. Let's see how many domains there are in the first place:
``
awk -F, '{print $1}' ns.csv | uniq | wc
``
so it accounts for 1/4 of the total.
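Since each scan of a 57 GB file is slow, both counts could in principle be obtained in a single pass. A minimal sketch on made-up rows, assuming the domain, timestamp, nameserver layout and that rows for the same domain are adjacent. The sample data and the `share` name are hypothetical, and unlike the `grep` above this only looks at the nameserver column.

```shell
# Made-up rows in the assumed ns.csv layout, grouped by domain.
sample='a.com,2012-01-01T00:00:00,ns1.domaincontrol.com
a.com,2012-01-01T00:00:00,ns2.domaincontrol.com
b.com,2012-01-01T00:00:00,ns1.worldnic.com
c.net,2012-01-01T00:00:00,ns1.domaincontrol.com
d.org,2012-01-01T00:00:00,ns1.easydns.com'

# Count every domain once on its first row, and count it once more
# as a match if any of its rows has domaincontrol.com in the nameserver.
share=$(printf '%s\n' "$sample" | awk -F, '
  $1 != last { total++; last = $1; counted = 0 }
  !counted && index($3, "domaincontrol.com") { matched++; counted = 1 }
  END { printf "%d/%d = %.2f\n", matched, total, matched / total }
')

echo "$share"
```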