Source: cirosantilli/cia-2010-covert-communication-websites/2013-dns-census-mx-records

= 2013 DNS census MX records

Let' see if there's anything in records/mx.xz.

mx.csv is 21GB.

They do have `"` in the files to escape commas so:

mx.py
``
import csv
import sys
writer = csv.writer(sys.stdout)
with open('mx.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        writer.writerow([row[0], row[3]])
``
Would have been better with csvkit: https://stackoverflow.com/questions/36287982/bash-parse-csv-with-quotes-commas-and-newlines

then:

``
# uniq not amazing as there are often two or three slightly different records repeated on multiple timestamps, but down to 11 GB
python3 mx.py | uniq > mx-uniq.csv
sqlite3 mx.sqlite 'create table t(d text, m text)'
# 13 GB
time sqlite3 mx.sqlite ".import --csv --skip 1 'mx-uniq.csv' t"

# 41 GB
time sqlite3 mx.sqlite 'create index td on t(d)'
time sqlite3 mx.sqlite 'create index tm on t(m)'
time sqlite3 mx.sqlite 'create index tdm on t(d, m)'

# Remove dupes.
# Rows: 150m
time sqlite3 mx.sqlite <<EOF
delete from t
where rowid not in (
  select min(rowid)
  from t
  group by d, m
)
EOF

# 15 GB
time sqlite3 mx.sqlite vacuum
``

Let's see what the hits use:
``
awk -F, 'NR>1{ print $2 }' ../media/cia-2010-covert-communication-websites/hits.csv | xargs -I{} sqlite3 mx.sqlite "select distinct * from t where d = '{}'"
``

At around 267 total hits, only 84 have MX records, and from those that do, almost all of them have exactly:
``
smtp.secureserver.net
mailstore1.secureserver.net
``
with only three exceptions:
``
dailynewsandsports.com|dailynewsandsports.com
inews-today.com|mail.inews-today.com
just-kidding-news.com|just-kidding-news.com
``
We need to count out of the totals!
``
sqlite3 mx.sqlite "select count(*) from t where m = 'mailstore1.secureserver.net'"
``
which gives, ~18M, so nope, it is too much by itself...

Let's try to use that to reduce `av.sqlite` from <2013 DNS Census virtual host cleanup> a bit further:
``
time sqlite3 mx.sqlite '.mode csv' "attach 'aiddcu.sqlite' as 'av'" '.load ./ip' "select ipi2s(av.t.i), av.t.d from av.t inner join t as mx on av.t.d = mx.d and mx.m = 'mailstore1.secureserver.net' order by av.t.i asc" > avm.csv
``
where `avm` stands for `av` with `mx` pruning. This leaves us with only ~500k entries left. With one more figerprint we could do a <Wayback Machine CDX scanning> scan.

Let's check that we still have most our hits in there:
``
grep -f <(awk -F, 'NR>1{print $2}' /home/ciro/bak/git/media/cia-2010-covert-communication-websites/hits.csv) avm.csv
``
At 267 hits we got 81, so all are still present.

secureserver is a hosting provider, we can see their blank page e.g. at: https://web.archive.org/web/20110128152204/http://emmano.com/[]. https://security.stackexchange.com/questions/12610/why-did-secureserver-net-godaddy-access-my-gmail-account/12616#12616 comments:
> secureserver.net is the name GoDaddy use as the reverse DNS for IP addresses used for dedicated/virtual server hosting