CIA 2010 covert communication websites 2013 DNS census SOA records by
Ciro Santilli 37 Updated 2025-07-16
Basically the same as 2013 DNS census NS records: nothing came out.
dnshistory.org contains historical domain -> IP mappings.
We have not managed to extract much from this source; they don't have as much data on the range of interest.
But they do have some unique data at least, so perhaps we should try them a bit more often, e.g. they were the only source we've seen so far that made the association headlines2day.com -> 212.209.74.126, which places it in the more plausible globalbaseballnews.com IP range.
TODO can it do IP to domain? Or just domain to IP? Asked on their Discord: discord.com/channels/698151879166918727/968586102493552731/1124254204257632377. Their banner suggests that yes:
With our new look website you can now find other domains hosted on the same IP address, your website neighbours and more even quicker than before.
Owner replied, you can't:
At the moment you can only do this for current not historical records
In principle, we could obtain this data from search engines, but Google doesn't track that entire website well, e.g. no hits for
site:dnshistory.org "62.22.60.48"
presumably due to heavy IP throttling.
The homepage dnshistory.org/ gives a starting date of 2009, and it is true that they do have some hits from that useful era:
Here at DNS History we have been crawling DNS records since 2009, our database currently contains over 1 billion domains and over 12 billion DNS records.
Any data that we have the patience to extract from this we will dump under github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/hits.json.
They appear to piece together data from various sources. This is the most complete historical domain -> IP database we have so far. They don't have hugely more data than viewdns.info, but they do often offer something new. The key difference seems to be that their data goes a bit further back into the critical time period.
TODO do they have historical reverse IP? The fact that they don't seem to have it suggests that their reverse IP feature may just be forwarding current requests to a third party via some API.
E.g. searching thefilmcentre.com under historical data at securitytrails.com/domain/thefilmcentre.com/history/al gives the correct IP 62.22.60.55.
Account creation blacklists common email providers such as gmail to force users to use a "corporate" email address, but using random domains like ciro@cirosantilli.com works fine.
Their data seems to date back to 2008 for our searches.
So far, no new domains have been found with Common Crawl, nor have any of the existing known domains been found to be present in Common Crawl. Our working theory is that Common Crawl simply never reached those domains. But then how did Alexa find the domains?
Let's try and do something with Common Crawl.
Unfortunately there's no IP data apparently: github.com/commoncrawl/cc-index-table/issues/30, so let's focus on the URLs.
Using their Common Crawl Athena method: commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
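As an aside, these queries can also be run from the shell rather than the Athena console. A minimal sketch, assuming the AWS CLI is configured and the ccindex database/table from their article already exists; s3://my-athena-results is a placeholder bucket, not something from this project:
# Run a query via the AWS CLI instead of the Athena console.
qid=$(aws athena start-query-execution \
  --query-string "SELECT url FROM ccindex.ccindex WHERE crawl = 'CC-MAIN-2013-20' AND subset = 'warc' LIMIT 10" \
  --result-configuration OutputLocation=s3://my-athena-results/ \
  --query QueryExecutionId --output text)
sleep 10  # crude wait; poll aws athena get-query-execution for real use
aws athena get-query-results --query-execution-id "$qid"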
Sample first output line:
# 2
url_surtkey org,whwheelers)/robots.txt
url https://whwheelers.org/robots.txt
url_host_name whwheelers.org
url_host_tld org
url_host_2nd_last_part whwheelers
url_host_3rd_last_part
url_host_4th_last_part
url_host_5th_last_part
url_host_registry_suffix org
url_host_registered_domain whwheelers.org
url_host_private_suffix org
url_host_private_domain whwheelers.org
url_host_name_reversed
url_protocol https
url_port
url_path /robots.txt
url_query
fetch_time 2021-06-22 16:36:50.000
fetch_status 301
fetch_redirect https://www.whwheelers.org/robots.txt
content_digest 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
content_mime_type text/html
content_mime_detected text/html
content_charset
content_languages
content_truncated
warc_filename crawl-data/CC-MAIN-2021-25/segments/1623488519183.85/robotstxt/CC-MAIN-20210622155328-20210622185328-00312.warc.gz
warc_record_offset 1854030
warc_record_length 639
warc_segment 1623488519183.85
crawl CC-MAIN-2021-25
subset robotstxt
So url_host_3rd_last_part might be a winner for CGI comms fingerprinting!
Naive one-for-one index query:
select * from "ccindex"."ccindex" where url_host_registered_domain = 'conquermstoday.com' limit 100;
This has no results... data scanned: 5.73 GB.
Let's see if they have any of the domain hits. Let's also restrict by date to try and reduce the data scanned:
select * from "ccindex"."ccindex" where
fetch_time < TIMESTAMP '2014-01-01 00:00:00' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'topbillingsite.com',
'worldwildlifeadventure.com'
)
Humm, data scanned: 60.59 GB and no hits... weird.
Sanity check:
select * from "ccindex"."ccindex" WHERE
crawl = 'CC-MAIN-2013-20' AND
subset = 'warc' AND
url_host_registered_domain IN (
'google.com',
'amazon.com'
)
This has a bunch of hits of course. Data scanned: 212.88 MB. WHERE crawl and subset are a must! Should have read the article first.
Let's widen a bit more:
select * from "ccindex"."ccindex" WHERE
crawl IN (
'CC-MAIN-2013-20',
'CC-MAIN-2013-48',
'CC-MAIN-2014-10'
) AND
subset = 'warc' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'worldnewsandent.com',
'worldwildlifeadventure.com'
)
Still nothing found... they don't seem to have any of the URLs of interest?
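To avoid hand-maintaining those long IN (...) lists, something like the following can generate them from our hits.json (same file and .host field used by the overlap checks further down; a sketch, not the exact command we used):
# Build the quoted, comma-separated IN (...) body from hits.json.
jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json |
  sort -u |
  sed "s/.*/  '&',/" |
  sed '$ s/,$//'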
Does not appear to have any reverse IP hits unfortunately: opendata.stackexchange.com/questions/1951/dataset-of-domain-names/21077#21077. Likely only has domains that were explicitly advertised.
We could not find anything useful in it so far, but there is great potential to use this tool to find new IP ranges based on properties of existing IP ranges. Part of the problem is that the dataset is huge, and is split into one file per top IP byte (256 files). But it would be reasonable to at least explore ranges with pre-existing known hits...
We have started looking for patterns on 66.* and 208.*, both selected as two relatively far away ranges that have a number of pre-existing hits. 208 should likely have been 212, considering later finds that put several ranges in 212.
tcpip_fp:
- 66.104.
- 66.104.175.41: grubbersworldrugbynews.com: 1346397300 SCAN(V=6.01%E=4%D=1/12%OT=22%CT=443%CU=%PV=N%G=N%TM=387CAB9E%P=mipsel-openwrt-linux-gnu),ECN(R=N),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=N),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.48: worlddispatch.net: 1346816700 SCAN(V=6.01%E=4%D=1/2%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=1D5EA%P=mipsel-openwrt-linux-gnu),SEQ(SP=F8%GCD=3%ISR=109%TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.49: webworldsports.com: 1346692500 SCAN(V=6.01%E=4%D=9/3%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=5044E96E%P=mipsel-openwrt-linux-gnu),SEQ(SP=105%GCD=1%ISR=108%TI=Z%TS=A),OPS(O1=M550ST11NW6%O2=M550ST11NW6%O3=M550NNT11NW6%O4=M550ST11NW6%O5=M550ST11NW6%O6=M550ST11),WIN(W1=1510%W2=1510%W3=1510%W4=1510%W5=1510%W6=1510),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.50: fly-bybirdies.com: 1346822100 SCAN(V=6.01%E=4%D=1/1%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=14655%P=mipsel-openwrt-linux-gnu),SEQ(TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.53: info-ology.net: 1346712300 SCAN(V=6.01%E=4%D=9/4%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=50453230%P=mipsel-openwrt-linux-gnu),SEQ(SP=FB%GCD=1%ISR=FF%TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.175.106
- 66.175.106.150: noticiasmusica.net: 1340077500 SCAN(V=5.51%D=1/3%OT=22%CT=443%CU=%PV=N%G=N%TM=38707542%P=mipsel-openwrt-linux-gnu),ECN(R=N),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.175.106.155: atomworldnews.com: 1345562100 SCAN(V=5.51%D=8/21%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=5033A5F2%P=mips-openwrt-linux-gnu),SEQ(SP=FB%GCD=1%ISR=FC%TI=Z%TS=A),ECN(R=Y%DF=Y%TG=40%W=1540%O=M550NNSNW6%CC=N%Q=),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
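One possible way to look for more ranges in this dataset would be to group lightly normalized fingerprints per /24. A rough sketch, not a verified pipeline, assuming the tcpip_fp files (one per top byte, e.g. 66 and 208) hold one "IP timestamp fingerprint" entry per line like the excerpts above:
awk '{
  ip = $1; fp = $3
  gsub(/%(D|TM)=[^%)]*/, "", fp)   # strip scan date/time; SP/GCD/ISR inside SEQ() also vary per run
  split(ip, o, ".")
  print o[1] "." o[2] "." o[3], fp
}' 66 208 | sort | uniq -c | sort -rn | head
A /24 dominated by a single normalized fingerprint that also matches the known hits above would be a candidate for a closer look.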
CIA 2010 covert communication websites 2012 Internet Census hostprobes by
Ciro Santilli 37 Updated 2025-07-16
Hostprobes quick look on two ranges:
208.254.40:
... similar down
208.254.40.95 1334668500 down no-response
208.254.40.95 1338270300 down no-response
208.254.40.95 1338839100 down no-response
208.254.40.95 1339361100 down no-response
208.254.40.95 1346391900 down no-response
208.254.40.96 1335806100 up unknown
208.254.40.96 1336979700 up unknown
208.254.40.96 1338840900 up unknown
208.254.40.96 1339454700 up unknown
208.254.40.96 1346778900 up echo-reply (0.34s latency).
208.254.40.96 1346838300 up echo-reply (0.30s latency).
208.254.40.97 1335840300 up unknown
208.254.40.97 1338446700 up unknown
208.254.40.97 1339334100 up unknown
208.254.40.97 1346658300 up echo-reply (0.26s latency).
... similar up
208.254.40.126 1335708900 up unknown
208.254.40.126 1338446700 up unknown
208.254.40.126 1339330500 up unknown
208.254.40.126 1346494500 up echo-reply (0.24s latency).
208.254.40.127 1335840300 up unknown
208.254.40.127 1337793300 up unknown
208.254.40.127 1338853500 up unknown
208.254.40.127 1346454900 up echo-reply (0.23s latency).
208.254.40.128 1335856500 up unknown
208.254.40.128 1338200100 down no-response
208.254.40.128 1338749100 down no-response
208.254.40.128 1339334100 down no-response
208.254.40.128 1346607900 down net-unreach
208.254.40.129 1335699900 up unknown
... similar down
Suggests a block of exactly 127 - 96 + 1 = 32 consecutive up IPs.
208.254.42:
... similar down
208.254.42.191 1334522700 down no-response
208.254.42.191 1335276900 down no-response
208.254.42.191 1335784500 down no-response
208.254.42.191 1337845500 down no-response
208.254.42.191 1338752700 down no-response
208.254.42.191 1339332300 down no-response
208.254.42.191 1346499900 down net-unreach
208.254.42.192 1334668500 up unknown
208.254.42.192 1336808700 up unknown
208.254.42.192 1339334100 up unknown
208.254.42.192 1346766300 up echo-reply (0.40s latency).
208.254.42.193 1335770100 up unknown
208.254.42.193 1338444900 up unknown
208.254.42.193 1339334100 up unknown
... similar up
208.254.42.221 1346517900 up echo-reply (0.19s latency).
208.254.42.222 1335708900 up unknown
208.254.42.222 1335708900 up unknown
208.254.42.222 1338066900 up unknown
208.254.42.222 1338747300 up unknown
208.254.42.222 1346872500 up echo-reply (0.27s latency).
208.254.42.223 1335773700 up unknown
208.254.42.223 1336949100 up unknown
208.254.42.223 1338750900 up unknown
208.254.42.223 1339334100 up unknown
208.254.42.223 1346854500 up echo-reply (0.13s latency).
208.254.42.224 1335665700 down no-response
208.254.42.224 1336567500 down no-response
208.254.42.224 1338840900 down no-response
208.254.42.224 1339425900 down no-response
208.254.42.224 1346494500 down time-exceeded
... similar down
Suggests a block of exactly 223 - 192 + 1 = 32 consecutive up IPs.
It does appear that long runs of consecutive up IPs are a sort of fingerprint. The question is how unique they would be.
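A quick way to eyeball one such run, assuming the hostprobes files are the "IP timestamp state reason" lines shown above, one file per top byte:
# Count distinct IPs in 208.254.40.0/24 that were ever probed as "up".
awk '$1 ~ /^208\.254\.40\./ && $3 == "up" { print $1 }' 208 | sort -u | wc -l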
First, extract the up IPs and load them into SQLite. This reduces us to 2 million IP rows from the total possible 16 million IPs:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq
t=$n-up-uniq.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq tmp"
time sqlite3 $t 'create table t (i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'
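str2ipv4 here is provided by the locally loaded ./ip SQLite extension. For reference, the equivalent dotted-quad to integer conversion could also be done before import, e.g. this sketch assuming the cnt,ip CSV produced above:
# Convert "cnt,a.b.c.d" lines to "cnt,32-bit integer" without the SQLite extension.
awk -F, '{ split($2, o, ".")
           ipint = o[1]*2^24 + o[2]*2^16 + o[3]*2^8 + o[4]
           print $1 "," ipint }' 208-up-uniq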
OK, now just counting hits on fixed windows has way too many results:
sqlite3 208-up-uniq.sqlite "\
SELECT * FROM (
SELECT min(i), COUNT(*) OVER (
ORDER BY i RANGE BETWEEN 15 PRECEDING AND 15 FOLLOWING
) as c FROM t
) WHERE c > 20 and c < 30
"Let's try instead consecutive ranges of length exactly 31 instead then:271. Hmm. A bit more than we'd like...
sqlite3 208-up-uniq.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t)
GROUP BY grp
ORDER BY i
) where c = 31
EOF
This returns 271 results. Hmm. A bit more than we'd like...
Another route is to also count the ups:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-cnt
t=$n-up-uniq-cnt.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq-cnt tmp"
time sqlite3 $t 'create table t (cnt integer, i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select cast(cnt as integer), str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'
Let's see how many consecutive runs we get when also requiring a minimum up count:
sqlite3 208-up-uniq-cnt.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t WHERE cnt >= 3)
GROUP BY grp
ORDER BY i
) where c > 28 and c < 32
EOF
Let's check on 66: not representative at all... e.g. several confirmed hits are down:
grep -e '66.45.179' -e '66.45.179' 66
66.45.179.215 1335305700 down no-response
66.45.179.215 1337579100 down no-response
66.45.179.215 1338765300 down no-response
66.45.179.215 1340271900 down no-response
66.45.179.215 1346813100 down no-response
CIA 2010 covert communication websites 2012 Internet Census icmp_ping by
Ciro Santilli 37 Updated 2025-07-16
Let's check relevancy of known hits:
grep -e '208.254.40' -e '208.254.42' 208 | tee 208hits
Output:
208.254.40.95 1355564700 unreachable
208.254.40.95 1355622300 unreachable
208.254.40.96 1334537100 alive, 36342
208.254.40.96 1335269700 alive, 17586
..
208.254.40.127 1355562900 alive, 35023
208.254.40.127 1355593500 alive, 59866
208.254.40.128 1334609100 unreachable
208.254.40.128 1334708100 alive from 208.254.32.214, 43358
208.254.40.128 1336596300 unreachable
The rest of 208 is mostly unreachable.
208.254.42.191 1335294900 unreachable
...
208.254.42.191 1344737700 unreachable
208.254.42.191 1345574700 Icmp Error: 0,ICMP Network Unreachable, from 63.111.123.26
208.254.42.191 1346166900 unreachable
...
208.254.42.191 1355665500 unreachable
208.254.42.192 1334625300 alive, 6672
...
208.254.42.192 1355658300 alive, 57412
208.254.42.193 1334677500 alive, 28985
208.254.42.193 1336524300 unreachable
208.254.42.193 1344447900 alive, 8934
208.254.42.193 1344613500 alive, 24037
208.254.42.193 1344806100 alive, 20410
208.254.42.193 1345162500 alive, 10177
...
208.254.42.223 1336590900 alive, 23284
...
208.254.42.223 1355555700 alive, 58841
208.254.42.224 1334607300 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1334681100 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1336563900 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1344451500 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.138
208.254.42.224 1344566700 unreachable
208.254.42.224 1344762900 unreachable
Now for 66:
n=66
time awk '$3~/^alive,/ { print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-c
OK, down to 45 MB, now we can work.
grep -e '66.45.179' -e '66.104.169' -e '66.104.173' -e '66.104.175' -e '66.175.106' '66-alive-uniq-c' | tee 66hits
Domain list only, no IPs and no dates. We haven't been able to extract anything of interest from this source so far.
Domain hit count when we were at 69 hits: only 9, some of which had since been reused. Likely their data collection did not cover the dates of interest.
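For reference, this kind of count can be computed along the lines of the sketch below, where domains.txt is a placeholder for a source's one-domain-per-line list and hits.json is the same file used elsewhere on this page:
# Count how many of our known hit domains appear in a source's domain list.
grep -cFx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) domains.txt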
CIA 2010 covert communication websites Expired domain trackers by
Ciro Santilli 37 Updated 2025-07-16
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason. This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever". Notable examples:
- hupo.com: e.g. static.hupo.com/expdomain_myadmin/2012-03-06(国际域名).txt. Heavily IP throttled; Tor hindered more than helped.
Scraping script: ../cia-2010-covert-communication-websites/hupo.sh (a rough sketch of this kind of loop follows the list below). Scraping does about 1 day every 5 minutes relatively reliably, so about 36 hours / year. Not bad. Results are stored under tmp/hupo/<day>.
Check for hit overlap:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/hupo/*
There are lots of hits. The data set is very inclusive, and it must have been obtained through means other than Web crawling, since it contains so many of the hits. The hits are very well distributed amongst days and months; at least they did a good job hiding these potential timing fingerprints. This feels very deliberately designed.
Some of their files are simply missing however, unfortunately, e.g. the one for 2012-07-01. webmasterhome.cn did contain that day however: domain.webmasterhome.cn/com/2012-07-01.asp. Hmm, we might have better luck over there then?
2018-11-19 is corrupt in a new and wonderful way, with a bunch of trailing zeros:
wget -O hupo-2018-11-19 'http://static.hupo.com/expdomain_myadmin/2018-11-19%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt'
hd hupo-2018-11-19
ends in:
000ffff0  74 75 64 69 65 73 2e 63 6f 6d 0d 0a 70 31 63 6f  |tudies.com..p1co|
00100000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
*
0018a5e0  00 00 00 00 00 00 00 00 00                       |.........|
More generally, several files contain invalid domain names with non-ASCII characters, e.g. 2013-01-02 contains 365<D3>л<FA><C2><CC>.com. Domain names can only contain ASCII characters: stackoverflow.com/questions/1133424/what-are-the-valid-characters-that-can-show-up-in-a-url-host Maybe we should get rid of any such lines as noise.
Some files around 2011-09-06 start with an empty line. 2014-01-15 starts with about twenty empty lines. Oh, and that last one also has some trash bytes at the end: <B7><B5><BB><D8>. Beauty.
- webmasterhome.cn: e.g. domain.webmasterhome.cn/com/2012-03-06.asp. Appears to contain the exact same data as static.hupo.com.
Also has some randomly missing dates like hupo.com, though different ones, so they complement each other nicely.
Some of the URLs are broken and don't signal that with an HTTP status code; they just replace the results with the Chinese text 无法找到该页 (The requested page could not be found).
Several URLs just return length 0 content, e.g.:
curl -vvv http://domain.webmasterhome.cn/com/2015-10-31.asp
*   Trying 125.90.93.11:80...
* Connected to domain.webmasterhome.cn (125.90.93.11) port 80 (#0)
> GET /com/2015-10-31.asp HTTP/1.1
> Host: domain.webmasterhome.cn
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 21 Oct 2023 15:12:23 GMT
< Server: Microsoft-IIS/6.0
< X-Powered-By: ASP.NET
< Content-Length: 0
< Content-Type: text/html
< Set-Cookie: ASPSESSIONIDCSTTTBAD=BGGPAONBOFKMMFIPMOGGHLMJ; path=/
< Cache-control: private
<
* Connection #0 to host domain.webmasterhome.cn left intact
It is not fully clear if this is a throttling mechanism, or if the data is just missing entirely.
Starting around 2018, the IP limiting became very intense, 30 min to 1 hour per URL, so we just gave up. Therefore, data from 2018 onwards does not contain webmasterhome.cn data.
Starting from 2013-05-10 the format changes randomly. This also shows us that they just have all the HTML pages as static files on their server. E.g. with:
grep -a '<pre' * | s
we see:
2013-05-09:<pre style='font-family:Verdana, Arial, Helvetica, sans-serif; '><strong>2013<C4><EA>05<D4><C2>09<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong><br>0-3y.com
2013-05-10:<pre><strong>2013<C4><EA>05<D4><C2>10<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong>
- justdropped.com: e.g. www.justdropped.com/drops/010112com.html. First known working day: 2006-01-01. Unthrottled.
- yoid.com: e.g. yoid.com/bydate.php?d=2016-06-03&a=a. First known working day: 2016-06-01.
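As mentioned in the hupo.com item above, the per-day scraping boils down to a loop like the sketch below. The real script is hupo.sh in the repository linked above; the URL encoding matches the wget example, but the output layout and the 5-minute sleep are assumptions based on the description:
#!/usr/bin/env bash
# Minimal sketch of a per-day expired-domain scraper in the spirit of hupo.sh.
set -eu
mkdir -p tmp/hupo
d=2012-01-01
while [ "$d" != 2013-01-01 ]; do
  out="tmp/hupo/$d"
  if [ ! -s "$out" ]; then
    # （国际域名） is percent-encoded in the real URLs, as in the wget example above.
    curl -fsS "http://static.hupo.com/expdomain_myadmin/${d}%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt" -o "$out" || rm -f "$out"
    sleep 300  # heavy IP throttling: roughly one day every 5 minutes
  fi
  d=$(date -I -d "$d + 1 day")
done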
Data comparison:
- 2012-01-01, looking only at the .com domains. The lists are quite similar:
- webmastercn has just about ten extra ones compared to justdropped, the rest is exactly the same
- justdropped has some extra ones and some missing compared to hupo
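The comparison itself can be done with something like the following sketch; the per-source/per-day file layout is an assumption, and the CRLF stripping matches the 0d 0a line endings seen in the hexdump above:
# Show .com domains present in one source's 2012-01-01 list but not the other's.
comm -23 <(sed 's/\r$//' tmp/hupo/2012-01-01        | grep '\.com$' | sort -u) \
         <(sed 's/\r$//' tmp/justdropped/2012-01-01 | grep '\.com$' | sort -u)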
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
./justdropped.sh &
wait
./justdropped-post.sh
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
Count unique domains in the repos:
( echo */*/*/* | xargs cat ) | sort -u | wc
The extracted data is present at:
- archive.org/details/expired-domain-names-by-day
- github.com/cirosantilli/expired-domain-names-by-day-* repos:
- github.com/cirosantilli/expired-domain-names-by-day-2006
- github.com/cirosantilli/expired-domain-names-by-day-2007
- github.com/cirosantilli/expired-domain-names-by-day-2008
- github.com/cirosantilli/expired-domain-names-by-day-2009
- github.com/cirosantilli/expired-domain-names-by-day-2010
- github.com/cirosantilli/expired-domain-names-by-day-2011 (~11M)
- github.com/cirosantilli/expired-domain-names-by-day-2012 (~18M)
- github.com/cirosantilli/expired-domain-names-by-day-2013 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2014 (~29M)
- github.com/cirosantilli/expired-domain-names-by-day-2015 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2016
- github.com/cirosantilli/expired-domain-names-by-day-2017
- github.com/cirosantilli/expired-domain-names-by-day-2018
- github.com/cirosantilli/expired-domain-names-by-day-2019
- github.com/cirosantilli/expired-domain-names-by-day-2020
- github.com/cirosantilli/expired-domain-names-by-day-2021
- github.com/cirosantilli/expired-domain-names-by-day-2022
- github.com/cirosantilli/expired-domain-names-by-day-2023
- github.com/cirosantilli/expired-domain-names-by-day-2024
Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
- GitHub trackers:
- admin-monitor.shiyue.com
- anquan.didichuxing.com
- app.cloudsek.com
- app.flare.io
- app.rainforest.tech
- app.shadowmap.com
- bo.serenety.xmco.fr 8 1
- bts.linecorp.com
- burn2give.vercel.app
- cbs.ctm360.com 17 2
- code6.d1m.cn
- code6-ops.juzifenqi.com
- codefend.devops.cndatacom.com
- dlp-code.airudder.com
- easm.atrust.sangfor.com
- ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
- ecall.beygoo.me 2 1
- eos.vip.vip.com 1 1
- foradar.baimaohui.net 2 1
- fty.beygoo.me
- hive.telefonica.com.br 2 1
- hulrud.tistory.com
- kartos.enthec.com
- soc.futuoa.com
- lullar-com-3.appspot.com
- penetration.houtai.io 2 1
- platform.sec.corp.qihoo.net
- plus.k8s.onemt.co 4 1
- pmp.beygoo.me 2 1
- portal.protectorg.com
- qa-boss.amh-group.com
- saicmotor.saas.cubesec.cn
- scan.huoban.com
- sec.welab-inc.com
- security.ctrip.com 10 3
- siem-gs.int.black-unique.com 2 1
- soc-github.daojia-inc.com
- spigotmc.org 2 1
- tcallzgroup.blueliv.com
- tcthreatcompass05.blueliv.com 4 1
- tix.testsite.woa.com 2 1
- toucan.belcy.com 1 1
- turbo.gwmdevops.com 18 2
- urlscan.watcherlab.com
- zelenka.guru. Looks like a Russian hacker forum.
- LinkedIn profile views:
- "Information Security Specialist at Forcepoint"
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
Next, we can start searching by keyword with Wayback Machine CDX scanning, with Tor parallelization, using our helper ../cia-2010-covert-communication-websites/hupo-cdx-tor.sh. E.g. to check domains that contain the term "news":
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
produces per-year results for the regex term news|global between those years under:
tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
and so on. OK, let's go:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
Other searches that are not dense enough for our patience:
world|global|[^.]info
OMG and a few more. It's amazing.
The news search might be producing some golden, golden new hits!!! Going all in on this. Hits:
- thepyramidnews.com
- echessnews.com
- tickettonews.com
- airuafricanews.com
- vuvuzelanews.com
- dayenews.com
- newsupdatesite.com
- arabicnewsonline.com
- arabicnewsunfiltered.com
- newsandsportscentral.com
- networkofnews.com
- trekkingtoday.com
- financial-crisis-news.com
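For reference, the Wayback Machine query that this CDX scanning boils down to for a single domain looks something like the sketch below (parameters per the public CDX server API; hupo-cdx-tor.sh just adds Tor-based parallelization over many candidate domains, and the exact fields requested here are an assumption):
# List archived captures of one candidate domain in the period of interest.
curl -s 'https://web.archive.org/cdx/search/cdx?url=thepyramidnews.com&from=2010&to=2013&fl=timestamp,original,statuscode&collapse=urlkey'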
TODO: what does this Chinese forum track? New registrations? Their focus seems to be domain name speculation.
Some of the threads contain domain dumps. We haven't yet seen a scrapable URL pattern, but their data goes way back and did have various hits. The forum seems to have started in 2006: club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=10127
club.domain.cn/forum.php?mod=viewthread&tid=241704 "【国际域名拟删除列表】2007年06月16日" is the earliest list we could find. It is an expired domain list.
Some hits:
- club.domain.cn/forum.php?mod=viewthread&tid=709388 contains alljohnny.com. The thread title is "2009.5.04" and the post date is 2009-04-30. Breadcrumb nav: 域名论坛 > 域名增值交易区 > 国际域名专栏 (domain name forum > area for domain names increasing in value > international domains).
CIA 2010 covert communication websites "Mass Deface III" pastebin by
Ciro Santilli 37 Updated 2025-07-16
pastebin.com/CTXnhjeS dated mega early on Sep 30th, 2012 by CYBERTAZIEX.
This source was found by Oleg Shakirov.
This pastebin contained a few new hits, in addition to some pre-existing ones. Most of the hits seem to be linked to the IP 72.34.53.174, which presumably is a major part of the fingerprint found by CYBERTAZIEX, though unsurprisingly the methodology is unclear. As documented, the domains appear to be linked to a "Condor hosting" provider, but it is hard to find any information about it online.
From the title, it would seem that someone hacked into Condor and defaced all of its sites, including unknowingly some CIA ones which is LOL.
Ciro Santilli checked every single non-subdomain domain in the list.
Other files under the same account: pastebin.com/u/cybertaziex did not seem of interest.
The author's real name appears to be Deni Suwandi: twitter.com/denz_999 from Indonesia, but all accounts appear to be inactive, otherwise we'd ping him to ask for more info about the list.
www.zone-h.com lists some of the domains. They also seem to have intended to keep snapshots of the defacements, but we can't see them, which is sad:
- www.zone-h.com/mirror/id/18994983: inspecting the source we see an image zonehmirrors.org/defaced/2013/01/14/vypconsulting.com//tmp/sejeal.jpg "Sejeal" "Memorial of Gaza Martyrs". Sejeal defacements are mentioned e.g. at:
- www.zone-h.com/mirror/id/18410811: inspecting the source we find zonehmirrors.org/defaced/2012/09/30/ambrisbooks.com/ which lists the team:
alljohnny.com had a hit: ipinf.ru/domains/alljohnny.com/, and so Ciro started looking around... and a good number of other things have hits too. Not all of them though; it definitely has less data than viewdns.info.
But they do reverse IP, and they show which nearby reverse IPs have hits on the same page, for free, which is great!
Shame that their ordering is purely alphabetical and doesn't properly order the IPs, so it is a bit of a pain, but we can handle it.
OMG, Russians!!!
In this section we document the outcomes of more detailed inspection of both the communication mechanisms (JavaScript, JAR, swf) and HTML that might help to better fingerprint the websites.
CIA 2010 covert communication websites Communication mechanism by
Ciro Santilli 37 Updated 2025-07-16
There are four main types of communication mechanisms found:
- JAR file. These have short single-word names with some meaning linked to their website. There is also one known instance where a .zip extension was used: web.archive.org/web/20131101104829*/http://plugged-into-news.net/weatherbug.zip as:
<applet codebase="/web/20101229222144oe_/http://plugged-into-news.net/" archive="/web/20101229222144oe_/http://plugged-into-news.net/weatherbug.zip"
JAR is the most common comms mechanism, and one of the most distinctive, making it a great fingerprint.
- JavaScript file. There are two subtypes:
- JavaScript with SHAs. Rare. Likely older. Way more fingerprintable.
- JavaScript without SHAs. They have all been obfuscated slightly differently and compressed. But the file sizes are all very similar, from 8 kB to 10 kB, and they all look similar, so visually it is very easy to detect a match with good likelihood.
- Adobe Flash SWF file. In all instances found so far, the name of the SWF matches the name of the second-level domain exactly, e.g.:
http://tee-shot.net/tee-shot.swf
While this is somewhat of a fingerprint, it is worth noting that it was a relatively commonly used pattern. It is also the rarest of the mechanisms, which is at a dissonance with the rest of the web, which circa 2010 apparently already had way more SWF than JAR.
Some of the SWF websites have archives for empty /servlet/ pages:
./bailsnboots.com/20110201234509/servlet/teammate/index.html
./currentcommunique.com/20110130162713/servlet/summer/index.html
./mynepalnews.com/20110204095758/servlet/SnoopServlet/index.html
./mynepalnews.com/20110204095403/servlet/release/index.html
./www.hassannews.net/20101230175421/servlet/jordan/index.html
./zerosandonesnews.com/20110209084339/servlet/technews/index.html
which makes us think that it is a part of the SWF system.
- CGI comms
Because the communication mechanisms are so crucial, they tend to be less varied, and they serve as very good fingerprints. The similarity is not ludicrous, e.g. the files are not byte-for-byte identical, but one look at a few and you will recognize the others.
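As an example of how mechanically such fingerprints can be hunted for, here is a sketch that scans a local mirror of archived homepages (the mirror layout and file naming are assumptions) for applet tags and tallies the referenced archive names:
# Tally <applet archive="..."> values across downloaded Wayback snapshots.
grep -rEioh '<applet[^>]*archive="[^"]*"' . |
  sed -E 's/.*archive="([^"]*)".*/\1/' | sort | uniq -c | sort -rn | head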