CIA 2010 covert communication websites How did Alexa find the domains? by
Ciro Santilli 37 Updated 2025-07-16
It can't be HTML crawl because presumably there wouldn't have been links to those websites? Presumably this is why Common Crawl doesn't seem to have any hits.
The same question also applies to the 2013 DNS Census. It has less hits, but still has many.
Whatever they did, we are so so glad that they did!
.com and .net are very dominant. Here we list other choices made:
.info: has a few hits:Did a full Wayback Machine CDX scanning on .info after:That makes about 10k domains, so it's about the right size.grep -e news -e noticias -e nouvelles -e world -e global.org: has a least one hit, see: Are there .org hits?.biz:- unarchived comms:
- atthemovies.biz
- unarchived comms:
Previously it was unclear if there were any .org hits, until we found the first one with clear comms: web.archive.org/web/20110624203548/http://awfaoi.org/hand.jar
Later on, two more clear ones were found with expired domain trackers:further settling their existence. Later on newimages.org also came to light.
Others that had been previously found in IP ranges but without clear comms:
.org is very rare, and has been excluded from some of our search heuristics. That was a shame, but likely not much was missed.
This is a dark art, and many of the sources are shady as fuck! We often have no idea of their methodology. Also no source is fully complete. We just piece up as best we can.
- www.zone-h.org/archive/ip=208.76.80.93/page=11?hz=1 mentions
newsupdatesite.comand mentions "defacement", the "Mass Deface III" pastebin comes to mind. No other nearby hits on quick inspection.
D'oh.
But to be serious. The Wayback Machine contains a very large proportion of all sites. It does happen sometime that a Wayback Machine archive is missing or broken and cqcounter has the screenshot. But the Wayback Machine is still the most complete database we have found so far. Some archives are very broken. But those are rare.
The only problem with the Wayback Machine is that there is no known efficient way to query its archives across domains. You have to have a domain in hand for CDX queries: Wayback Machine CDX scanning.
The Common Crawl project attempts in part to address this lack of querriability, but we haven't managed to extract any hits from it.
CDX + 2013 DNS Census + heuristics however has been fruitful however.
We have dumped all Wayback Machine archives of known websites to: github.com/cirosantilli/cia-2010-websites-dump using ../cia-2010-covert-communication-websites/download-websites.sh. This allows for better grepping and serves as a backup in case they ever go down.
CIA 2010 covert communication websites Wayback Machine CDX scanning by
Ciro Santilli 37 Updated 2025-07-16
The Wayback Machine has an endpoint to query cralwed pages called the CDX server. It is documented at: github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md.
This allows to filter down 10 thousands of possible domains in a few hours. But 100s of thousands would be too much. This is because you have to query exactly one URL at a time, and they possibly rate limit IPs. But no IP blacklisting so far after several hours, so it's not that bad.
Once you have a heuristic to narrow down some domains, you can use this helper: ../cia-2010-covert-communication-websites/cdx.sh to drill them down from 10s of thousands down to hundreds or thousands.
We then post process the results of cdx.sh with ../cia-2010-covert-communication-websites/cdx-post.sh to drill them down from from thousands to dozens, and manually inspect everything.
From then on, you can just manually inspect for hist on your browser.
CIA 2010 covert communication websites Wayback Machine CDX scanning with Tor parallelization by
Ciro Santilli 37 Updated 2025-07-16
Dire times require dire methods: ../cia-2010-covert-communication-websites/cdx-tor.sh.
First we must start the tor servers with the and then use it on a newline separated domain name list to check;This creates a directory
tor-army command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983tor-army 100./cdx-tor.sh infile.txtinfile.txt.cdx/ containing:infile.txt.cdx/out00,out01, etc.: the suspected CDX lines from domains from each tor instance based on the simple criteria that the CDX can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.infile.txt.cdx/out: the final combined CDX output ofout00,out01, ...infile.txt.cdx/out.post: the final output containing only domain names that match further CLI criteria that cannot be easily encoded on the CDX query. This is the cleanest domain name list you should look into at the end basically.
Since archive is so abysmal in its data access, e.g. a Google BigQuery would solve our issues in seconds, we have to come up with creative ways of getting around their IP throttling.
Distilled into an answer at: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, so a reduction factor of about 1 million with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits to the domains and remove anything with too many hits, since the CIA websites basically have very few archives:This gives us something like:sorted by increasing hit counts, so we can go down as far as patience allows for!
cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1 CIA 2010 covert communication websites Wayback Machine crawl date search by
Ciro Santilli 37 Updated 2025-07-16
Main article: DNS Census 2013.
This data source was very valuable, and led to many hits, and to finding the first non Reuters ranges with Section "secure subdomain search on 2013 DNS Census".
CIA 2010 covert communication websites 2013 DNS Census virtual host cleanup by
Ciro Santilli 37 Updated 2025-07-16
We've noticed that often when there is a hit range:and that this does not seem to be that common. Let's see if that is a reasonable fingerprint or not.
- there is only one IP for each domain
- there is a range of about 20-30 of those
Note that although this is the most common case, we have found multiple hits that viewdns.info maps to the same IP.
First we create a table The
u (unique) that only have domains which are the only domain for an IP, let's see by how much that lowers the 191 M total unique domains:time sqlite3 u.sqlite 'create table t (d text, i text)'
time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where d not like '%.%.%' group by i having count(distinct d) = 1"not like '%.%.%' removes subdomains from the counts so that CGI comms are still included, and distinct in count(distinct is because we have multiple entries at different timestamps for some of the hits.Let's start with the 208 subset to see how it goes:OK, after we fixed bugs with the above we are down to 4 million lines with unique domain/IP pairs and which contains all of the original hits! Almost certainly more are to be found!
time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where i glob '208.*' and d not like '%.%.%' and (d like '%.com' or d like '%.net') group by i having count(distinct d) = 1"This data is so valuable that we've decided to upload it to: archive.org/details/2013-dns-census-a-novirt.csv Format:The numbers of the first column are the IPs as a 32-bit integer representation, which is more useful to search for ranges in.
8,chrisjmcgregor.com
11,80end.com
28,fine5.net
38,bestarabictv.com
49,xy005.com
50,cmsasoccer.com
80,museemontpellier.net
100,newtiger.com
108,lps-promptservice.com
111,bridesmaiddressesshow.comTo make a histogram with the distribution of the single hostname IPs:Which gives the following useless noise, there is basically no pattern:
#!/usr/bin/env bash
bin=$((2**24))
sqlite3 2013-dns-census-a-novirt.sqlite -cmd '.mode csv' >2013-dns-census-a-novirt-hist.csv <<EOF
select i, sum(cnt) from (
select floor(i/${bin}) as i,
count(*) as cnt
from t
group by 1
union
select *, 0 as cnt from generate_series(0, 255)
)
group by i
EOF
gnuplot \
-e 'set terminal svg size 1200, 800' \
-e 'set output "2013-dns-census-a-novirt-hist.svg"' \
-e 'set datafile separator ","' \
-e 'set tics scale 0' \
-e 'unset key' \
-e 'set xrange[0:255]' \
-e 'set title "Counts of IPs with a single hostname"' \
-e 'set xlabel "IPv4 first byte"' \
-e 'set ylabel "count"' \
-e 'plot "2013-dns-census-a-novirt-hist.csv" using 1:2:1 with labels' \
; CIA 2010 covert communication websites 2013 DNS Census virtual host cleanup heuristic keyword searches by
Ciro Santilli 37 Updated 2025-07-16
There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e globaliran + football:
- iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" (news in Portuguese or Spanish"), uncovering two brand new ip ranges:
- 66.45.179.205 noticiasporjanua.com
- 66.237.236.247 comunidaddenoticias.com
- 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:
- 216.97.231.56 nouvelles-d-aujourdhuis.com
news + global:
- 204.176.39.115 globalprovincesnews.com
- 212.209.74.105 globalbaseballnews.com
- 212.209.79.40: hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scanning of
news... Searching for .JAR or https.*cgi-bin.*\.cgi are killers, particularly the .jar hits, here's what came out:- 62.22.60.49 telecom-headlines.com
- 62.22.61.206 worldnewsnetworking.com
- 64.16.204.55 holein1news.com
- 66.104.169.184 bcenews.com
- 69.84.156.90 stickshiftnews.com
- 74.116.72.236 techtopnews.com
- 74.254.12.168 non-stop-news.net
- 193.203.49.212 inews-today.com
- 199.85.212.118 just-kidding-news.com
- 207.210.250.132 aeronet-news.com
- 212.4.18.129 sightseeingnews.com
- 212.209.90.84 thenewseditor.com
- 216.105.98.152 modernarabicnews.com
"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.
CIA 2010 covert communication websites 2013 DNS census MX records by
Ciro Santilli 37 Updated 2025-07-16
Let' see if there's anything in records/mx.xz.
mx.csv is 21GB.
They do have
" in the files to escape commas so:mx.pyWould have been better with csvkit: stackoverflow.com/questions/36287982/bash-parse-csv-with-quotes-commas-and-newlines
import csv
import sys
writer = csv.writer(sys.stdout)
with open('mx.csv', 'r') as f:
reader = csv.reader(f)
for row in reader:
writer.writerow([row[0], row[3]])then:
# uniq not amazing as there are often two or three slightly different records repeated on multiple timestamps, but down to 11 GB
python3 mx.py | uniq > mx-uniq.csv
sqlite3 mx.sqlite 'create table t(d text, m text)'
# 13 GB
time sqlite3 mx.sqlite ".import --csv --skip 1 'mx-uniq.csv' t"
# 41 GB
time sqlite3 mx.sqlite 'create index td on t(d)'
time sqlite3 mx.sqlite 'create index tm on t(m)'
time sqlite3 mx.sqlite 'create index tdm on t(d, m)'
# Remove dupes.
# Rows: 150m
time sqlite3 mx.sqlite <<EOF
delete from t
where rowid not in (
select min(rowid)
from t
group by d, m
)
EOF
# 15 GB
time sqlite3 mx.sqlite vacuumLet's see what the hits use:
awk -F, 'NR>1{ print $2 }' ../media/cia-2010-covert-communication-websites/hits.csv | xargs -I{} sqlite3 mx.sqlite "select distinct * from t where d = '{}'"At around 267 total hits, only 84 have MX records, and from those that do, almost all of them have exactly:with only three exceptions:We need to count out of the totals!which gives, ~18M, so nope, it is too much by itself...
smtp.secureserver.net
mailstore1.secureserver.netdailynewsandsports.com|dailynewsandsports.com
inews-today.com|mail.inews-today.com
just-kidding-news.com|just-kidding-news.comsqlite3 mx.sqlite "select count(*) from t where m = 'mailstore1.secureserver.net'"Let's try to use that to reduce where
av.sqlite from 2013 DNS Census virtual host cleanup a bit further:time sqlite3 mx.sqlite '.mode csv' "attach 'aiddcu.sqlite' as 'av'" '.load ./ip' "select ipi2s(av.t.i), av.t.d from av.t inner join t as mx on av.t.d = mx.d and mx.m = 'mailstore1.secureserver.net' order by av.t.i asc" > avm.csvavm stands for av with mx pruning. This leaves us with only ~500k entries left. With one more figerprint we could do a Wayback Machine CDX scanning scan.Let's check that we still have most our hits in there:At 267 hits we got 81, so all are still present.
grep -f <(awk -F, 'NR>1{print $2}' /home/ciro/bak/git/media/cia-2010-covert-communication-websites/hits.csv) avm.csvsecureserver is a hosting provider, we can see their blank page e.g. at: web.archive.org/web/20110128152204/http://emmano.com/. security.stackexchange.com/questions/12610/why-did-secureserver-net-godaddy-access-my-gmail-account/12616#12616 comments:
secureserver.net is the name GoDaddy use as the reverse DNS for IP addresses used for dedicated/virtual server hosting
CIA 2010 covert communication websites 2013 DNS census NS records by
Ciro Santilli 37 Updated 2025-07-16
We can also cut down the data a lot with stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540#76605540 and tld filtering:This brings us down to a much more manageable 3.0 GB, 83 M rows.
awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csvLet's just scan it once real quick to start with, since likely nothing will come of this venue:As of 267 hits we get:so yeah, most of those are likely going to be humongous just by looking at the names.
grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv
cat nsu-hits.csv | csvcut -c 2 | sort | awk -F. '{OFS="."; print $(NF-1), $(NF)}' | sort | uniq -c | sort -k1 -n 1 a2hosting.com
1 amerinoc.com
1 ayns.net
1 dailyrazor.com
1 domainingdepot.com
1 easydns.com
1 frienddns.ru
1 hostgator.com
1 kolmic.com
1 name-services.com
1 namecity.com
1 netnames.net
1 tonsmovies.net
1 webmailer.de
2 cashparking.com
55 worldnic.com
86 domaincontrol.comThe smallest ones by far from the total are: frienddns.ru with only 487 hits, all others quite large or fake hits due to CSV. Did a quick Wayback Machine CDX scanning there but no luck alas.
Let's check the smaller ones:Doubt anything will come out of this.
inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com CIA 2010 covert communication websites 2013 DNS census SOA records by
Ciro Santilli 37 Updated 2025-07-16
Same as 2013 DNS census NS records basically, nothing came out.
dnshistory.org contains historical domain -> mappings.
We have not managed to extract much from this source, they don't have as much data on the range of interest.
But they do have some unique data at least, perhaps we should try them a bit more often, e.g. they were the only source we've seen so far that made the association: headlines2day.com -> 212.209.74.126 which places it in the more plausible globalbaseballnews.com IP range.
TODO can it do IP to domain? Or just domain to IP? Asked on their Discord: discord.com/channels/698151879166918727/968586102493552731/1124254204257632377. Their banner suggests that yes:
With our new look website you can now find other domains hosted on the same IP address, your website neighbours and more even quicker than before.
Owner replied, you can't:
At the moment you can only do this for current not historical records
In principle, we could obtain this data from search engines, but Google doesn't track that entire website well, e.g. no hits for
site:dnshistory.org "62.22.60.48" presumably due to heavy IP throttling.Homepage dnshistory.org/ gives date starting in 2009:and it is true that they do have some hits from that useful era.
Here at DNS History we have been crawling DNS records since 2009, our database currently contains over 1 billion domains and over 12 billion DNS records.
Any data that we have the patience of extracting from this we will dump under github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/hits.json.
They appear to piece together data from various sources. This is the most complete historical domain -> IP database we have so far. They don't have hugely more data than viewdns.info, but many times do offer something new. It feels like the key difference is that their data goes further back in the critical time period a bit.
TODO do they have historical reverse IP? The fact that they don't seem to have it suggests that they are just making historical reverse IP requests to a third party via some API?
E.g. searching
thefilmcentre.com under historical data at securitytrails.com/domain/thefilmcentre.com/history/al gives the correct IP 62.22.60.55.Account creation blacklists common email providers such as gmail to force users to use a "corporate" email address. But using random domains like
ciro@cirosantilli.com works fine.Their data seems to date back to 2008 for our searches.
So far, no new domains have been found with Common Crawl, nor have any existing known domains been found to be present in Common Crawl. Our working theory is that Common Crawl never reached the domains How did Alexa find the domains?
Let's try and do something with Common Crawl.
Unfortunately there's no IP data apparently: github.com/commoncrawl/cc-index-table/issues/30, so let's focus on the URLs.
Using their Common Crawl Athena method: commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
Sample first output line:So
# 2
url_surtkey org,whwheelers)/robots.txt
url https://whwheelers.org/robots.txt
url_host_name whwheelers.org
url_host_tld org
url_host_2nd_last_part whwheelers
url_host_3rd_last_part
url_host_4th_last_part
url_host_5th_last_part
url_host_registry_suffix org
url_host_registered_domain whwheelers.org
url_host_private_suffix org
url_host_private_domain whwheelers.org
url_host_name_reversed
url_protocol https
url_port
url_path /robots.txt
url_query
fetch_time 2021-06-22 16:36:50.000
fetch_status 301
fetch_redirect https://www.whwheelers.org/robots.txt
content_digest 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
content_mime_type text/html
content_mime_detected text/html
content_charset
content_languages
content_truncated
warc_filename crawl-data/CC-MAIN-2021-25/segments/1623488519183.85/robotstxt/CC-MAIN-20210622155328-20210622185328-00312.warc.gz
warc_record_offset 1854030
warc_record_length 639
warc_segment 1623488519183.85
crawl CC-MAIN-2021-25
subset robotstxturl_host_3rd_last_part might be a winner for CGI comms fingerprinting!Naive one for one index:have no results... data scanned: 5.73 GB
select * from "ccindex"."ccindex" where url_host_registered_domain = 'conquermstoday.com' limit 100;Let's see if they have any of the domain hits. Let's also restrict by date to try and reduce the data scanned:Humm, data scanned: 60.59 GB and no hits... weird.
select * from "ccindex"."ccindex" where
fetch_time < TIMESTAMP '2014-01-01 00:00:00' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'topbillingsite.com',
'worldwildlifeadventure.com'
)Sanity check:has a bunch of hits of course. Data scanned: 212.88 MB,
select * from "ccindex"."ccindex" WHERE
crawl = 'CC-MAIN-2013-20' AND
subset = 'warc' AND
url_host_registered_domain IN (
'google.com',
'amazon.com'
)WHERE crawl and subset are a must! Should have read the article first.Let's widen a bit more:Still nothing found... they don't seem to have any of the URLs of interest?
select * from "ccindex"."ccindex" WHERE
crawl IN (
'CC-MAIN-2013-20',
'CC-MAIN-2013-48',
'CC-MAIN-2014-10'
) AND
subset = 'warc' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'worldnewsandent.com',
'worldwildlifeadventure.com'
)Does not appear to have any reverse IP hits unfortunately: opendata.stackexchange.com/questions/1951/dataset-of-domain-names/21077#21077. Likely only has domains that were explicitly advertised.
We could not find anything useful in it so far, but there is great potential to use this tool to find new IP ranges based on properties of existing IP ranges. Part of the problem is that the dataset is huge, and is split by top 256 bytes. But it would be reasonable to at least explore ranges with pre-existing known hits...
We have started looking for patterns on
66.* and 208.*, both selected as two relatively far away ranges that have a number of pre-existing hits. 208 should likely have been 212 considering later finds that put several ranges in 212.tcpip_fp:
- 66.104.
- 66.104.175.41: grubbersworldrugbynews.com: 1346397300 SCAN(V=6.01%E=4%D=1/12%OT=22%CT=443%CU=%PV=N%G=N%TM=387CAB9E%P=mipsel-openwrt-linux-gnu),ECN(R=N),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=N),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.48: worlddispatch.net: 1346816700 SCAN(V=6.01%E=4%D=1/2%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=1D5EA%P=mipsel-openwrt-linux-gnu),SEQ(SP=F8%GCD=3%ISR=109%TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.49: webworldsports.com: 1346692500 SCAN(V=6.01%E=4%D=9/3%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=5044E96E%P=mipsel-openwrt-linux-gnu),SEQ(SP=105%GCD=1%ISR=108%TI=Z%TS=A),OPS(O1=M550ST11NW6%O2=M550ST11NW6%O3=M550NNT11NW6%O4=M550ST11NW6%O5=M550ST11NW6%O6=M550ST11),WIN(W1=1510%W2=1510%W3=1510%W4=1510%W5=1510%W6=1510),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.50: fly-bybirdies.com: 1346822100 SCAN(V=6.01%E=4%D=1/1%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=14655%P=mipsel-openwrt-linux-gnu),SEQ(TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.104.175.53: info-ology.net: 1346712300 SCAN(V=6.01%E=4%D=9/4%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=50453230%P=mipsel-openwrt-linux-gnu),SEQ(SP=FB%GCD=1%ISR=FF%TI=Z%TS=A),ECN(R=N),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.175.106
- 66.175.106.150: noticiasmusica.net: 1340077500 SCAN(V=5.51%D=1/3%OT=22%CT=443%CU=%PV=N%G=N%TM=38707542%P=mipsel-openwrt-linux-gnu),ECN(R=N),T1(R=N),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
- 66.175.106.155: atomworldnews.com: 1345562100 SCAN(V=5.51%D=8/21%OT=22%CT=443%CU=%PV=N%DC=I%G=N%TM=5033A5F2%P=mips-openwrt-linux-gnu),SEQ(SP=FB%GCD=1%ISR=FC%TI=Z%TS=A),ECN(R=Y%DF=Y%TG=40%W=1540%O=M550NNSNW6%CC=N%Q=),T1(R=Y%DF=Y%TG=40%S=O%A=S+%F=AS%RD=0%Q=),T2(R=N),T3(R=N),T4(R=N),T5(R=Y%DF=Y%TG=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=),T6(R=N),T7(R=N),U1(R=N),IE(R=N)
CIA 2010 covert communication websites 2012 Internet Census hostprobes by
Ciro Santilli 37 Updated 2025-07-16
Hostprobes quick look on two ranges:
208.254.40:
... similar down
208.254.40.95 1334668500 down no-response
208.254.40.95 1338270300 down no-response
208.254.40.95 1338839100 down no-response
208.254.40.95 1339361100 down no-response
208.254.40.95 1346391900 down no-response
208.254.40.96 1335806100 up unknown
208.254.40.96 1336979700 up unknown
208.254.40.96 1338840900 up unknown
208.254.40.96 1339454700 up unknown
208.254.40.96 1346778900 up echo-reply (0.34s latency).
208.254.40.96 1346838300 up echo-reply (0.30s latency).
208.254.40.97 1335840300 up unknown
208.254.40.97 1338446700 up unknown
208.254.40.97 1339334100 up unknown
208.254.40.97 1346658300 up echo-reply (0.26s latency).
... similar up
208.254.40.126 1335708900 up unknown
208.254.40.126 1338446700 up unknown
208.254.40.126 1339330500 up unknown
208.254.40.126 1346494500 up echo-reply (0.24s latency).
208.254.40.127 1335840300 up unknown
208.254.40.127 1337793300 up unknown
208.254.40.127 1338853500 up unknown
208.254.40.127 1346454900 up echo-reply (0.23s latency).
208.254.40.128 1335856500 up unknown
208.254.40.128 1338200100 down no-response
208.254.40.128 1338749100 down no-response
208.254.40.128 1339334100 down no-response
208.254.40.128 1346607900 down net-unreach
208.254.40.129 1335699900 up unknown
... similar downSuggests exactly 127 - 96 + 1 = 31 IPs.
208.254.42:
... similar down
208.254.42.191 1334522700 down no-response
208.254.42.191 1335276900 down no-response
208.254.42.191 1335784500 down no-response
208.254.42.191 1337845500 down no-response
208.254.42.191 1338752700 down no-response
208.254.42.191 1339332300 down no-response
208.254.42.191 1346499900 down net-unreach
208.254.42.192 1334668500 up unknown
208.254.42.192 1336808700 up unknown
208.254.42.192 1339334100 up unknown
208.254.42.192 1346766300 up echo-reply (0.40s latency).
208.254.42.193 1335770100 up unknown
208.254.42.193 1338444900 up unknown
208.254.42.193 1339334100 up unknown
... similar up
208.254.42.221 1346517900 up echo-reply (0.19s latency).
208.254.42.222 1335708900 up unknown
208.254.42.222 1335708900 up unknown
208.254.42.222 1338066900 up unknown
208.254.42.222 1338747300 up unknown
208.254.42.222 1346872500 up echo-reply (0.27s latency).
208.254.42.223 1335773700 up unknown
208.254.42.223 1336949100 up unknown
208.254.42.223 1338750900 up unknown
208.254.42.223 1339334100 up unknown
208.254.42.223 1346854500 up echo-reply (0.13s latency).
208.254.42.224 1335665700 down no-response
208.254.42.224 1336567500 down no-response
208.254.42.224 1338840900 down no-response
208.254.42.224 1339425900 down no-response
208.254.42.224 1346494500 down time-exceeded
... similar downSuggests exactly 223 - 192 + 1 = 31 IPs.
It does appears that long sequences of ranges are a sort of fingerprint. The question is how unique it would be.
First:This reduces us to 2 million IP rows from the total possible 16 million IPs.
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq
t=$n-up-uniq.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq tmp"
time sqlite3 $t 'create table t (i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'OK now just counting hits on fixed windows has way too many results:
sqlite3 208-up-uniq.sqlite "\
SELECT * FROM (
SELECT min(i), COUNT(*) OVER (
ORDER BY i RANGE BETWEEN 15 PRECEDING AND 15 FOLLOWING
) as c FROM t
) WHERE c > 20 and c < 30
"Let's try instead consecutive ranges of length exactly 31 instead then:271. Hmm. A bit more than we'd like...
sqlite3 208-up-uniq.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t)
GROUP BY grp
ORDER BY i
) where c = 31
EOFAnother route is to also count the ups:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-cnt
t=$n-up-uniq-cnt.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq-cnt tmp"
time sqlite3 $t 'create table t (cnt integer, i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select cnt as integer, str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'Let's see how many consecutives with counts:
sqlite3 208-up-uniq-cnt.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t WHERE cnt >= 3)
GROUP BY grp
ORDER BY i
) where c > 28 and c < 32
EOFLet's check on 66:not representative at all... e.g. several convfirmed hits are down:
grep -e '66.45.179' -e '66.45.179' 6666.45.179.215 1335305700 down no-response
66.45.179.215 1337579100 down no-response
66.45.179.215 1338765300 down no-response
66.45.179.215 1340271900 down no-response
66.45.179.215 1346813100 down no-response Pinned article: Introduction to the OurBigBook Project
Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
Intro to OurBigBook
. Source. We have two killer features:
- topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculusArticles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivativeVideo 2. OurBigBook Web topics demo. Source. - local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
Figure 3. Visual Studio Code extension installation.Figure 4. Visual Studio Code extension tree navigation.Figure 5. Web editor. You can also edit articles on the Web editor without installing anything locally.Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.Video 4. OurBigBook Visual Studio Code extension editing and navigation demo. Source. - Infinitely deep tables of contents:
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact





