Being Brazilian, Ciro Santilli was particularly curious about the existence of the Brazil-focused website mentioned in the Reuters article, as well as about sites targeting other democracies.
WTF was the CIA doing in Brazil in the early 2010s?! Wasn't helping to install the military dictatorship in Brazil enough?
It is worth noting that democracies represent just a small minority of the websites found. Middle Eastern and Spanish-language sites (presumably for Venezuela plus war-on-drugs countries?) made up the huge majority. But Americans have to understand that democracies have to work together and build mutual trust, not spy on one another. Even some of the enlightened people from Hacker News seem not to grasp this point. The USA cannot single-handedly maintain world order as it once could. Collaboration based on trust is the only way.
Snowden's 2013 revelations particularly shocked USA "allies" with the fact that they were being spied upon. As of the 2020s, everybody knows this and has "stopped caring", and/or moved to end-to-end encryption by default. This is beautifully illustrated in the 2016 film Snowden, when Snowden talks about his time in Japan working for Dell as an undercover NSA operative:
NSA wanted to impress the Japanese. Show them our reach. They loved the live video from drones. This is Pakistan right now [video shows American agents demonstrating drone footage to Japanese officials]. They were not as excited about the fact that we wanted their help to spy on the Japanese population. They said it was against their laws.
Of course we tapped the entire country anyway.
And we did not stop there. Once we owned their communications systems, we started going after the physical infrastructure.
We'd slip these little sleeper programs into power grids, dams, hospitals. The idea was that if the day came when Japan was no longer an ally, it would be "lights out".
And it wasn't just the Japanese. We were planting malware in Mexico, Germany, Brazil, Austria.
I mean, China, I can understand. Russia. Iran. Venezuela, okay.
But Austria?!
[shows footage of cow on an idyllic Alpine mountain grazing field, suggesting that there is nothing in Austria to spy on]
Video 1. "But Austria?!" scene from Snowden (2016). Source.
Another noteworthy scene from that movie is Video 2. "Aptitude test on communication networks scene from the 2016 Snowden film", where a bunch of new CIA recruits are told that:
Each of you is going to build a covert communications network in your home city [i.e. their fictitious foreign target location written on each person's desk such as Berlin, Istanbul and Bangkok, not necessarily where they were actually born], you're going to deploy it, backup your site, destroy it, and restore it again.
thus somewhat mirroring what actually happened with these real world websites.
Video 2. Aptitude test on communication networks scene from the 2016 Snowden film. Source.
Craig Venter
One of the biotechnology superstars of the 2000s and 2010s.
First we must start the tor servers with the tor-army command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
tor-army 100
and then use it on a newline-separated domain name list to check:
./cdx-tor.sh infile.txt
This creates a directory infile.txt.cdx/ containing:
  • infile.txt.cdx/out00, out01, etc.: the suspected CDX lines from domains from each tor instance based on the simple criteria that the CDX can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.
  • infile.txt.cdx/out: the final combined CDX output of out00, out01, ...
  • infile.txt.cdx/out.post: the final output containing only domain names that match further CLI criteria that cannot be easily encoded in the CDX query. This is the cleanest output, and basically the domain name list you should look into at the end.
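For reference, each tor instance essentially ends up making Wayback Machine CDX API queries of the following form. This is only a minimal sketch: the SOCKS port and the exact query parameters are assumptions, and the real logic lives in cdx-tor.sh:
# Query the CDX API for all captures under one domain, routed through a
# local tor SOCKS proxy (port 9050 assumed).
domain=example.com
curl -s --socks5-hostname 127.0.0.1:9050 \
  "https://web.archive.org/cdx/search/cdx?url=${domain}&matchType=domain&collapse=urlkey&limit=100"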
Since the Internet Archive's data access is so abysmal (e.g. a Google BigQuery interface would solve our issues in seconds), we have to come up with creative ways of getting around their IP throttling.
The CIA doesn't play fair. They're actually the exact opposite of fair. So neither shall we.
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, a reduction of about three orders of magnitude with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits per domain and remove anything with too many hits, since the CIA websites basically have very few archives:
cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count
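For reference, the awk step (gawk syntax, as in the pipeline above) simply converts the Wayback Machine SURT keys back into plain domain names followed by their hit count, e.g.:
echo '3 com,aeronet-news)/' |
  awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}'
# prints: aeronet-news.com 3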
This gives us something like:
12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1
sorted by increasing hit count, so we can go down the list as far as patience allows!
New results from a full CDX scan of 2013-dns-census-a-novirt.csv:
  • 219.90.61.123 journeystravelled.com
Many hits appear to happen on the same days, and per-day data does exist: archive.org/details/widecrawl, but apparently it cannot be publicly downloaded, unfortunately. But maybe there's another way? TODO select candidates.
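A minimal sketch of how that same-day clustering can be quantified from the combined CDX output out (this assumes the default CDX field order, in which the second field is the 14-digit capture timestamp):
# Count how many hit captures fall on each day, most frequent days first
awk '{ print substr($2, 1, 8) }' out | sort | uniq -c | sort -k1 -rn | head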
torchvision
Contains several computer vision models, e.g. ResNet, all of which include versions pre-trained on some dataset, which is quite sweet.
ns.csv is 57 GB. This file is too massive; working with it is a pain.
We can also cut down the data a lot with stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540#76605540 and tld filtering:
awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csv
This brings us down to a much more manageable 3.0 GB, 83 M rows.
Let's just scan it once real quick to start with, since likely nothing will come of this avenue:
grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv
cat nsu-hits.csv | csvcut -c 2 | sort | awk -F. '{OFS="."; print $(NF-1), $(NF)}' | sort | uniq -c | sort -k1 -n
As of 267 known hits, we get:
      1 a2hosting.com
      1 amerinoc.com
      1 ayns.net
      1 dailyrazor.com
      1 domainingdepot.com
      1 easydns.com
      1 frienddns.ru
      1 hostgator.com
      1 kolmic.com
      1 name-services.com
      1 namecity.com
      1 netnames.net
      1 tonsmovies.net
      1 webmailer.de
      2 cashparking.com
     55 worldnic.com
     86 domaincontrol.com
so yeah, most of those are likely going to be humongous just by looking at the names.
The smallest one by far of the lot is frienddns.ru with only 487 hits; all the others are quite large, or are fake hits due to CSV grepping. We did a quick Wayback Machine CDX scan on it but no luck, alas.
Let's check the smaller ones:
inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com
Doubt anything will come out of this.
Let's do a bit of counting out of the total:
grep domaincontrol.com ns.csv | awk -F, '{print $1}' | uniq | wc
gives ~20M domains using domaincontrol. Let's see how many domains there are in total:
awk -F, '{print $1}' ns.csv | uniq | wc
so it accounts for 1/4 of the total.
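Putting the two counts together as a single sketch (slow: two full passes over the 57 GB file):
total=$(awk -F, '{print $1}' ns.csv | uniq | wc -l)
dc=$(grep domaincontrol.com ns.csv | awk -F, '{print $1}' | uniq | wc -l)
# fraction of all domains whose name server is under domaincontrol.com
echo "scale=2; $dc / $total" | bc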
Hostprobes quick look on two ranges:
208.254.40:
... similar down

208.254.40.95	1334668500	down	no-response
208.254.40.95	1338270300	down	no-response
208.254.40.95	1338839100	down	no-response
208.254.40.95	1339361100	down	no-response
208.254.40.95	1346391900	down	no-response
208.254.40.96	1335806100	up	unknown
208.254.40.96	1336979700	up	unknown
208.254.40.96	1338840900	up	unknown
208.254.40.96	1339454700	up	unknown
208.254.40.96	1346778900	up	echo-reply (0.34s latency).
208.254.40.96	1346838300	up	echo-reply (0.30s latency).
208.254.40.97	1335840300	up	unknown
208.254.40.97	1338446700	up	unknown
208.254.40.97	1339334100	up	unknown
208.254.40.97	1346658300	up	echo-reply (0.26s latency).

... similar up

208.254.40.126	1335708900	up	unknown
208.254.40.126	1338446700	up	unknown
208.254.40.126	1339330500	up	unknown
208.254.40.126	1346494500	up	echo-reply (0.24s latency).
208.254.40.127	1335840300	up	unknown
208.254.40.127	1337793300	up	unknown
208.254.40.127	1338853500	up	unknown
208.254.40.127	1346454900	up	echo-reply (0.23s latency).

208.254.40.128	1335856500	up	unknown
208.254.40.128	1338200100	down	no-response
208.254.40.128	1338749100	down	no-response
208.254.40.128	1339334100	down	no-response
208.254.40.128	1346607900	down	net-unreach
208.254.40.129	1335699900	up	unknown

... similar down
Suggests a block of exactly 127 - 96 + 1 = 32 IPs.
208.254.42:
... similar down

208.254.42.191	1334522700	down	no-response
208.254.42.191	1335276900	down	no-response
208.254.42.191	1335784500	down	no-response
208.254.42.191	1337845500	down	no-response
208.254.42.191	1338752700	down	no-response
208.254.42.191	1339332300	down	no-response
208.254.42.191	1346499900	down	net-unreach

208.254.42.192	1334668500	up	unknown
208.254.42.192	1336808700	up	unknown
208.254.42.192	1339334100	up	unknown
208.254.42.192	1346766300	up	echo-reply (0.40s latency).
208.254.42.193	1335770100	up	unknown
208.254.42.193	1338444900	up	unknown
208.254.42.193	1339334100	up	unknown

... similar up

208.254.42.221	1346517900	up	echo-reply (0.19s latency).
208.254.42.222	1335708900	up	unknown
208.254.42.222	1335708900	up	unknown
208.254.42.222	1338066900	up	unknown
208.254.42.222	1338747300	up	unknown
208.254.42.222	1346872500	up	echo-reply (0.27s latency).
208.254.42.223	1335773700	up	unknown
208.254.42.223	1336949100	up	unknown
208.254.42.223	1338750900	up	unknown
208.254.42.223	1339334100	up	unknown
208.254.42.223	1346854500	up	echo-reply (0.13s latency).

208.254.42.224	1335665700	down	no-response
208.254.42.224	1336567500	down	no-response
208.254.42.224	1338840900	down	no-response
208.254.42.224	1339425900	down	no-response
208.254.42.224	1346494500	down	time-exceeded

... similar down
Suggests a block of exactly 223 - 192 + 1 = 32 IPs.
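The same consecutive "up" runs can be extracted directly with awk. A minimal sketch, assuming the per-octet dump file 208 used below, with whitespace-separated IP, timestamp and status columns:
# Print runs of consecutive "up" last octets within the 208.254.40.0/24 range
awk '$1 ~ /^208\.254\.40\./ && $3 == "up" { split($1, o, "."); print o[4] }' 208 |
  sort -n | uniq |
  awk 'NR == 1 { start = $1; prev = $1; next }
       $1 != prev + 1 { print start "-" prev; start = $1 }
       { prev = $1 }
       END { print start "-" prev }'
# expected output for this range: 96-127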
Let's have a look at file 68. Outcome: no clear hits like on 208. One wonders why.
It does appear that long sequences of up ranges are a sort of fingerprint. The question is how unique they would be.
First:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq
t=$n-up-uniq.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq tmp"
time sqlite3 $t 'create table t (i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'
This reduces us to 2 million IP rows from the total possible 16 million IPs.
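A quick sanity check on the resulting table (names as created above):
sqlite3 208-up-uniq.sqlite 'SELECT COUNT(*), MIN(i), MAX(i) FROM t'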
OK, now just counting hits on fixed windows gives way too many results:
sqlite3 208-up-uniq.sqlite "\
SELECT * FROM (
  SELECT min(i), COUNT(*) OVER (
    ORDER BY i RANGE BETWEEN 15 PRECEDING AND 15 FOLLOWING
  ) as c FROM t
) WHERE c > 20 and c < 30
"
Let's instead try runs of consecutive IPs where max - min is exactly 31, i.e. exactly 32 consecutive IPs:
sqlite3 208-up-uniq.sqlite <<EOF
SELECT f, t - f as c FROM (
  SELECT min(i) as f, max(i) as t
  FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t)
  GROUP BY grp
  ORDER BY i
) where c = 31
EOF
271. Hmm. A bit more than we'd like...
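For reference, the ROW_NUMBER() OVER (ORDER BY i) - i trick above assigns the same grp value to every run of consecutive integers, which is what lets the GROUP BY find the runs. A tiny self-contained illustration:
sqlite3 :memory: <<EOF
CREATE TABLE t(i INTEGER);
INSERT INTO t VALUES (10),(11),(12),(20),(21);
-- expected output: 10|2 and 20|1, i.e. the runs [10,12] and [20,21]
SELECT min(i) AS f, max(i) - min(i) AS c
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i AS grp FROM t)
GROUP BY grp;
EOF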
Another route is to also count the ups:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-cnt
t=$n-up-uniq-cnt.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq-cnt tmp"
time sqlite3 $t 'create table t (cnt integer, i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select cast(cnt as integer), str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'
Let's see how many consecutive runs we get when we also require a minimum up count per IP:
sqlite3 208-up-uniq-cnt.sqlite <<EOF
SELECT f, t - f as c FROM (
  SELECT min(i) as f, max(i) as t
  FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t WHERE cnt >= 3)
  GROUP BY grp
  ORDER BY i
) where c > 28 and c < 32
EOF
Let's check on 66:
grep -e '66.45.179' 66
Not representative at all... e.g. several confirmed hits are down:
66.45.179.215   1335305700      down    no-response
66.45.179.215   1337579100      down    no-response
66.45.179.215   1338765300      down    no-response
66.45.179.215   1340271900      down    no-response
66.45.179.215   1346813100      down    no-response
Mastodon (software)
Of course those racist Nazis are a bunch of idiots, but how can you be surprised when freedom-of-speech focused tech gets used by them? www.theverge.com/2019/7/12/20691957/mastodon-decentralized-social-network-gab-migration-fediverse-app-blocking
Obviously, a few large instances dominate the user base for all practical purposes: kevq.uk/centralisation-and-mastodon/. And the network likely splits along hate-speech/non-hate-speech blacklist boundaries. And since the dominating closed networks will never lose user counts (???), the only instance that dominates will be the main hate speech one.
The flagship instance was mastodon.social and then in 2020 they closed signups for it and created a secondary mastodon.online.
The multi-instance thing is confusing and problematic. E.g. replies and likes might not show up across instances in some occasions: www.reddit.com/r/Mastodon/comments/ynt7ta/is_there_a_way_to_see_post_replies_displayed_on/
The "advanced interface" feature is bad. Really bad. MacOS file browser inspired.
Let's check the relevance of known hits:
grep -e '208.254.40' -e '208.254.42' 208 | tee 208hits
Output:
208.254.40.95	1355564700	unreachable
208.254.40.95	1355622300	unreachable
208.254.40.96	1334537100	alive, 36342
208.254.40.96	1335269700	alive, 17586

..

208.254.40.127	1355562900	alive, 35023
208.254.40.127	1355593500	alive, 59866
208.254.40.128	1334609100	unreachable
208.254.40.128	1334708100	alive from 208.254.32.214, 43358
208.254.40.128	1336596300	unreachable
The rest of the 208.254.40 range is mostly unreachable.
208.254.42.191	1335294900	unreachable
...
208.254.42.191	1344737700	unreachable
208.254.42.191	1345574700	Icmp Error: 0,ICMP Network Unreachable, from 63.111.123.26
208.254.42.191	1346166900	unreachable
...
208.254.42.191	1355665500	unreachable
208.254.42.192	1334625300	alive, 6672
...
208.254.42.192	1355658300	alive, 57412
208.254.42.193	1334677500	alive, 28985
208.254.42.193	1336524300	unreachable
208.254.42.193	1344447900	alive, 8934
208.254.42.193	1344613500	alive, 24037
208.254.42.193	1344806100	alive, 20410
208.254.42.193	1345162500	alive, 10177
...
208.254.42.223	1336590900	alive, 23284
...
208.254.42.223	1355555700	alive, 58841
208.254.42.224	1334607300	Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224	1334681100	Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224	1336563900	Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224	1344451500	Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.138
208.254.42.224	1344566700	unreachable
208.254.42.224	1344762900	unreachable
Let's try with 66. First, there is way too much data (9 GB), so let's cut it down:
n=66
time awk '$3~/^alive,/ { print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-c
OK down to 45 MB, now we can work.
grep -e '66.45.179' -e '66.104.169' -e '66.104.173' -e '66.104.175' -e '66.175.106' 66-up-uniq-c | tee 66hits
Nah, it's full of holes:
4,66.45.179.187
12,66.45.179.188
2,66.45.179.197
1,66.45.179.202
2,66.45.179.205
2,66.45.179.206
1,66.45.179.207
We won't be able to find new ranges here.
Domain list only, no IPs and no dates. We haven't been able to extract anything of interest from this source so far.
Domain hit count when we were at 69 hits: only 9, some of which had since been reused. Likely their data collection did not cover the dates of interest.
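A minimal sketch of how that hit count can be reproduced against such a plain domain list (domains.txt is an assumed one-domain-per-line dump from the source; hits.csv has the domain in its second column, as used earlier):
comm -12 \
  <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv | sort -u) \
  <(sort -u domains.txt) |
  wc -l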
TODO: what does this Chinese forum track? New registrations? Their focus seems to be domain name speculation.
Some of the threads contain domain dumps. We haven't yet seen a scrapable URL pattern, but their data goes way back and did have various hits. The forum seems to have started in 2006: club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=10127
club.domain.cn/forum.php?mod=viewthread&tid=241704 "【国际域名拟删除列表】2007年06月16日" ("International domain names pending deletion list, 2007-06-16") is the earliest list we could find. It is an expired domain list.
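A hedged sketch of how those threads could be enumerated by walking the forumdisplay pages; it assumes thread titles appear on the same HTML line as their viewthread link, which may well not hold, so treat it only as a starting point:
# List threads on a few forum pages that look like the
# "international domain names pending deletion" lists
for page in $(seq 10120 10127); do
  curl -s "https://club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=${page}" |
    grep 'mod=viewthread' |
    grep '国际域名拟删除列表'
done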
Some hits:
There are four main types of communication mechanisms found:
  • JAR file. This is the most common communication mechanism, and one of the most distinctive, making it a great fingerprint.
    Several of the JAR files are named something like either:
    as if to pose as Internet speed testing tools? The wonderful subtleties of the late 2000s Internet are a bit over our heads.
    All JARs are directly under the root, not in subdirectories, and the basename usually consists of one word, though sometimes two, camel cased.
    There is also one known instance where a .zip extension was used! web.archive.org/web/20131101104829*/http://plugged-into-news.net/weatherbug.zip as:
    <applet codebase="/web/20101229222144oe_/http://plugged-into-news.net/" archive="/web/20101229222144oe_/http://plugged-into-news.net/weatherbug.zip"
  • JavaScript file. There are two subtypes:
    • JavaScript with SHAs. Rare. Likely older. Way more fingerprintable.
    • JavaScript without SHAs. They have all been obfuscated slightly differently and compressed, but the file sizes are all very similar, from 8 kB to 10 kB, and they all look alike, so visually it is very easy to detect a match with good likelihood.
  • Adobe Flash swf file. In all instances found so far, the name of the SWF matches the name of the second level domain exactly, e.g.:
    http://tee-shot.net/tee-shot.swf
    While this is somewhat of a fingerprint, it is worth noting that it was a relatively commonly used pattern on the wider web. It is also the rarest of the mechanisms. This is at a dissonance with the rest of the web, which circa 2010 apparently already had way more SWF than JAR. A sketch of probing for this naming pattern is given below the list.
    Some of the SWF websites have archives for empty /servlet pages:
    ./bailsnboots.com/20110201234509/servlet/teammate/index.html
    ./currentcommunique.com/20110130162713/servlet/summer/index.html
    ./mynepalnews.com/20110204095758/servlet/SnoopServlet/index.html
    ./mynepalnews.com/20110204095403/servlet/release/index.html
    ./www.hassannews.net/20101230175421/servlet/jordan/index.html
    ./zerosandonesnews.com/20110209084339/servlet/technews/index.html
    which makes us think that it is a part of the SWF system.
  • CGI comms. These have short single-word names with some meaning linked to their website.
Because the communication mechanisms are so crucial, they tend to be less varied, and serve as very good fingerprints. It is not ludicrously blatant, e.g. the files are not identical across sites, but one look at a few and you will recognize the others.
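As a concrete example of using the SWF naming pattern as a fingerprint (referenced from the SWF item above), here is a sketch that probes the Wayback Machine CDX API for <second-level-label>.swf under each candidate domain; candidates.txt is an assumed one-domain-per-line input:
# For each candidate domain, check whether http://<domain>/<label>.swf
# was ever archived by the Wayback Machine
while read -r domain; do
  label="${domain%%.*}"
  hits="$(curl -s "https://web.archive.org/cdx/search/cdx?url=${domain}/${label}.swf&fl=timestamp,original&limit=5")"
  if [ -n "$hits" ]; then
    printf '== %s\n%s\n' "$domain" "$hits"
  fi
done < candidates.txt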
Edward Snowden
Figure 1. Edward Snowden in 2013. Source. From the film Prism, during the interview with reporter Glenn Greenwald.
Video 1. Edward Snowden original interview cut by The Guardian (2013). Source.

Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We believe that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
We have two killer features:
  1. topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
    Articles of different users are sorted by upvote within each topic page. This feature is a bit like:
    • a Wikipedia where each user can have their own version of each article
    • a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
    This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
    Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivative
  2. local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
    This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
    Figure 5. You can also edit articles on the Web editor without installing anything locally.
    Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
  3. https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png
  4. Infinitely deep tables of contents:
    Figure 6. Dynamic article tree with infinitely deep table of contents.
    Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach out to us for any help or suggestions: docs.ourbigbook.com/#contact