CIA 2010 covert communication websites Wayback Machine CDX scanning with Tor parallelization by
Ciro Santilli 40 Updated 2025-07-16
Dire times require dire methods: ../cia-2010-covert-communication-websites/cdx-tor.sh.
First we must start the tor servers with the tor-army command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983

tor-army 100

and then use it on a newline separated domain name list to check:

./cdx-tor.sh infile.txt

This creates a directory infile.txt.cdx/ containing:
- infile.txt.cdx/out00, out01, etc.: the suspected CDX lines from domains from each tor instance, based on the simple criteria that the CDX API can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.
- infile.txt.cdx/out: the final combined CDX output of out00, out01, ...
- infile.txt.cdx/out.post: the final output containing only domain names that match further CLI criteria that cannot be easily encoded in the CDX query. This is the cleanest domain name list you should look into at the end, basically.
Since the Internet Archive is so abysmal in its data access (e.g. a Google BigQuery interface would solve our issues in seconds), we have to come up with creative ways of getting around their IP throttling.
Distilled into an answer at: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
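At its core, each parallel worker just runs CDX API queries through its own Tor instance's SOCKS port. A minimal sketch of one such query, assuming a Tor SOCKS proxy on localhost:9050 (each tor-army instance listens on its own port):

# Fetch a few CDX lines for one domain through a Tor SOCKS proxy.
# The first space-separated field of each output line is the SURT urlkey.
domain=example.com
curl -s --socks5-hostname localhost:9050 \
  "https://web.archive.org/cdx/search/cdx?url=${domain}&matchType=domain&limit=5"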
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, a reduction factor of about 800 with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits to the domains and remove anything with too many hits, since the CIA websites basically have very few archives:

cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count

This gives us something like:

12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1

sorted by increasing hit counts, so we can go down as far as patience allows for!

CIA 2010 covert communication websites Wayback Machine crawl date search by
Ciro Santilli 40 Updated 2025-07-16
Their historic DNS and reverse DNS info was very valuable, and served as Ciro's initial entry point to finding hits in the IP ranges given by Reuters.
Generic information about the website not specific to this project will be stored at: Section "viewdns.info".
Since this source is so scarce and valuable, we have been quite careful to note down all the domain and IP ranges that have been explored.
At news.ycombinator.com/item?id=38496244, the creator of viewdns.info, "Hughesey", also stated that he'd be able to give some free credits for public research projects such as this one. This would have saved us going to quite a few cafes to get those sweet extra IPs! But it was more fun in hard mode, no doubt.
We do API access to IP ranges with this simple helper: ../cia-2010-covert-communication-websites/viewdns-info.sh, usage:

./viewdns-info.sh <apikey> <start-ipv4-address> <end-ipv4-address>

e.g.:

./viewdns-info.sh 8b890b00b17ed2d66bbed878d51200b58d43d014 66.45.179.187 66.45.179.210

For domain to IP queries from the API you should use "iphistory" viewdns.info/api/docs/ip-history.php (note the double quotes, so that $APIKEY actually expands):

curl "https://api.viewdns.info/iphistory/?domain=todaysengineering.com&apikey=$APIKEY&output=json"

Just beware of the viewdns.info reverse IP bug, that really sucks and led to us missing a ton of domains.
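For a one-off single IP you can also hit the underlying "reverseip" endpoint that such helpers wrap: viewdns.info/api/docs/reverse-ip-lookup.php. A minimal sketch, assuming $APIKEY is exported and jq is installed; the JSON field names follow the API docs, so verify them against a live response:

# List every domain that viewdns.info has seen hosted on a single IP.
curl -s "https://api.viewdns.info/reverseip/?host=66.45.179.187&apikey=$APIKEY&output=json" |
  jq -r '.response.domains[].name'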
Main article: DNS Census 2013.
This data source was very valuable, and led to many hits, and to finding the first non-Reuters ranges with Section "secure subdomain search on 2013 DNS Census".
CIA 2010 covert communication websites 2013 DNS Census virtual host cleanup by
Ciro Santilli 40 Updated 2025-07-16
We've noticed that often when there is a hit range:
- there is only one IP for each domain
- there is a range of about 20-30 of those
and that this does not seem to be that common. Let's see if that is a reasonable fingerprint or not.
Note that although this is the most common case, we have found multiple hits that viewdns.info maps to the same IP.
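As a sketch of how one might scan for that fingerprint in bulk (our own illustration against the av.sqlite table t(d, i) used below, not a command from the original investigation):

sqlite3 av.sqlite <<'EOF'
-- For each /24 prefix, count IPs that host exactly one distinct domain,
-- then keep only prefixes in the 20-30 sweet spot described above.
select p, count(*) as n from (
  select rtrim(i, '0123456789') as p
  from t
  group by i
  having count(distinct d) = 1
)
group by p
having count(*) between 20 and 30
order by n desc;
EOF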
First we create a table u (unique) that only has domains which are the only domain for an IP. Let's see by how much that lowers the 191M total unique domains:

time sqlite3 u.sqlite 'create table t (d text, i text)'
time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where d not like '%.%.%' group by i having count(distinct d) = 1"

The not like '%.%.%' removes subdomains from the counts so that CGI comms are still included, and the distinct in count(distinct d) is needed because we have multiple entries at different timestamps for some of the hits.

Let's start with the 208 subset to see how it goes:

time sqlite3 av.sqlite -cmd "attach 'u.sqlite' as u" "insert into u.t select min(d) as d, min(i) as i from t where i glob '208.*' and d not like '%.%.%' and (d like '%.com' or d like '%.net') group by i having count(distinct d) = 1"

OK, after we fixed bugs with the above we are down to 4 million lines with unique domain/IP pairs which contain all of the original hits! Almost certainly more are to be found!

This data is so valuable that we've decided to upload it to: archive.org/details/2013-dns-census-a-novirt.csv Format:
8,chrisjmcgregor.com
11,80end.com
28,fine5.net
38,bestarabictv.com
49,xy005.com
50,cmsasoccer.com
80,museemontpellier.net
100,newtiger.com
108,lps-promptservice.com
111,bridesmaiddressesshow.com

The numbers of the first column are the IPs as a 32-bit integer representation, which is more useful to search for ranges in; a conversion sketch is given after the histogram script below.

To make a histogram with the distribution of the single hostname IPs:
#!/usr/bin/env bash
# Histogram of the single-hostname IPs, bucketed by the first byte of the
# IPv4 address: bin = 2^24, i.e. one bucket per /8.
bin=$((2**24))
sqlite3 2013-dns-census-a-novirt.sqlite -cmd '.mode csv' >2013-dns-census-a-novirt-hist.csv <<EOF
select i, sum(cnt) from (
  -- count of IPs in each /8 bucket
  select floor(i/${bin}) as i,
  count(*) as cnt
  from t
  group by 1
  -- make empty buckets show up as 0 rather than being omitted
  union
  select *, 0 as cnt from generate_series(0, 255)
)
group by i
EOF
gnuplot \
-e 'set terminal svg size 1200, 800' \
-e 'set output "2013-dns-census-a-novirt-hist.svg"' \
-e 'set datafile separator ","' \
-e 'set tics scale 0' \
-e 'unset key' \
-e 'set xrange[0:255]' \
-e 'set title "Counts of IPs with a single hostname"' \
-e 'set xlabel "IPv4 first byte"' \
-e 'set ylabel "count"' \
-e 'plot "2013-dns-census-a-novirt-hist.csv" using 1:2:1 with labels' \
;

Which gives useless noise, there is basically no pattern.
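As promised above, a sketch of converting between dotted quads and that 32-bit integer representation to cut ranges out of the CSV (ip2int/int2ip are our own illustrative helpers, not part of the published dataset):

# Dotted quad -> 32-bit integer, e.g. 66.45.179.187 -> 1110291387.
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
# 32-bit integer -> dotted quad.
int2ip() {
  echo "$(( $1 >> 24 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}
# E.g. extract every entry in 66.45.179.0/24:
awk -F, -v lo=$(ip2int 66.45.179.0) -v hi=$(ip2int 66.45.179.255) \
  '$1 >= lo && $1 <= hi' 2013-dns-census-a-novirt.csv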
CIA 2010 covert communication websites 2013 DNS Census virtual host cleanup heuristic keyword searches by
Ciro Santilli 40 Updated 2025-07-16
There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e global

iran + football:
- iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" ("news" in Portuguese and Spanish), uncovering two brand new IP ranges:
- 66.45.179.205 noticiasporjanua.com
- 66.237.236.247 comunidaddenoticias.com
- 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:
- 216.97.231.56 nouvelles-d-aujourdhuis.com
news + global:
- 204.176.39.115 globalprovincesnews.com
- 212.209.74.105 globalbaseballnews.com
- 212.209.79.40 hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scanning of
news... Searching for .JAR or https.*cgi-bin.*\.cgi are killers, particularly the .jar hits, here's what came out:
- 62.22.60.49 telecom-headlines.com
- 62.22.61.206 worldnewsnetworking.com
- 64.16.204.55 holein1news.com
- 66.104.169.184 bcenews.com
- 69.84.156.90 stickshiftnews.com
- 74.116.72.236 techtopnews.com
- 74.254.12.168 non-stop-news.net
- 193.203.49.212 inews-today.com
- 199.85.212.118 just-kidding-news.com
- 207.210.250.132 aeronet-news.com
- 212.4.18.129 sightseeingnews.com
- 212.209.90.84 thenewseditor.com
- 216.105.98.152 modernarabicnews.com
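For reference, the kind of post-filter that surfaces those JAR/CGI domains from the combined CDX output might look like this (a sketch against the out file produced by cdx-tor.sh above; the exact patterns used evolved over time):

# Flag domains whose archived URLs mention .jar files or cgi-bin .cgi endpoints,
# then reduce to unique SURT urlkeys (first CDX field).
grep -iE '\.jar( |$)|https?://[^ ]*cgi-bin[^ ]*\.cgi' out |
  cut -d' ' -f1 | sort -u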
"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.
Ciro was even more stupid then than as of 2020, and continued to try and hang out with those evil kids to show them he was cool too or that he was strong, and so continued to get hurt.
Advice to his children: stay away from evil people.
The bullied sometimes feels an almost masochistic desire to overcome the bullies' contempt, and to try and either become friends with the bullies, or to overpower them.
You must never give into those thoughts.
If you come across evil people, smile a fake smile to them, and walk away, but never give your back to them, and always be ready to fight.
If they laugh at you, know that you are shit like everyone else, pretend to laugh with them, take their post and repost it on your public profile, and silently stay away from those idiots.
Never show any weakness.
If a fight is likely, always be ready, always have your friends nearby, be as well armed as the enemy, and never be outnumbered.
On the Internet, never care about e-bully posts: either block them immediately (and anyone that likes their posts), or follow Ciro's reply policy.
Call parents or other authorities as soon as there is risk of physical harm. Better a living free pussy than dead or in youth detention for murder. Similar advice applies if you are going to jail I guess.
If a physical fight is inevitable however, ignore Jesus this once and don't turn the other cheek, but rather follow the Talmud and fight all out on the beaches:
If someone comes to kill you, rise and kill first.
The Sikh knife, the Kirpan, which Sikhs must carry at all times as a religious obligation, also comes to mind. The Sikhs must have been bullied out of their minds at some point in history, Ciro understands.
Non-violence only works when you have bodies to spare from your followers.
Perhaps it was good to learn those lessons early, before the stakes were too high. Adults fake it much better, and therefore it is harder to learn those lessons from them, but they are still just as evil on the inside.
These experiences might have contributed to Ciro Santilli's self perceived compassionate personality.
It basically came about because of the endless stream of useless software startups made since the 2000s by one or two people with no investment, riding the continued increase in computer power and Internet speeds, until the great wall was reached.
Deep tech means not one of those. More specifically, it means technologies that require significant investment in expensive materials and laboratory equipment to progress, such as molecular biology technologies and quantum computing.
And it basically comes down to technologies that wrestle with the fundamental laws of physics rather than software data wrangling.
Computers are of course limited by the laws of physics, but those are largely hidden by several layers of indirection.
Full visibility and full control make computer tasks eventually always work out more or less as expected.
The same does not hold true when real Physics is involved.
Physics is brutal.
To start with, you can't even see your system very clearly, and often doing so requires altering its behaviour.
For example, in molecular biology, most great discoveries are made after some new technique is made to be able to observe smaller things.
But you often have to kill your cells to make those observations, which makes it very hard to understand how they work dynamically.
What we would really want would be to track every single protein as it goes about inside the cell. But that is likely an impossible dream.
The same for the brain. If we had observations of every neuron, how long would it take to understand it? Not long, people are really good at reverse engineering things when there is enough information available to do so, see also science is the reverse engineering of nature.
Then, even when you start to see the system, you might have a very hard time controlling it, because it is so fragile. This is basically the case of quantum computing in 2020.
The next big things will come from deep tech. Failure is always a possibility, and you can't know before you try.
But that's also why it's so fun to dare.
Stuff that Ciro Santilli considers "deep tech" as of 2020:
- brain-computer interface
- fusion power. The question there is, when is "deep", "too deep"?
Applications of power (we have to remember it is there to notice how awesome it is!):
- lightning
- motors
- sending and receiving communication signals
- computers, which in turn can do computations and improved communication
Most promising approaches as of 2020:
Why Private Billions Are Flowing Into Fusion by Bloomberg (2022)
Source.
- Joint European Torus
- General Fusion: compress with liquid metal. Intends to demo at the JET site.
- Helion Energy: direct fusion to electricity conversion without steam, direct from magnetic field movements
- First Light: shoot a microscopic object at a target to crush it so much that fusion happens
Once again, relies on superconductivity to reach insane magnetic fields. Superconductivity is just so important.
Ciro Santilli saw a good presentation about it once circa 2020; it seems that the main difficulty at the time was turbulence messing things up. They have some nice simulations with cross section pictures e.g. at: www.eurekalert.org/news-releases/937941.
These are websites that offer somewhat overlapping services, many of which served as inspirations, and why we think something different is needed to achieve our goals.
Notably, OurBigBook is the result of Ciro Santilli's experiences with:
- Wikipedia
- GitHub
- Stack Exchange (or as non techies might point out, Urban Dictionary, or Quora before it was such an incomprehensible shitshow)
OurBigBook could be seen as a cross between those three websites.
Quick mentions:
- handwiki.org/wiki/HandWiki:About: technically the same as Wikipedia, but with more aligned moderation policies
- ecotext.co/: similar goals. Their website seems quite broken as of 2021 though, can't see text properly. Crunchbase entry: www.crunchbase.com/organization/ecotext says they are from Durham, New Hampshire, United States. Cannot see how to publish, curated material only? Twitter: twitter.com/ecotextinc?lang=en. One of the founders: twitter.com/BigNel_21 | www.linkedin.com/in/ecotextnelsonthomas/. Their LinkedIn: www.linkedin.com/company/ecotext/people/
- fiveable.me/: bad: separates students and teachers, as a student I don't see where to create my content. Good: focus on teaching university level stuff to people outside of university via Advanced Placement. Bad: lots of video content. Bad: can't see the issue tracker attached to each page.
- LessWrong: their website system does have some similar feature sets to what we want. Reputation, Q&A sections, links between articles most likely, sort by upvote everywhere.
- crowdpub.org: collaborative writing website, somehow goes to paragraph level, TODO: how do they reconcile different authors? Closed beta as of writing, so hard to be sure. From a quick presentation on the beta website, appears to attempt to share revenue with authors proportionally to the size of their contribution. Some blockchain-based reputation. Meh.
- TODO migrate all from: github.com/booktree/booktree/blob/master/alternatives.md
- studynotes.ie/. Admin approval on everything. No ToC. Fixed tag list for university entry exams topics.
- mindstone.com: there appears to be no sharing focus? File upload based? Not sure.
- EverybodyWiki
- looking for open source Confluence-alternatives is an interesting way to go:
- lists:
- BookStack:
- fixed 3-level page hierarchy
- written in PHP
- Markdown support: www.bookstackapp.com/docs/user/markdown-editor/
- no source-level import-export apparently: www.bookstackapp.com/docs/admin/backup-restore/, youtu.be/WUvtzJfCAKE?t=904
- WYSIWYG: www.bookstackapp.com/docs/user/wysiwyg-editor/ via TinyMCE
- page content repeating: www.bookstackapp.com/docs/user/reusing-page-content/ (will be useful for course modelling)
- github.com/shuding/nextra converts Markdown links to Next.js links. We should look into how it works.
- zettelkasten.de/the-archive/ "The Archive" from zettelkasten.de/. Closed source. By German software engineer Christian Tietze twitter.com/ctietze?lang=en
- LLM generated wiki e.g.:
- docs.tigyog.app/cli: beautiful website, but doesn't achieve much. Has a Markdown upload mechanism. Ah, those newbs who think the average user will care about markup upload to DB... Oh, wait...
- www.stuvia.com/en-gb/school/uk/oxford-university/physics. PDF uploads. In theory you have to own the copyright: www.stuvia.com/en-gb/copyright/guidelines, but it feels unlikely that most material was uploaded by the copyright owners. If those people are up, then why can't we? Maybe... Registered in the UK. People: some Dutch dudes:
- Project Xanadu: crazy overlaps, though that project is vaporware apparently?
Administrators of Project Xanadu have declared it superior to the World Wide Web, with the mission statement: "Today's popular software simulates paper. The World Wide Web (another imitation of paper) trivialises our original hypertext model with one-way ever-breaking links and no management of version or contents."
Static website-only alternatives:
- quarto.org/
- vitepress.dev: vitepress.dev/guide/markdown, unmanaged internal links. Sample website: wiki.nikiv.dev/.
Conceptual:
- The Final Encyclopedia: science fiction concept, but the name was reused by Paul Allen in a research project
- second brain
- collective intelligence
- you don't get any/sufficient recognition for your contributions. The closest they have to upvotes and reputation is the incredibly obscure "thank" feature which is only visible to the receiver itself: en.wikipedia.org/wiki/Help:Notifications/Thanks
- deletionism is a tremendous problem on Wikipedia, for two main causes:
- tutorial-like subjectivity
- notability
The stuff you wrote can be deleted anytime by some random admin/opposing editor, examples at: Section "Deletionism on Wikipedia".
- Scope too limited, and politically defined. Everything has to sound encyclopedic and be notable enough. This basically completely excludes good tutorials.
- Insane, impossible-to-use markup-language-based talk pages instead of issue trackers?! Ridiculous!!! That change alone could make Wikipedia so much more amazing. Wikipedia could become a Stack Exchange killer by doing that alone + some basic reputation system. Some work on that is being done at: www.mediawiki.org/wiki/Extension:DiscussionTools, already in Beta as of 2022.
- Edit wars
Pinned article: Introduction to the OurBigBook Project
Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We believe that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
Intro to OurBigBook. Source.
We have two killer features:
- topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus": ourbigbook.com/go/topic/fundamental-theorem-of-calculus. Articles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivative
Video 2. OurBigBook Web topics demo. Source.
- local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
Figure 3. Visual Studio Code extension installation.
Figure 4. Visual Studio Code extension tree navigation.
Figure 5. Web editor. You can also edit articles on the Web editor without installing anything locally.
Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
Video 4. OurBigBook Visual Studio Code extension editing and navigation demo. Source.
- Infinitely deep tables of contents:
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach out to us for any help or suggestions: docs.ourbigbook.com/#contact