The Wayback Machine has an endpoint to query crawled pages called the CDX server. It is documented at: github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md.
This allows filtering tens of thousands of candidate domains in a few hours, but hundreds of thousands would be too much. This is because you have to query exactly one URL at a time, and they possibly rate limit IPs. We saw no IP blacklisting after several hours of querying however, so it's not that bad.
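A minimal sketch of what a single lookup can look like, assuming the `url`, `from`, `to`, `fl` and `limit` query parameters described in the CDX server README. The function name `cdx_url` and the chosen output fields are illustrative and are not taken from cdx.sh:

```sh
#!/usr/bin/env bash
# Build the CDX query URL for one domain (hypothetical helper, not from cdx.sh).
# $1: domain, $2: first year, $3: last year, $4: max rows (default 5).
cdx_url() {
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&from=%s&to=%s&fl=timestamp,original,statuscode&limit=%s\n' \
    "$1" "$2" "$3" "${4:-5}"
}

# Print the URL that would be queried for one candidate domain.
cdx_url example.com 2010 2013
```

Each candidate then costs one request, e.g. `curl -s "$(cdx_url example.com 2010 2013)"`, which is why throughput is bounded by how many requests you can issue before hitting any rate limiting.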
Once you have a heuristic to narrow down some domains, you can use this helper: ../cia-2010-covert-communication-websites/cdx.sh to drill them down from tens of thousands to hundreds or thousands.
We then post-process the results of cdx.sh with ../cia-2010-covert-communication-websites/cdx-post.sh to drill them down from thousands to dozens, and manually inspect everything.
From then on, you can just manually inspect for hits in your browser.
This data source was very valuable, and led to many hits, and to finding the first non-Reuters ranges with Section "secure subdomain search on 2013 DNS Census".
Hit overlap:
jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json | xargs -I{} sqlite3 aiddcu.sqlite "select * from t where d = '{}'"
Domain hit count when we were at 279 hits: 142 hits, so about half of the hits were present.
The timing of the database is perfect for this project, it is as if the CIA had planted it themselves!
They appear to piece together data from various sources. This is the most complete historical domain -> IP database we have so far. They don't have hugely more data than viewdns.info, but many times do offer something new. It feels like the key difference is that their data goes further back in the critical time period a bit.
TODO do they have historical reverse IP? The fact that they don't seem to have it suggests that they are just making historical reverse IP requests to a third party via some API?
E.g. searching thefilmcentre.com under historical data at securitytrails.com/domain/thefilmcentre.com/history/al gives the correct IP 62.22.60.55.
But searching for the IP 62.22.60.55 returns nothing, and there's no historical data option?
Account creation blacklists common email providers such as Gmail to force users to use a "corporate" email address. But using addresses on random domains like ciro@cirosantilli.com works fine.
Their data seems to date back to 2008 for our searches.
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason, notably e.g.:
This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever".
Data comparison:
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
./justdropped.sh &
wait
./justdropped-post.sh
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
Count unique domains in the repos:
( echo */*/*/* | xargs cat ) | sort -u | wc
The extracted data is present at:
Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
  • GitHub trackers:
    • admin-monitor.shiyue.com
    • anquan.didichuxing.com
    • app.cloudsek.com
    • app.flare.io
    • app.rainforest.tech
    • app.shadowmap.com
    • bo.serenety.xmco.fr 8 1
    • bts.linecorp.com
    • burn2give.vercel.app
    • cbs.ctm360.com 17 2
    • code6.d1m.cn
    • code6-ops.juzifenqi.com
    • codefend.devops.cndatacom.com
    • dlp-code.airudder.com
    • easm.atrust.sangfor.com
    • ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
    • ecall.beygoo.me 2 1
    • eos.vip.vip.com 1 1
    • foradar.baimaohui.net 2 1
    • fty.beygoo.me
    • hive.telefonica.com.br 2 1
    • hulrud.tistory.com
    • kartos.enthec.com
    • soc.futuoa.com
    • lullar-com-3.appspot.com
    • penetration.houtai.io 2 1
    • platform.sec.corp.qihoo.net
    • plus.k8s.onemt.co 4 1
    • pmp.beygoo.me 2 1
    • portal.protectorg.com
    • qa-boss.amh-group.com
    • saicmotor.saas.cubesec.cn
    • scan.huoban.com
    • sec.welab-inc.com
    • security.ctrip.com 10 3
    • siem-gs.int.black-unique.com 2 1
    • soc-github.daojia-inc.com
    • spigotmc.org 2 1
    • tcallzgroup.blueliv.com
    • tcthreatcompass05.blueliv.com 4 1
    • tix.testsite.woa.com 2 1
    • toucan.belcy.com 1 1
    • turbo.gwmdevops.com 18 2
    • urlscan.watcherlab.com
    • zelenka.guru. Looks like a Russian hacker forum.
  • LinkedIn profile views:
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
Next, we can start searching by keyword with Wayback Machine CDX scanning, parallelized over Tor, using our helper ../cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the terms "news" or "global":
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
produces per-year results for the regex term news|global between the years under:
tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
OK, let's:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
Other searches that are not dense enough for our patience:
world|global|[^.]info
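The keyword selection itself is just an extended regular expression applied to each domain name. A small sketch with made-up domain names, assuming one domain per line on standard input; `filter_domains` is a hypothetical helper and not part of hupo-cdx-tor.sh. Note that `[^.]info` requires a non-dot character right before `info`, so it skips names that would only match because of the `.info` TLD:

```sh
#!/usr/bin/env bash
# Keep only the domains matching the extended regex given as $1.
filter_domains() {
  grep -E -- "$1"
}

# example.info is dropped: its only "info" comes right after a dot.
printf '%s\n' worldnews.com example.info myinfo.net cats.org |
  filter_domains 'world|global|[^.]info'
```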
OMG news search might be producing some golden, golden new hits!!! Going full into this. Hits:
  • thepyramidnews.com
  • echessnews.com
  • tickettonews.com
  • airuafricanews.com
  • vuvuzelanews.com
  • dayenews.com
  • newsupdatesite.com
  • arabicnewsonline.com
  • arabicnewsunfiltered.com
  • newsandsportscentral.com
  • networkofnews.com
  • trekkingtoday.com
  • financial-crisis-news.com
and a few more. It's amazing.
We've come across a few shallow and stylistically similar websites on suspicious ranges with this pattern.
No JS/JAR/SWF comms, but rather a subdomain, and an HTTPS page with .cgi extension that leads to a login page. Some names seen for this subdomain:
  • secure.: most common
  • ssl.: also common
  • various other more creative ones linked to the website theme itself, e.g.:
    • musical-fortune.net has a backstage.musical-fortune.net
The question is, is this part of some legitimate tooling that created such patterns? And if so which? Or are they actual hits with a new comms mechanism not previously seen?
The fact that:
  • hits of this type are so dense in the suspicious ranges
  • they are so stylistically similar to one another
  • Citizen Lab specifically mentioned a "CGI" comms method
suggests to Ciro that they are an actual hit.
In particular, the secure and ssl ones are overused, and together with some heuristics allowed us to find our first two non-Reuters ranges! See Section "secure subdomain search on 2013 DNS Census".
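To check a list of candidate domains for this pattern, one could first expand each domain into its most common subdomain variants. A sketch, assuming one domain per line on standard input; `secure_candidates` is a hypothetical helper name, and only the two most common prefixes mentioned above are covered:

```sh
#!/usr/bin/env bash
# For each domain read from stdin, print the secure. and ssl. variants.
secure_candidates() {
  while IFS= read -r domain; do
    for prefix in secure ssl; do
      printf '%s.%s\n' "$prefix" "$domain"
    done
  done
}

printf '%s\n' example.com example.net | secure_candidates
```

The resulting list can then be matched against a DNS dataset with a fixed-string, whole-line search such as `grep -Fx -f`.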
There are two types of JavaScript found so far: the ones with SHA and the ones without. There are only 2 examples of JS with SHA:
Both files start with precisely the same string:
var ms="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",lc="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",mn="\u0628\u0631\u062F\u0627\u0632\u0634 \u062F\u0631 \u062C\u0631\u064A\u0627\u0646 \u0627\u0633\u062A...\u0644\u0637\u0641\u0627 \u0635\u0628\u0631 \u0643\u0646\u064A\u062F",lt="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",ne="\u067E\u0627\u0633\u062E",kf="\u062E\u0631\u0648\u062C",mb="\u062D\u0630\u0641",mv="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",nt="\u0627\u0631\u0633\u0627\u0644",ig="\u062B\u0628\u062A \u063A\u0644\u0637. \u062C\u0647\u062A \u062A\u062C\u062F\u064A\u062F \u062B\u0628\u062A \u0635\u0641\u062D\u0647 \u0631\u0627 \u0628\u0627\u0632\u0622\u0648\u0631\u06CC \u06A9\u0646\u064A\u062F",hs="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ji="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ie="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",gc="\u0633\u0648\u0627\u0631 \u06A9\u0631\u062F\u0646 \u062A\u06A9\u0645\u064A\u0644 \u0634\u062F",gz="\u0645\u0637\u0645\u0626\u0646\u064A\u062F \u06A9\u0647 \u0645\u064A\u062E\u0648\u0627\u0647\u064A\u062F \u067E\u064A\u0627\u0645 \u0631\u0627 \u062D\u0630\u0641 \u06A9\u0646\u064A\u062F\u061F"
Good fingerprint present in all of them:
throw new Error("B64 D.1");};if(at[1]==-1){throw new Error("B64 D.2");};if(at[2]==-1){if(f<ay.length){throw new Error("B64 D.3");};dg=2;}else if(at[3]==-1){if(f<ay.length){throw new Error("B64 D.4")
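Because the fingerprint is a fixed string, a literal search over downloaded scripts is enough to flag candidates. A sketch, assuming the JavaScript files were already saved under some local directory; the function name and the demo files are hypothetical:

```sh
#!/usr/bin/env bash
# Fixed-string fragment taken from the fingerprint above.
FINGERPRINT='throw new Error("B64 D.1")'

# List files under directory $1 that contain the fingerprint.
find_fingerprint() {
  grep -rlF -- "$FINGERPRINT" "$1"
}

# Self-contained demo on a temporary directory.
dir="$(mktemp -d)"
printf '%s\n' 'if(x){throw new Error("B64 D.1");}' > "$dir/hit.js"
printf '%s\n' 'console.log("hello");' > "$dir/miss.js"
find_fingerprint "$dir"
rm -r "$dir"
```

Using `-F` avoids having to escape the parentheses and dots that would otherwise be interpreted as regular expression syntax.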
Edit: Carson was found by Oleg Shakirov: alljohnny.com, communicated at: twitter.com/shakirov2036/status/1746729471778988499, earliest archive from 2004 (!): web.archive.org/web/20040113025122/http://alljohnny.com/. The domain was hidden in plain sight: it was present in a not very visible watermark in the Reuters article screenshot! The watermark was added by the CIA to the background image, it is actually present on the website. In retrospect, the domain was also present in the expired domain trackers dataset, but the mega discreet "all" second word made Ciro Santilli miss it: github.com/cirosantilli/expired-domain-names-by-day-2015/blob/9d504f3b85364a64f7db93311e70011344cff788/07/05/02#L1572
Figure 1. 2004 Wayback Machine archive of alljohnny.com.
What follows is the previous
The fact that the Reuters article has a screenshot of it, and therefore a Wayback Machine link, plus the specificity of the website topic, will likely keep Ciro awake at night for a while until someone finds that domain.
Some text visible on the Reuters screenshot:
It is unclear however if this text is plaintext or part of an image.
Some failed attempts, either dry guesses or from DNS grepping dataset searches:
Searching the Wayback Machine proved fruitless. There is no full text search: Wayback Machine full text search, and a heuristic web.archive.org/web/20230000000000*/Johnny%20Carson search has relevant hits but not the one we want.
Another attempt was to search for "carson" on webmasterhome.cn, which lists expired domains in bulk by expiration day and is search engine friendly. It contains most of the domains we've found so far. Google either doesn't support partial word search or requires you to be a God to find it, so we settle for DuckDuckGo, which supports it: duckduckgo.com/?q=site%3Awebmasterhome.cn+%22carson%22&t=h_&ia=web Adding years also helps: duckduckgo.com/?q=site%3Awebmasterhome.cn+%22carson%22+2011&ia=web With this we might be getting all possible results. Ciro went through all results for 2011, 2012 and 2013, but no luck. Also fuck en.wikipedia.org/wiki/Carson_City,_Nevada and en.wikipedia.org/wiki/Carson,_California :-)
Let's search tools.whoisxmlapi.com/reverse-whois-search for "carson" contained in any historic domain name. 10,001 lines. Grepping those, there are no good Wayback Machine hits among the ones that also contain "johnny" or "show". Data at: raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/tools.whoisxmlapi.com_reverse-whois-search_carson.csv in case anyone wants to try and dig...
Let's also search the fortuitously timed 2013 DNS Census.
Some of the most beautiful things in mathematics are theorems or conjectures that are very simple to state and understand (e.g. for K-12, lower undergrad levels), but extremely hard to prove.
This is in contrast to conjectures in certain areas where you'd have to study for a few months just to precisely understand all the definitions and the interest of the problem statement.
You start with a very small list of:
Using those rules, you choose a target string that you want to reach, and then try to reach it. Before the target string is reached, mathematicians call it a "conjecture".
Mathematicians call the list of transformation rules used to reach a string a "proof".
Since every step of the proof is very simple and can be verified by a computer automatically, the entire proof can also be automatically verified by a computer very easily.
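As a tiny illustration of machine-checked proofs, here is a sketch in Lean 4. The text above does not name a specific proof system, so this choice and the theorem names are assumptions made only for illustration. Each theorem statement is the target to reach, and the term after `:=` is the proof that the checker verifies mechanically:

```lean
-- Verified by computation: both sides reduce to the same natural number.
theorem two_add_two : 2 + 2 = 4 := rfl

-- From a proof of "p and q", build a proof of "q and p".
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩
```

If either proof term were wrong, the checker would reject the file, with no human judgement involved.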
Finding proofs however is undoubtedly an uncomputable problem.
Most mathematicians can't code or deal with the real world in general however, so they haven't created the obviously necessary: website front-end for a mathematical formal proof system.
The fact that Mathematics happens to be the best way to describe physics and that humans can use physical intuition heuristics to reach the NP-hard proofs of mathematics is one of the great miracles of the universe.
Once we have mathematics formally modelled, one of the coolest results is Gödel's incompleteness theorems, which state that for any reasonable proof system, there are necessarily statements that can be proven neither true nor false starting from any given set of axioms: those statements are independent from those axioms. Therefore, there are three possible outcomes for any hypothesis: true, false or independent!
Some famous theorems have even been proven to be independent of some famous axioms. One of the most notable is that the Continuum Hypothesis is independent from Zermelo-Fraenkel set theory! Such independence proofs rely on modelling the proof system inside another proof system, and forcing is one of the main techniques used for this.
Figure 1. The landscape of modern Mathematics comic by Abstruse Goose. This comic shows that Mathematics is one of the most diversified areas of useless human knowledge.
Cardinality by Ciro Santilli 40 Updated 2025-07-16
The size of a set.
For finite sizes, the definition is simple, and the intuitive name "size" matches well.
But for infinity, things are messier, e.g. the size of the real numbers is strictly larger than the size of the integers as shown by Cantor's diagonal argument, which is kind of what justifies a fancier word "cardinality" to distinguish it from the more normal word "size".
The key idea is to compare set sizes with bijections.
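Stated a bit more formally, as a sketch in standard notation (the symbols A, B and f are introduced here only for illustration):

```latex
|A| = |B| \iff \exists\, f : A \to B \text{ such that } f \text{ is a bijection}
\qquad
|\mathbb{Z}| < |\mathbb{R}|
```

For example, the map n ↦ 2n is a bijection between the integers and the even integers, so both sets have the same cardinality even though one is a proper subset of the other. This is the kind of behaviour that the intuitive word "size" fails to capture for infinite sets.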
