CIA 2010 covert communication websites Wayback Machine CDX scanning by
Ciro Santilli 40 Updated 2025-07-16
The Wayback Machine has an endpoint to query cralwed pages called the CDX server. It is documented at: github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md.
This allows to filter down 10 thousands of possible domains in a few hours. But 100s of thousands would be too much. This is because you have to query exactly one URL at a time, and they possibly rate limit IPs. But no IP blacklisting so far after several hours, so it's not that bad.
Once you have a heuristic to narrow down some domains, you can use this helper: ../cia-2010-covert-communication-websites/cdx.sh to drill them down from 10s of thousands down to hundreds or thousands.
We then post process the results of cdx.sh with ../cia-2010-covert-communication-websites/cdx-post.sh to drill them down from from thousands to dozens, and manually inspect everything.
From then on, you can just manually inspect for hist on your browser.
Main article: DNS Census 2013.
This data source was very valuable, and led to many hits, and to finding the first non Reuters ranges with Section "secure subdomain search on 2013 DNS Census".
They appear to piece together data from various sources. This is the most complete historical domain -> IP database we have so far. They don't have hugely more data than viewdns.info, but many times do offer something new. It feels like the key difference is that their data goes further back in the critical time period a bit.
TODO do they have historical reverse IP? The fact that they don't seem to have it suggests that they are just making historical reverse IP requests to a third party via some API?
E.g. searching
thefilmcentre.com under historical data at securitytrails.com/domain/thefilmcentre.com/history/al gives the correct IP 62.22.60.55.Account creation blacklists common email providers such as gmail to force users to use a "corporate" email address. But using random domains like
ciro@cirosantilli.com works fine.Their data seems to date back to 2008 for our searches.
CIA 2010 covert communication websites Expired domain trackers by
Ciro Santilli 40 Updated 2025-07-16
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason, notably e.g.:This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever".
- hupo.com: e.g. static.hupo.com/expdomain_myadmin/2012-03-06(国际域名).txt. Heavily IP throttled. Tor hindered more than helped.Scraping script: ../cia-2010-covert-communication-websites/hupo.sh. Scraping does about 1 day every 5 minutes relatively reliably, so about 36 hours / year. Not bad.Results are stored under
tmp/humo/<day>.Check for hit overlap:The hits are very well distributed amongst days and months, at least they did a good job hiding these potential timing fingerprints. This feels very deliberately designed.grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/hupo/*There are lots of hits. The data set is very inclusive. Also we understand that it must have been obtains through means other than Web crawling, since it contains so many of the hits.Some of their files are simply missing however unfortunately, e.g. neither of the following exist:webmasterhome.cn did contain that one however: domain.webmasterhome.cn/com/2012-07-01.asp. Hmm. we might have better luck over there then?2018-11-19 is corrupt in a new and wonderful way, with a bunch of trailing zeros:ends in:wget -O hupo-2018-11-19 'http://static.hupo.com/expdomain_myadmin/2018-11-19%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt hd hupo-2018-11-19000ffff0 74 75 64 69 65 73 2e 63 6f 6d 0d 0a 70 31 63 6f |tudies.com..p1co| 00100000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0018a5e0 00 00 00 00 00 00 00 00 00 |.........|More generally, several files contain invalid domain names with non-ASCII characters, e.g. 2013-01-02 contains365<D3>л<FA><C2><CC>.com. Domain names can only contain ASCII charters: stackoverflow.com/questions/1133424/what-are-the-valid-characters-that-can-show-up-in-a-url-host Maybe we should get rid of any such lines as noise.Some files around 2011-09-06 start with an empty line. 2014-01-15 starts with about twenty empty lines. Oh and that last one also has some trash bytes the end<B7><B5><BB><D8>. Beauty. - webmasterhome.cn: e.g. domain.webmasterhome.cn/com/2012-03-06.asp. Appears to contain the exact same data as "static.hupo.com"Also has some randomly missing dates like hupo.com, though different missing ones from hupo, so they complement each other nicely.Some of the URLs are broken and don't inform that with HTTP status code, they just replace the results with some Chinese text 无法找到该页 (The requested page could not be found):Several URLs just return length 0 content, e.g.:It is not fully clear if this is a throttling mechanism, or if the data is just missing entirely.
curl -vvv http://domain.webmasterhome.cn/com/2015-10-31.asp * Trying 125.90.93.11:80... * Connected to domain.webmasterhome.cn (125.90.93.11) port 80 (#0) > GET /com/2015-10-31.asp HTTP/1.1 > Host: domain.webmasterhome.cn > User-Agent: curl/7.88.1 > Accept: */* > < HTTP/1.1 200 OK < Date: Sat, 21 Oct 2023 15:12:23 GMT < Server: Microsoft-IIS/6.0 < X-Powered-By: ASP.NET < Content-Length: 0 < Content-Type: text/html < Set-Cookie: ASPSESSIONIDCSTTTBAD=BGGPAONBOFKMMFIPMOGGHLMJ; path=/ < Cache-control: private < * Connection #0 to host domain.webmasterhome.cn left intactStarting around 2018, the IP limiting became very intense, 30 mins / 1 hour per URL, so we just gave up. Therefore, data from 2018 onwards does not contain webmasterhome.cn data.Starting from2013-05-10the format changes randomly. This also shows us that they just have all the HTML pages as static files on their server. E.g. with:we see:grep -a '<pre' * | s2013-05-09:<pre style='font-family:Verdana, Arial, Helvetica, sans-serif; '><strong>2013<C4><EA>05<D4><C2>09<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong><br>0-3y.com 2013-05-10:<pre><strong>2013<C4><EA>05<D4><C2>10<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong> - justdropped.com: e.g. www.justdropped.com/drops/010112com.html. First known working day:
2006-01-01. Unthrottled. - yoid.com: e.g.: yoid.com/bydate.php?d=2016-06-03&a=a. First known workding day:
2016-06-01.
Data comparison:
- 2012-01-01Looking only at the
.com:The lists are quite similar however.- webmastercn has just about ten extra ones than justdropped, the rest is exactly the same
- justdropped has some extra and some missing from hupo
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
./justdropped.sh &
wait
./justdropped-post.sh
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.shCount unique domains in the repos:
( echo */*/*/* | xargs cat ) | sort -u | wcThe extracted data is present at:Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
- archive.org/details/expired-domain-names-by-day
- github.com/cirosantilli/expired-domain-names-by-day-* repos:
- github.com/cirosantilli/expired-domain-names-by-day-2006
- github.com/cirosantilli/expired-domain-names-by-day-2007
- github.com/cirosantilli/expired-domain-names-by-day-2008
- github.com/cirosantilli/expired-domain-names-by-day-2009
- github.com/cirosantilli/expired-domain-names-by-day-2010
- github.com/cirosantilli/expired-domain-names-by-day-2011 (~11M)
- github.com/cirosantilli/expired-domain-names-by-day-2012 (~18M)
- github.com/cirosantilli/expired-domain-names-by-day-2013 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2014 (~29M)
- github.com/cirosantilli/expired-domain-names-by-day-2015 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2016
- github.com/cirosantilli/expired-domain-names-by-day-2017
- github.com/cirosantilli/expired-domain-names-by-day-2018
- github.com/cirosantilli/expired-domain-names-by-day-2019
- github.com/cirosantilli/expired-domain-names-by-day-2020
- github.com/cirosantilli/expired-domain-names-by-day-2021
- github.com/cirosantilli/expired-domain-names-by-day-2022
- github.com/cirosantilli/expired-domain-names-by-day-2023
- github.com/cirosantilli/expired-domain-names-by-day-2024
- GitHub trackers:
- admin-monitor.shiyue.com
- anquan.didichuxing.com
- app.cloudsek.com
- app.flare.io
- app.rainforest.tech
- app.shadowmap.com
- bo.serenety.xmco.fr 8 1
- bts.linecorp.com
- burn2give.vercel.app
- cbs.ctm360.com 17 2
- code6.d1m.cn
- code6-ops.juzifenqi.com
- codefend.devops.cndatacom.com
- dlp-code.airudder.com
- easm.atrust.sangfor.com
- ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
- ecall.beygoo.me 2 1
- eos.vip.vip.com 1 1
- foradar.baimaohui.net 2 1
- fty.beygoo.me
- hive.telefonica.com.br 2 1
- hulrud.tistory.com
- kartos.enthec.com
- soc.futuoa.com
- lullar-com-3.appspot.com
- penetration.houtai.io 2 1
- platform.sec.corp.qihoo.net
- plus.k8s.onemt.co 4 1
- pmp.beygoo.me 2 1
- portal.protectorg.com
- qa-boss.amh-group.com
- saicmotor.saas.cubesec.cn
- scan.huoban.com
- sec.welab-inc.com
- security.ctrip.com 10 3
- siem-gs.int.black-unique.com 2 1
- soc-github.daojia-inc.com
- spigotmc.org 2 1
- tcallzgroup.blueliv.com
- tcthreatcompass05.blueliv.com 4 1
- tix.testsite.woa.com 2 1
- toucan.belcy.com 1 1
- turbo.gwmdevops.com 18 2
- urlscan.watcherlab.com
- zelenka.guru. Looks like a Russian hacker forum.
- LinkedIn profile views:
- "Information Security Specialist at Forcepoint"
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*Next, we can start searching by keyword with Wayback Machine CDX scanning with Tor parallelization with out helper ../cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the term "news":produces per-year results for the regex term OK lets:
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019news|global between the years under:tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'Other searches that are not dense enough for our patience:
world|global|[^.]infoOMG and a few more. It's amazing.
news search might be producing some golden, golden new hits!!! Going full into this. Hits:- thepyramidnews.com
- echessnews.com
- tickettonews.com
- airuafricanews.com
- vuvuzelanews.com
- dayenews.com
- newsupdatesite.com
- arabicnewsonline.com
- arabicnewsunfiltered.com
- newsandsportscentral.com
- networkofnews.com
- trekkingtoday.com
- financial-crisis-news.com
We've come across a few shallow and stylistically similar websites on suspicious ranges with this pattern.
No JS/JAR/SWF comms, but rather a subdomain, and an HTTPS page with .cgi extension that leads to a login page. Some names seen for this subdomain:
The question is, is this part of some legitimate tooling that created such patterns? And if so which? Or are they actual hits with a new comms mechanism not previously seen?
The fact that:suggests to Ciro that they are an actual hit.
- hits of this type are so dense in the suspicious ranges
- they are so stylistically similar between on another
- citizenlabs specifically mentioned a "CGI" comms method
In particular, the
secure and ssl ones are overused, and together with some heuristics allowed us to find our first two non Reuters ranges! Section "secure subdomain search on 2013 DNS Census"Some currently known URLsIf we could do a crawl search for
- backstage.musical-fortune.net/cgi-bin/backstage.cgi
- clients.smart-travel-consultant.com/cgi-bin/clients.cgi
- members.it-proonline.com/cgi-bin/members.cgi
- members.metanewsdaily.com/cgi-bin/ABC.cgi
- miembros.todosperuahora.com/cgi-bin/business.cgi
- secure.altworldnews.com/cgi-bin/desk.cgi
- secure.driversinternationalgolf.com/cgi-bin/drivers.cgi
- secure.freshtechonline.com/cgi-bin/tech.cgi
- secure.globalnewsbulletin.com/cgi-bin/index.cgi
- secure.negativeaperture.com/cgi-bin/canon.cgi
- secure.riskandrewardnews.com/cgi-bin/worldwide.cgi
- secure.theworld-news.net/cgi-bin/news.cgi
- secure.topbillingsite.com/cgi-bin/main.cgi
- secure.worldnewsandent.com/cgi-bin/news.cgi
- ssl.beyondnetworknews.com/cgi-bin/local.cgi
- ssl.newtechfrontier.com/cgi-bin/tech.cgi
- www.businessexchangetoday.com/cgi-bin/business.cgi
- heal.conquermstoday.com (path unknown)
secure.*com/cgi-bin/*.cgi that might be a good enough fingerprint, maybe even *.*com/cgi-bin/*.cgi. Edit: it is not perfect, but we kind of did it: Section "secure subdomain search on 2013 DNS Census".Later on, we've also come across some stylistic hits in IP ranges with apparent slight variations of the CGI comms pattern:
Since these are so rare, it is still a bit hard to classify them for sure, but they are of great interest no doubt, as as we start to notice these patterns more tend to come if it is a thing.
There are two types of JavaScript found so far. The ones with SHA and the ones without. There are only 2 examples of JS with SHA:Both files start with precisely the same string:
- iraniangoals.com: web.archive.org/web/20110202091909/http://iraniangoals.com/journal.js Commented at: iraniangoals.com JavaScript reverse engineering
- iranfootballsource.com: web.archive.org/web/20110202091901/http://iranfootballsource.com/futbol.js
- kukrinews.com: web.archive.org/web/20100513094909/http://kukrinews.com/news.js
- todaysnewsandweather-ru.com: web.archive.org/web/20110207094735/http://todaysnewsandweather-ru.com/blacksea.js
var ms="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",lc="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",mn="\u0628\u0631\u062F\u0627\u0632\u0634 \u062F\u0631 \u062C\u0631\u064A\u0627\u0646 \u0627\u0633\u062A...\u0644\u0637\u0641\u0627 \u0635\u0628\u0631 \u0643\u0646\u064A\u062F",lt="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",ne="\u067E\u0627\u0633\u062E",kf="\u062E\u0631\u0648\u062C",mb="\u062D\u0630\u0641",mv="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",nt="\u0627\u0631\u0633\u0627\u0644",ig="\u062B\u0628\u062A \u063A\u0644\u0637. \u062C\u0647\u062A \u062A\u062C\u062F\u064A\u062F \u062B\u0628\u062A \u0635\u0641\u062D\u0647 \u0631\u0627 \u0628\u0627\u0632\u0622\u0648\u0631\u06CC \u06A9\u0646\u064A\u062F",hs="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ji="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ie="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",gc="\u0633\u0648\u0627\u0631 \u06A9\u0631\u062F\u0646 \u062A\u06A9\u0645\u064A\u0644 \u0634\u062F",gz="\u0645\u0637\u0645\u0626\u0646\u064A\u062F \u06A9\u0647 \u0645\u064A\u062E\u0648\u0627\u0647\u064A\u062F \u067E\u064A\u0627\u0645 \u0631\u0627 \u062D\u0630\u0641 \u06A9\u0646\u064A\u062F\u061F"Good fingerprint present in all of them:
throw new Error("B64 D.1");};if(at[1]==-1){throw new Error("B64 D.2");};if(at[2]==-1){if(f<ay.length){throw new Error("B64 D.3");};dg=2;}else if(at[3]==-1){if(f<ay.length){throw new Error("B64 D.4")Edit: Carson was found Oleg Shakirov's findingsby Oleg Shakirov:
alljohnny.com, communicated at: twitter.com/shakirov2036/status/1746729471778988499, earliest archive from 2004 (!): web.archive.org/web/20040113025122/http://alljohnny.com/, The domain was hidden in plain sight, it was present in a not very visible watermark visible in the Reuters article screenshot! The watermark was added to the CIA to the background image, it is actually present on the website. In retrospect, it was actually present at on the expired domain trackers dataset, but the mega discrete all second word made Ciro Santilli miss it: github.com/cirosantilli/expired-domain-names-by-day-2015/blob/9d504f3b85364a64f7db93311e70011344cff788/07/05/02#L15722004 Wayback Machine archive of alljohnny.com
. What follows is the previous
The fact that the Reuters article has a screenshot of it, and therefore a Wayback Machine link, plus the specificity of the website topic, will likely keep Ciro awake at night for a while until someone finds that domain.
Some text visible on the Reuters screenshot:It is unclear however if this text is plaintext or part of a an image.
Johnny Carson and The Tonight Show
Your Favorite Host and Comedic Genius
Submit Your Favorite Carson Moment
Heeere's Johnny!
Holy crap, the "Here's Johnny" line from The Shining (1980) is a reference to Johnny Carson: www.youtube.com/watch?v=WDpipB4yehk, www.youtube.com/watch?v=aYnyPAkgyvc, Ciro never knew that... but every American would have understood it at the time.
Some failed attempts, either dry guesses or from DNS grepping dataset searches:
- johnnycarson.com: official
- johnnycarson.net: fan site: web.archive.org/web/20010501225614/http://johnnycarson.net/
- johnnycarsontonight.com
- carson-johnny.com: legit
- johnnycarsonshow.com: web.archive.org/web/20110208005558/http://johnnycarsonshow.com/captcha/index.php?d=johnnycarsonshow.com your IP has been blocked
- tributetojohnnycarson.com: only one archive web.archive.org/web/20180805132430/http://tributetojohnnycarson.com/
- bestofjohnnycarson.com: web.archive.org/web/20130525035938/http://bestofjohnnycarson.com/ Lived past 2013.
- bestofjohnny.com/: web.archive.org/web/20130506011824/http://bestofjohnny.com/ empty
- johnnycarsonvideo.com: dead early 2000s web.archive.org/web/20130605152818/http://johnnycarsonvideo.com/
- johnnycarsontv.com: web.archive.org/web/20230000000000*/johnnycarsontv.com
- thejohnnycarsonshow.com: web.archive.org/web/20230000000000*/thejohnnycarsonshow.com
- carsonsbest.com: web.archive.org/web/20230000000000*/carsonsbest.com
- johnnycarsonfans.com: web.archive.org/web/20230000000000*/johnnycarsonfans.com
- web.archive.org/web/20230000000000*/carsonified.com
- night:
- amazing:
- johnnyamazing.com: broken archives: web.archive.org/web/*/http://johnnyamazing.com/*
- carson
- johnneycarson.com: no archives
- johnnycarson.co: no archives
- johnnycarsons.info
- johnnycarsons.com
- johnnycarson.org
- johnnycarsonsdesk.com
- johnny-carson-video.com
- johnnycarsondvd.org
- johnnycarsondvds.org
- johnnycarsondvd.net
- johnnycarsondvd.tv
- johnnycarsondvds.net
- johnnycarsondvds.tv
- johnnycarson.tv
- johnnyguitarcarson.com
- johnnycarsonmovie.com
- hookedonjohnnycarson.com
- johnnycarsonbook.com
- licensingjohnnycarson.com
- johnnnycarson.com
- johnnycarson360.com
- koalajohnnycarson.com
- johnny-carson.com
- johnnycarsonbirthplace.com
- johnnycarsonbirthplace.net
- johnny:
- heres:
- heresjohnnyfilm.com: web.archive.org/web/20131011115733/http://www.heresjohnnyfilm.com/ legit
- hereisjohnny.net: no archives
- heresjohnnyradioshow.com: web.archive.org/web/20130509042107/http://heresjohnnyradioshow.com/, Legit most likely: web.archive.org/web/20140517103512/http://heresjohnnyradioshow.com/
- wherejohnnylives.net: broken archives
- heresjohnny.com: squat web.archive.org/web/20130607145841/http://heresjohnny.com/ Many other TlD like .net, .co.uk
- heeeeresjohnny.com: web.archive.org/web/20130612211448/http://heeeeresjohnny.com/: legit
- night:
- johnnylatenight.com: web.archive.org/web/20150801132622/http://johnnylatenight.com/ Legit broken
- web.archive.org/web/20110208161513/http://www.johnnysnight.com/
- heres:
- johnnycarson.org: squatted past 2013, nothing before
- carsonshow.com: squat: web.archive.org/web/20110224211714/http://carsonshow.com/
- tonightshow247.net: web.archive.org/web/20101226190209/http://tonightshow247.net/: squat
- tonightshow.tv: web.archive.org/web/20141221222442/http://www.tonightshow.tv/: legit
Searching the Wayback Machine proved fruitless. There is no full text search: Wayback Machine full text search, and a heuristic web.archive.org/web/20230000000000*/Johnny%20Carson search has relevant hits but not the one we want.
Another attempt was to search for "carson" on webmasterhome.cn which lists expired domains in bulk by expiration day, and it search engine friendly. It contains most of the domains we've found so far. Google either doesn't support partial word search or requires you to be a God to find it
so we settle for DuckDuckGo which supports it: duckduckgo.com/?q=site%3Awebmasterhome.cn+%22carson%22&t=h_&ia=web Adding years also helps: duckduckgo.com/?q=site%3Awebmasterhome.cn+%22carson%22+2011&ia=web with this we might be getting all possible results. Ciro went through all in 2011, 2012 and 2013 but no luck. Also fuck en.wikipedia.org/wiki/Carson_City,_Nevada and en.wikipedia.org/wiki/Carson,_California :-)
Let's search tools.whoisxmlapi.com/reverse-whois-search for "carson" contained in any historic domain name. 10,001 lines. Grepping those, no good Wayback machine hits for those that also contain "johnny" or "show". Data at: raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/tools.whoisxmlapi.com_reverse-whois-search_carson.csv in case anyone want to try and dig...
One of the most beautiful things in mathematics are theorems of conjectures that are very simple to state and understand (e.g. for K-12, lower undergrad levels), but extremely hard to prove.
This is in contrast to conjectures in certain areas where you'd have to study for a few months just to precisely understand all the definitions and the interest of the problem statement.
Bibliography:
- mathoverflow.net/questions/75698/examples-of-seemingly-elementary-problems-that-are-hard-to-solve
- www.reddit.com/r/mathematics/comments/klev7b/whats_your_favorite_easy_to_state_and_understand/
- mathoverflow.net/questions/42512/awfully-sophisticated-proof-for-simple-facts this one is for proofs for which simpler proofs exist
- math.stackexchange.com/questions/415365/it-looks-straightforward-but-actually-it-isnt this one is for "there is some reason it looks easy", whatever that means
Examples:
- classification of finite simple groups
- classification of regular polytopes
- classification of closed surfaces, and more generalized generalized Poincaré conjectures
- classification of associative real division algebras
- classification of finite fields
- classification of simple Lie groups
- classification of the wallpaper groups and the space groups
Here is a more understandable description of the semi-satire that follows: math.stackexchange.com/questions/53969/what-does-formal-mean/3297537#3297537
You start with a very small list of:
- certain arbitrarily chosen initial strings, which mathematicians call "axioms"
- rules of how to obtain new strings from old strings, called "rules of inference" Every transformation rule is very simple, and can be verified by a computer.
Using those rules, you choose a target string that you want to reach, and then try to reach it. Before the target string is reached, mathematicians call it a "conjecture".
Since every step of the proof is very simple and can be verified by a computer automatically, the entire proof can also be automatically verified by a computer very easily.
Finding proofs however is undoubtedly an uncomputable problem.
Most mathematicians can't code or deal with the real world in general however, so they haven't created the obviously necessary: website front-end for a mathematical formal proof system.
The fact that Mathematics happens to be the best way to describe physics and that humans can use physical intuition heuristics to reach the NP-hard proofs of mathematics is one of the great miracles of the universe.
Once we have mathematics formally modelled, one of the coolest results is Gödel's incompleteness theorems, which states that for any reasonable proof system, there are necessarily theorems that cannot be proven neither true nor false starting from any given set of axioms: those theorems are independent from those axioms. Therefore, there are three possible outcomes for any hypothesis: true, false or independent!
Some famous theorems have even been proven to be independent of some famous axioms. One of the most notable is that the Continuum Hypothesis is independent from Zermelo-Fraenkel set theory! Such independence proofs rely on modelling the proof system inside another proof system, and forcing is one of the main techniques used for this.
The landscape of modern Mathematics comic by Abstruse Goose
. Source. This comic shows that Mathematics is one of the most diversified areas of useless human knowledge.It seems to implement Zermelo-Fraenkel set theory.
Described at: arxiv.org/pdf/2107.12475.pdf where a relation to the Busy beaver scale is proven, and the intuitive relation to the Collatz conjecture described. Perhaps more directly: demonstrations.wolfram.com/CollatzSequenceComputedByATuringMachine/
But for infinity, things are messier, e.g. the size of the real numbers is strictly larger than the size of the integers as shown by Cantor's diagonal argument, which is kind of what justifies a fancier word "cardinality" to distinguish it from the more normal word "size".
The key idea is to compare set sizes with bijections.
Pinned article: Introduction to the OurBigBook Project
Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
Intro to OurBigBook
. Source. We have two killer features:
- topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculusArticles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivativeVideo 2. OurBigBook Web topics demo. Source. - local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
Figure 3. Visual Studio Code extension installation.Figure 4. Visual Studio Code extension tree navigation.Figure 5. Web editor. You can also edit articles on the Web editor without installing anything locally.Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.Video 4. OurBigBook Visual Studio Code extension editing and navigation demo. Source. - Infinitely deep tables of contents:
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact







