bbchallenge.org/story#what-is-known-about-bb lists some (all?) cool examples:
- BB(15): Erdős' conjecture on powers of 2, which has some relation to Collatz conjecture
- BB(27): Goldbach's conjecture
- BB(744): Riemann hypothesis
- BB(748): independent from the Zermelo-Fraenkel axioms
- BB(7910): independent from ZFC
wiki.bbchallenge.org/wiki/Cryptids contains a larger list. In June 2024 it was discovered that BB(6) is hard.
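To make the BB(n) notation concrete, here is a minimal sketch of a Turing machine simulator in Python, run on the well-known 2-state, 2-symbol champion. It assumes the step-counting convention for BB. The `run` helper and the transition-table encoding are illustrative choices of ours, not something taken from bbchallenge.org.

```python
def run(machine, max_steps=1_000_000):
    """Run a 2-symbol Turing machine from a blank (all zero) tape.

    machine maps (state, symbol) -> (write, move, next_state),
    with move in {-1, +1} and next_state "H" meaning halt.
    Returns (steps, ones_on_tape), or None if max_steps is exceeded.
    """
    tape = {}      # sparse tape: position -> symbol, default 0
    pos = 0
    state = "A"
    steps = 0
    while state != "H":
        if steps >= max_steps:
            return None  # gave up: the machine may or may not halt later
        write, move, state = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())


# The 2-state champion: halts after 6 steps leaving 4 ones.
BB2_CHAMPION = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}

print(run(BB2_CHAMPION))  # (6, 4)
```

The `max_steps` cutoff is where the difficulty hides: for a machine that encodes an open conjecture, no finite cutoff is known to be enough.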
It can't be an HTML crawl, because presumably there wouldn't have been links to those websites. Presumably this is why Common Crawl doesn't seem to have any hits.
So they must have had some kind of DNS A record database?
Or would an IPv4 sweep have worked, without the Host header, with the CIA's setup?
The same question also applies to the 2013 DNS Census. It has fewer hits, but still has many.
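The Host question matters because, with name-based virtual hosting, a server behind a single IP picks which site to serve from the Host header of the request. Below is a minimal sketch of the two kinds of request a scanner could send, assuming plain HTTP on port 80. `build_request` and the example host name are hypothetical, and nothing here reflects knowledge of the actual setup.

```python
def build_request(path="/", host=None):
    """Build a raw HTTP GET request as bytes.

    With host=None we emit an HTTP/1.0 request with no Host header,
    which is all an IP-only sweep can send when it knows no domain name.
    With a host we emit a regular HTTP/1.1 request.
    """
    if host is None:
        lines = [f"GET {path} HTTP/1.0"]
    else:
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_request())                    # what a blind IPv4 sweep would send
print(build_request(host="example.com"))  # what a DNS-informed scan would send
```

A server hosting several sites on one IP would typically answer the first request with a default site or an error, so a blind sweep would presumably only reveal a site that had a dedicated IP or was the default virtual host.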
Whatever they did, we are so so glad that they did!
Apparently only Mathieu group M12 and Mathieu group M24.
www.maths.qmul.ac.uk/~pjc/pps/pps9.pdf mentions:
The automorphism group of the extended Golay code is the 54-transitive Mathieu group M24. This is one of only two finite 5-transitive groups other than symmetric and alternating groups
Hmm, is that 54, or more likely 5 and 4?
scite.ai/reports/4-homogeneous-groups-EAKY21 quotes link.springer.com/article/10.1007%2FBF01111290 which suggests that it is also another one of the Mathieu groups, math.stackexchange.com/questions/698327/classification-of-triply-transitive-finite-groups#comment7650505_3721840 and en.wikipedia.org/wiki/Mathieu_group_M12 mentions .
math.stackexchange.com/questions/700235/is-there-an-easy-proof-for-the-classification-of-6-transitive-finite-groups says there aren't any non-boring ones.
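As a reminder of what k-transitive means: a permutation group on n points is k-transitive if any ordered k-tuple of distinct points can be mapped to any other such tuple. Here is a brute-force sketch for tiny groups. The helper names are illustrative, and since it enumerates every group element it does not scale to a group as large as M24.

```python
from math import perm


def generate_group(generators):
    """Close a list of permutations (tuples with i -> p[i]) under composition."""
    n = len(generators[0])
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(n))  # apply g first, then s
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def is_k_transitive(group, k):
    """True if the orbit of (0, ..., k-1) contains every k-tuple of distinct points."""
    n = len(next(iter(group)))
    orbit = {tuple(g[i] for i in range(k)) for g in group}
    return len(orbit) == perm(n, k)  # n! / (n-k)! ordered k-tuples exist


S4 = generate_group([(1, 0, 2, 3), (1, 2, 3, 0)])  # a transposition and a 4-cycle
A4 = generate_group([(1, 2, 0, 3), (0, 2, 3, 1)])  # two 3-cycles
print(len(S4), is_k_transitive(S4, 4))
print(len(A4), is_k_transitive(A4, 2), is_k_transitive(A4, 3))
```

The symmetric and alternating groups are the "boring" highly transitive families: S_n is n-transitive and A_n is (n-2)-transitive, which the two small examples illustrate.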
These can be viewed at bitcoinstrings.com/blk00052.txt and are mostly commented on in the "Wikileaks cablegate data" section of Hidden surprises in the Bitcoin blockchain by Ken Shirriff (2014).
Soon after block 229991 uploaded the Satoshi uploader, several interesting files were added to the blockchain using the uploader, and notably some containing content that might be illegal in certain countries, as a test to see if this type of content would make the Bitcoin blockchain illegal or not:
- tx 08654f9dc9d673b3527b48ad06ab1b199ad47b61fd54033af30c2ee975c588bd block 229999 contains a leaked private key and a link to: threatpost.com/en_us/blogs/ami-firmware-source-codAe-private-key-leaked-040513
- tx b96af3b69b48a82c5eae3e44ebb6ef93f30d7764b1d5b40243e11b0d374ac1b7 block 230001 contains a link, followed presumably by an illegal prime starting with:
4 85650 78965 73978 29309 84189 46942 86137 70744 20873 51357 92401 96520 73668
The number is quoted e.g. at: www.computerforum.com/threads/illegal-prime-number.67782/
- tx 237783998a6799264983150187a73ab6d116f2ba78d3e1f88529e95229f59d67 block 233620 contains another illegal prime starting with:
49310 83597 02850 19002 75777 67239 07649 57284
This one is quoted in a few places online in blockchain illegality discussions:
- www.reddit.com/r/Bitcoin/comments/1akyy4/comment/c8yel60 "What happens if someone inserts illegal content into the block chain?" (2013-03-19)
- news.ycombinator.com/item?id=8055243 "Filecoin – Data storage network and crypto-currency based on Bitcoin" (2014-07-18)
- tx 54e48e5f5c656b26c3bca14a8c95aa583d07ebe84dde3b7dd4a78f4e4186e713 block 230009 contains the Bitcoin white paper: bitcoin.org/bitcoin.pdf More context: bitcoin.stackexchange.com/questions/35959/how-is-the-whitepaper-decoded-from-the-blockchain-tx-with-1000x-m-of-n-multisi
- tx 691dd277dc0e90a462a3d652a1171686de49cf19067cd33c7df0392833fb986a block 230203 Cablegate index. The announced filename is cablegate-201012041811.7z. As mentioned in Hidden surprises in the Bitcoin blockchain by Ken Shirriff (2014), it has an ASCII list of several other transactions, which, when downloaded with the Satoshi uploader and concatenated, presumably lead to the full 7z file. Also as mentioned by Ken, it is infinitely easier for the average user to just access the cables directly on WikiLeaks :-) The data is preceded by the message:
sSEXWikileaks Cablegate Backup cablegate-201012041811.7z Download the following transactions with Satoshi Nakamoto's download tool which can be found in transaction 6c53cd987119ef797d5adccd76241247988a0a5ef783572a9972e7371c5fb0cc Free speech and free enterprise! Thank you Satoshi!
- tx dde7cd8e8f073a525c16c5ee4e4a254f847b7ad6babef257231813166fbef551 block 230229 and tx 4a0088a249e9099d205fb4760c28275d4b8965ac9fd56f5ddf6771cdb0d94f38 block 230231 contain indexes of pages from The Hidden Wiki. These can be viewed at: bitcoinstrings.com/blk00052.txt. Not reproduced here because we are cowards.
So basically, this was the first obviously illegal block attempt.
None of this content is particularly eye-popping for Ciro Santilli's slightly crazy freedom of speech standards, and as of 2021, the Bitcoin blockchain likely hasn't become illegal anywhere yet due to freedom of speech concerns.
Furthermore, it is likely much easier to find much worse illegal content by browsing any uncensored Onion service search engine for 2 minutes.
Ciro Santilli estimates that perhaps the uploader didn't upload child pornography, which is basically the apex of illegality of this era, because they were afraid that their identities would one day be found.
Bibliography:
- bitcointalk.org/index.php?topic=191039.0 "WTF - Kiddy Porn in the Blockchain for life?" (2013-04-29) on the Bitcoin Forum
Starting tx a87d406fae047258a12923b3c11a797a5765bd8f868df5c7e9b1cead0e92c9c1: the message:
503: Bitcoin over capacity!
appears about 13 thousand times. WTF happened?
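To check a count like that locally, one could extract printable strings from a raw block file and tally them. This is a rough sketch that assumes behavior similar to the Unix strings utility with a minimum length of 4. The function names are illustrative, and we do not know how bitcoinstrings.com actually produces its listings.

```python
import re
from collections import Counter


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least min_len printable ASCII bytes, roughly like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def count_strings(data: bytes, min_len: int = 4) -> Counter:
    """Count how many times each extracted string occurs."""
    return Counter(ascii_strings(data, min_len))


# Tiny synthetic input standing in for a block file.
sample = b"\x00\x01503: Bitcoin over capacity!\xff\x00ab\x00503: Bitcoin over capacity!\n"
print(count_strings(sample).most_common(1))
```

On a real block file one would pass the result of reading the file in binary mode, e.g. `open(path, "rb").read()`.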
Some interesting usages:
Note that the images must be drawn with white on black. If you use black on white, the accuracy becomes terrible. This is a very good example of brittleness in AI systems!
We can try the code adapted from thenewstack.io/tutorial-using-a-pre-trained-onnx-model-for-inferencing/ at python/onnx_cheat/infer_mnist.py:
cd python/onnx_cheat
./infer_mnist.py lenet.onnx infer_mnist_9.png
and it works pretty well! The program outputs:
9
as desired.
We can also try with images directly from Extract MNIST images:
for f in /home/ciro/git/mnist_png/out/testing/1/*.png; do echo $f; infer.py $f ; done
and the accuracy is great as expected.
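If an input image happens to be black on white, one possible workaround is to invert it before inference. Here is a minimal sketch on raw 8-bit grayscale values, assuming 0 is black and 255 is white. `invert_grayscale` is a hypothetical helper, not part of infer_mnist.py.

```python
def invert_grayscale(pixels):
    """Invert rows of 8-bit grayscale pixels: black (0) <-> white (255)."""
    return [[255 - p for p in row] for row in pixels]


# A 2x3 black-on-white patch becomes white-on-black.
patch = [
    [255, 0, 255],
    [255, 0, 255],
]
print(invert_grayscale(patch))
```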
Bibliography:
- wiki.archlinux.org/title/AMDGPU
- gitlab.freedesktop.org/drm/amd an issue tracker
- github.com/ROCm/ROCK-Kernel-Driver TODO vs the GitLab?
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason. This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever". Notable examples:
- hupo.com: e.g. static.hupo.com/expdomain_myadmin/2012-03-06(国际域名).txt. Heavily IP throttled. Tor hindered more than helped.
  Scraping script: cia-2010-covert-communication-websites/hupo.sh. Scraping does about 1 day every 5 minutes relatively reliably, so about 36 hours / year. Not bad.
  Results are stored under tmp/hupo/<day>.
  Check for hit overlap:
  grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/hupo/*
  The hits are very well distributed amongst days and months, at least they did a good job hiding these potential timing fingerprints. This feels very deliberately designed.
  There are lots of hits. The data set is very inclusive. Also we understand that it must have been obtained through means other than Web crawling, since it contains so many of the hits.
  Nice output format for scraping as the HTML is very minimal.
  They randomly changed their URL format to remove the space before the .com after 2012-02-03.
  Some of their files are simply missing however unfortunately. webmasterhome.cn did contain one of the missing ones however: domain.webmasterhome.cn/com/2012-07-01.asp. Hmm. We might have better luck over there then?
  2018-11-19 is corrupt in a new and wonderful way, with a bunch of trailing zeros:
  wget -O hupo-2018-11-19 'http://static.hupo.com/expdomain_myadmin/2018-11-19%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt'
  hd hupo-2018-11-19
  ends in:
  000ffff0 74 75 64 69 65 73 2e 63 6f 6d 0d 0a 70 31 63 6f |tudies.com..p1co|
  00100000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
  *
  0018a5e0 00 00 00 00 00 00 00 00 00 |.........|
  More generally, several files contain invalid domain names with non-ASCII characters, e.g. 2013-01-02 contains 365<D3>л<FA><C2><CC>.com. Domain names can only contain ASCII characters: stackoverflow.com/questions/1133424/what-are-the-valid-characters-that-can-show-up-in-a-url-host Maybe we should get rid of any such lines as noise.
  Some files around 2011-09-06 start with an empty line. 2014-01-15 starts with about twenty empty lines. Oh and that last one also has some trash bytes at the end: <B7><B5><BB><D8>. Beauty.
- webmasterhome.cn: e.g. domain.webmasterhome.cn/com/2012-03-06.asp. Appears to contain the exact same data as "static.hupo.com".
  Also heavily IP throttled, and a bit more than hupo apparently.
  Also has some randomly missing dates like hupo.com, though different missing ones from hupo, so they complement each other nicely.
  Some of the URLs are broken and don't signal that with an HTTP status code, they just replace the results with some Chinese text 无法找到该页 (The requested page could not be found).
  Several URLs just return length 0 content, e.g.:
  curl -vvv http://domain.webmasterhome.cn/com/2015-10-31.asp
  * Trying 125.90.93.11:80...
  * Connected to domain.webmasterhome.cn (125.90.93.11) port 80 (#0)
  > GET /com/2015-10-31.asp HTTP/1.1
  > Host: domain.webmasterhome.cn
  > User-Agent: curl/7.88.1
  > Accept: */*
  >
  < HTTP/1.1 200 OK
  < Date: Sat, 21 Oct 2023 15:12:23 GMT
  < Server: Microsoft-IIS/6.0
  < X-Powered-By: ASP.NET
  < Content-Length: 0
  < Content-Type: text/html
  < Set-Cookie: ASPSESSIONIDCSTTTBAD=BGGPAONBOFKMMFIPMOGGHLMJ; path=/
  < Cache-control: private
  <
  * Connection #0 to host domain.webmasterhome.cn left intact
  It is not fully clear if this is a throttling mechanism, or if the data is just missing entirely.
  Starting around 2018, the IP limiting became very intense, 30 mins / 1 hour per URL, so we just gave up. Therefore, data from 2018 onwards does not contain webmasterhome.cn data.
  Starting from 2013-05-10 the format changes randomly. This also shows us that they just have all the HTML pages as static files on their server. E.g. with:
  grep -a '<pre' * | s
  we see:
  2013-05-09:<pre style='font-family:Verdana, Arial, Helvetica, sans-serif; '><strong>2013<C4><EA>05<D4><C2>09<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong><br>0-3y.com
  2013-05-10:<pre><strong>2013<C4><EA>05<D4><C2>10<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong>
- justdropped.com: e.g. www.justdropped.com/drops/030612com.html
- yoid.com: e.g.: yoid.com/bydate.php?d=2016-06-03&a=a
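The quirks noted for these lists (empty lines, runs of trailing zero bytes, non-ASCII junk) suggest a cleaning pass before using the data. Here is a minimal sketch, assuming one domain per line with CRLF or LF line endings. The regex is a deliberately strict approximation of hostname syntax and `clean_domains` is an illustrative name, not part of hupo.sh.

```python
import re

# Dot-separated labels made of letters, digits and inner hyphens; at least one dot.
LABEL = rb"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
HOSTNAME_RE = re.compile(LABEL + rb"(?:\." + LABEL + rb")+")


def clean_domains(data: bytes):
    """Return the valid-looking ASCII domain names found in a raw daily file."""
    out = []
    for line in data.split(b"\n"):
        # Drop carriage returns, NUL padding and surrounding whitespace.
        line = line.strip(b"\r\x00 \t").lower()
        if HOSTNAME_RE.fullmatch(line):
            out.append(line.decode("ascii"))
    return out


raw = (
    b"\r\n"                                # leading empty line
    b"0-3y.com\r\n"
    b"365\xd3\xeb\xfa\xc2\xcc.com\r\n"     # non-ASCII noise
    b"studies.com\r\n"
    b"p1co" + b"\x00" * 16 + b"\r\n"       # truncated entry followed by zeros
    b"\xb7\xb5\xbb\xd8"                    # trailing trash bytes
)
print(clean_domains(raw))
```

Being strict here drops a truncated entry such as `p1co` rather than guessing what it was meant to be.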
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
wait
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
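The contents of hupo-merge.sh are not reproduced here, so the following is only a sketch of what a per-day merge could look like. It assumes each source directory holds one file per day, named by date, with one domain per line. `merge_day` and the directory layout are assumptions made for illustration.

```python
from pathlib import Path


def merge_day(day, source_dirs, out_dir):
    """Write the sorted union of all sources' domains for one day.

    Returns the number of distinct domains written.
    """
    domains = set()
    for src in source_dirs:
        path = Path(src) / day
        if not path.exists():
            continue  # either source may be missing a given day
        for raw in path.read_bytes().splitlines():
            raw = raw.strip()
            if raw and raw.isascii():  # skip blank lines and non-ASCII noise
                domains.add(raw.decode("ascii"))
    out_path = Path(out_dir) / day
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("".join(d + "\n" for d in sorted(domains)), encoding="ascii")
    return len(domains)
```

Taking the union is what lets the two sources complement each other: a day that is missing from one of them is still covered as long as the other has it.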
The extracted data is present at:
- archive.org/details/expired-domain-names-by-day
- github.com/cirosantilli/expired-domain-names-by-day-* repos:
- github.com/cirosantilli/expired-domain-names-by-day-2011 (~11M)
- github.com/cirosantilli/expired-domain-names-by-day-2012 (~18M)
- github.com/cirosantilli/expired-domain-names-by-day-2013 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2014 (~29M)
- github.com/cirosantilli/expired-domain-names-by-day-2015 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2016
- github.com/cirosantilli/expired-domain-names-by-day-2017
- github.com/cirosantilli/expired-domain-names-by-day-2018
- github.com/cirosantilli/expired-domain-names-by-day-2019
- github.com/cirosantilli/expired-domain-names-by-day-2020
- github.com/cirosantilli/expired-domain-names-by-day-2021
- github.com/cirosantilli/expired-domain-names-by-day-2022
Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
- GitHub trackers:
- admin-monitor.shiyue.com
- anquan.didichuxing.com
- app.cloudsek.com
- app.flare.io
- app.rainforest.tech
- app.shadowmap.com
- bo.serenety.xmco.fr 8 1
- bts.linecorp.com
- burn2give.vercel.app
- cbs.ctm360.com 17 2
- code6.d1m.cn
- code6-ops.juzifenqi.com
- codefend.devops.cndatacom.com
- dlp-code.airudder.com
- easm.atrust.sangfor.com
- ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
- ecall.beygoo.me 2 1
- eos.vip.vip.com 1 1
- foradar.baimaohui.net 2 1
- fty.beygoo.me
- hive.telefonica.com.br 2 1
- hulrud.tistory.com
- kartos.enthec.com
- soc.futuoa.com
- lullar-com-3.appspot.com
- penetration.houtai.io 2 1
- platform.sec.corp.qihoo.net
- plus.k8s.onemt.co 4 1
- pmp.beygoo.me 2 1
- portal.protectorg.com
- qa-boss.amh-group.com
- saicmotor.saas.cubesec.cn
- scan.huoban.com
- sec.welab-inc.com
- security.ctrip.com 10 3
- siem-gs.int.black-unique.com 2 1
- soc-github.daojia-inc.com
- spigotmc.org 2 1
- tcallzgroup.blueliv.com
- tcthreatcompass05.blueliv.com 4 1
- tix.testsite.woa.com 2 1
- toucan.belcy.com 1 1
- turbo.gwmdevops.com 18 2
- urlscan.watcherlab.com
- zelenka.guru. Looks like a Russian hacker forum.
- LinkedIn profile views:
- "Information Security Specialist at Forcepoint"
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
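For reference, grep -Fx -f matches whole lines against a list of fixed strings. Below is a rough Python equivalent of that check, assuming hits.json is a list of objects with a host key, as the jq filter implies. `find_hits` is an illustrative name.

```python
import json
from pathlib import Path


def find_hits(hits_json, day_files):
    """Yield (file name, host) for every line exactly equal to a known hit."""
    entries = json.loads(Path(hits_json).read_text(encoding="utf-8"))
    hosts = {entry["host"] for entry in entries}
    for path in sorted(Path(p) for p in day_files):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if line in hosts:  # whole-line, fixed-string match like grep -Fx
                yield path.name, line
```

One difference from the grep version: `splitlines` strips a trailing carriage return, so hosts in CRLF files still match here.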
Next, we can start searching by keyword with Wayback Machine CDX scanning with Tor parallelization with our helper cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the term "news":
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
produces per-year results for the regex term news|global between the years under:
tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
OK let's:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
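hupo-cdx-tor.sh itself is not shown here. As a sketch of the idea only, the following filters a list of domains by regex and builds Wayback Machine CDX query URLs for the matches, leaving out the Tor parallelization and the actual fetching. The `url`, `output`, `from`, `to` and `limit` query parameters belong to the public CDX server API. The function names are illustrative.

```python
import re
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def matching_domains(domains, pattern):
    """Keep only the domains in which the regex matches anywhere."""
    rx = re.compile(pattern)
    return [d for d in domains if rx.search(d)]


def cdx_url(domain, year_from, year_to, limit=1):
    """Build a CDX query asking whether any capture exists in the year range."""
    params = {
        "url": domain,
        "output": "json",
        "from": str(year_from),
        "to": str(year_to),
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


expired = ["echessnews.com", "example.com", "globalfoo.com"]
for domain in matching_domains(expired, "news|global"):
    print(cdx_url(domain, 2011, 2019))
```

With `limit=1` each query only answers whether the domain was ever archived in the range, which keeps responses small when scanning many candidates.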
Other searches that are not dense enough for our patience:
world|global|[^.]info
OMG, the news search might be producing some golden, golden new hits!!! Going full into this. It's amazing. Hits, plus a few more:
- thepyramidnews.com
- echessnews.com
- tickettonews.com
- airuafricanews.com
- vuvuzelanews.com
- dayenews.com
- newsupdatesite.com
- arabicnewsonline.com
- arabicnewsunfiltered.com
- newsandsportscentral.com
- networkofnews.com
- trekkingtoday.com
- financial-crisis-news.com