Scientific visualization by Ciro Santilli 35 Updated +Created
Cerebras by Ciro Santilli 35 Updated +Created
For some reason they attempt to make a single chip on an entire wafer!
jq ignore missing attribute by Ciro Santilli 35 Updated +Created
echo '[{"a": 1, "b": 2}, {"b": 3}]' | jq '.[] | select(.a) | .a'
Output:
1
and no empty lines as desired.
DNS database by Ciro Santilli 35 Updated +Created
A "DNS database" is a database that stores DNS records, notably A-records, which IP a domains is hosted at.
For currently live domains, domain to IP can of course be easily determined on the fly by just resolving the domain like the browser does, e.g.
cirosantilli.com
What is hard however is:
  • the other way around is harder however: given an IP, list all domains that it hosts. This is known as "reverse IP" searching.
  • historic data, i.e. what was the IP for a given domain at a given date and vice versa
As of 2023, working with DNS data is just going through a mish-mash of closed datasets/expensive APIs.
Some links of interest:
Chart by Ciro Santilli 35 Updated +Created
Chart type by Ciro Santilli 35 Updated +Created
Google BigQuery by Ciro Santilli 35 Updated +Created
Histogram by Ciro Santilli 35 Updated +Created
Independently wealthy by Ciro Santilli 35 Updated +Created
LLM game by Ciro Santilli 35 Updated +Created
Wayback Machine CDX scanning with Tor parallelization by Ciro Santilli 35 Updated +Created
Dire times require dire methods: cia-2010-covert-communication-websites/cdx-tor.sh.
First we must start the tor servers with the tor-army command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
tor-army 100
and then use it on a newline separated domain name list to check;
./cdx-tor.sh infile.txt
This creates a directory infile.txt.cdx/ containing:
  • infile.txt.cdx/out00, out01, etc.: the suspected CDX lines from domains from each tor instance based on the simple criteria that the CDX can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.
  • infile.txt.cdx/out: the final combined CDX output of out00, out01, ...
  • infile.txt.cdx/out.post: the final output containing only domain names that match further CLI criteria that cannot be easily encoded on the CDX query. This is the cleanest domain name list you should look into at the end basically.
Since archive is so abysmal in its data access, e.g. a Google BigQuery would solve our issues in seconds, we have to come up with creative ways of getting around their IP throttling.
The CIA doesn't play fair. They're actually the exact opposite of fair. So neither shall we.
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, so a reduction factor of about 1 million with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits to the domains and remove anything with too many hits, since the CIA websites basically have very few archives:
cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count
This gives us something like:
12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1
sorted by increasing hit counts, so we can go down as far as patience allows for!
New results from a full CDX scan of 2013-dns-census-a-novirt.csv:
  • 219.90.61.123 journeystravelled.com
JS CDX scanning by Ciro Santilli 35 Updated +Created
JAR, SWF and CGI-bin scanning by path only is fine, since there are relatively few of those. But .js scanning by path only is too broad.
One option would be to filter out by size, an information that is contained on the CDX. Let's check typical ones:
grep -f <(jq -r '.[]|select(select(.comms)|.comms|test("\\.js"))|.host' ../media/cia-2010-covert-communication-websites/hits.json) out | out.jshits.cdx
sort -n -k7 out.jshits.cdx
Ignoring some obvious unrelated non-comms files visually we get a range of about 2732 to 3632:
net,hollywoodscreen)/current.js 20110106082232 http://hollywoodscreen.net/current.js text/javascript 200 XY5NHVW7UMFS3WSKPXLOQ5DJA34POXMV 2732
com,amishkanews)/amishkanewss.js 20110208032713 http://amishkanews.com/amishkanewss.js text/javascript 200 S5ZWJ53JFSLUSJVXBBA3NBJXNYLNCI4E 3632
This ignores the obviously atypical JavaScript with SHAs from iranfootballsource, and the particularly small old menu.js from cutabovenews.com, which we embed into cia-2010-covert-communication-websites/cdx-post-js.sh.
The size helps a bit, but it's not insanely good unfortunately, only about 3x, these are some common JS sizes right there!
Counterintelligence by Ciro Santilli 35 Updated +Created
Synagogue by Ciro Santilli 35 Updated +Created
Parlour game by Ciro Santilli 35 Updated +Created
TP Link ARCHER VR2800 by Ciro Santilli 35 Updated +Created
Twenty questions by Ciro Santilli 35 Updated +Created
Electrical resistivity and conductivity by Ciro Santilli 35 Updated +Created
State of matter by Ciro Santilli 35 Updated +Created
Alain Aspect by Ciro Santilli 35 Updated +Created
Video 1.
Alain Aspect in Quantum entanglement Documentary (1985)
Source. The moustache and broken English were already his trademarks back then!!! Also cool to get a glimpse of his lab, and good schematics of the experiment. TODO exact lab location? Documentary says in Paris, but where?

Unlisted articles are being shown, click here to show only listed articles.