JAR, SWF and CGI-bin scanning by path only is fine, since there are relatively few of those. But .js scanning by path only is too broad.
One option would be to filter out by size, an information that is contained on the CDX. Let's check typical ones:
grep -f <(jq -r '.[]|select(select(.comms)|.comms|test("\\.js"))|.host' ../media/cia-2010-covert-communication-websites/hits.json) out | out.jshits.cdx
sort -n -k7 out.jshits.cdx
Ignoring some obvious unrelated non-comms files visually we get a range of about 2732 to 3632:
net,hollywoodscreen)/current.js 20110106082232 http://hollywoodscreen.net/current.js text/javascript 200 XY5NHVW7UMFS3WSKPXLOQ5DJA34POXMV 2732
com,amishkanews)/amishkanewss.js 20110208032713 http://amishkanews.com/amishkanewss.js text/javascript 200 S5ZWJ53JFSLUSJVXBBA3NBJXNYLNCI4E 3632
This ignores the obviously atypical JavaScript with SHAs from iranfootballsource, and the particularly small old menu.js from cutabovenews.com, which we embed into ../cia-2010-covert-communication-websites/cdx-post-js.sh.
The size helps a bit, but it's not insanely good unfortunately, only about 3x, these are some common JS sizes right there!
There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e global
iran + football:
  • iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" (news in Portuguese or Spanish"), uncovering two brand new ip ranges:
  • 66.45.179.205 noticiasporjanua.com
  • 66.237.236.247 comunidaddenoticias.com
  • 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:
  • 216.97.231.56 nouvelles-d-aujourdhuis.com
news + world:
  • 210.80.75.55 philippinenewsonline.net
news + global:
  • 204.176.39.115 globalprovincesnews.com
  • 212.209.74.105 globalbaseballnews.com
  • 212.209.79.40: hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scanning of news... Searching for .JAR or https.*cgi-bin.*\.cgi are killers, particularly the .jar hits, here's what came out:
  • 62.22.60.49 telecom-headlines.com
  • 62.22.61.206 worldnewsnetworking.com
  • 64.16.204.55 holein1news.com
  • 66.104.169.184 bcenews.com
  • 69.84.156.90 stickshiftnews.com
  • 74.116.72.236 techtopnews.com
  • 74.254.12.168 non-stop-news.net
  • 193.203.49.212 inews-today.com
  • 199.85.212.118 just-kidding-news.com
  • 207.210.250.132 aeronet-news.com
  • 212.4.18.129 sightseeingnews.com
  • 212.209.90.84 thenewseditor.com
  • 216.105.98.152 modernarabicnews.com
Wayback Machine CDX scanning of "world":
  • 66.104.173.186 myworldlymusic.com
"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.
"today": only 3.5k matches in 2013-dns-census-a-novirt.csv and 12 hits out of 269 hits, TODO how many on those on 2013-dns-census-a-novirt? No new hits.
"world", "global", "international", and spanish/portuguese/French versions like "mondo", "mundo", "mondi": 15k matches in 2013-dns-census-a-novirt.csv. No new hits.
Let' see if there's anything in records/mx.xz.
mx.csv is 21GB.
They do have " in the files to escape commas so:
mx.py
import csv
import sys
writer = csv.writer(sys.stdout)
with open('mx.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        writer.writerow([row[0], row[3]])
Would have been better with csvkit: stackoverflow.com/questions/36287982/bash-parse-csv-with-quotes-commas-and-newlines
then:
# uniq not amazing as there are often two or three slightly different records repeated on multiple timestamps, but down to 11 GB
python3 mx.py | uniq > mx-uniq.csv
sqlite3 mx.sqlite 'create table t(d text, m text)'
# 13 GB
time sqlite3 mx.sqlite ".import --csv --skip 1 'mx-uniq.csv' t"

# 41 GB
time sqlite3 mx.sqlite 'create index td on t(d)'
time sqlite3 mx.sqlite 'create index tm on t(m)'
time sqlite3 mx.sqlite 'create index tdm on t(d, m)'

# Remove dupes.
# Rows: 150m
time sqlite3 mx.sqlite <<EOF
delete from t
where rowid not in (
  select min(rowid)
  from t
  group by d, m
)
EOF

# 15 GB
time sqlite3 mx.sqlite vacuum
Let's see what the hits use:
awk -F, 'NR>1{ print $2 }' ../media/cia-2010-covert-communication-websites/hits.csv | xargs -I{} sqlite3 mx.sqlite "select distinct * from t where d = '{}'"
At around 267 total hits, only 84 have MX records, and from those that do, almost all of them have exactly:
smtp.secureserver.net
mailstore1.secureserver.net
with only three exceptions:
dailynewsandsports.com|dailynewsandsports.com
inews-today.com|mail.inews-today.com
just-kidding-news.com|just-kidding-news.com
We need to count out of the totals!
sqlite3 mx.sqlite "select count(*) from t where m = 'mailstore1.secureserver.net'"
which gives, ~18M, so nope, it is too much by itself...
Let's try to use that to reduce av.sqlite from 2013 DNS Census virtual host cleanup a bit further:
time sqlite3 mx.sqlite '.mode csv' "attach 'aiddcu.sqlite' as 'av'" '.load ./ip' "select ipi2s(av.t.i), av.t.d from av.t inner join t as mx on av.t.d = mx.d and mx.m = 'mailstore1.secureserver.net' order by av.t.i asc" > avm.csv
where avm stands for av with mx pruning. This leaves us with only ~500k entries left. With one more figerprint we could do a Wayback Machine CDX scanning scan.
Let's check that we still have most our hits in there:
grep -f <(awk -F, 'NR>1{print $2}' /home/ciro/bak/git/media/cia-2010-covert-communication-websites/hits.csv) avm.csv
At 267 hits we got 81, so all are still present.
secureserver is a hosting provider, we can see their blank page e.g. at: web.archive.org/web/20110128152204/http://emmano.com/. security.stackexchange.com/questions/12610/why-did-secureserver-net-godaddy-access-my-gmail-account/12616#12616 comments:
secureserver.net is the name GoDaddy use as the reverse DNS for IP addresses used for dedicated/virtual server hosting
Fish by Ciro Santilli 40 Updated 2025-07-16
This paraphyletic subgroup is easy to form the "acquatic only" (fishes) vs "things that come out of water" (tetrapods). Though mudfishes make that distinction harder.
Which kind of makes sense, why would you want for limbs unless you are going to stay out of water!
History of science by Ciro Santilli 40 Updated 2025-07-16
If there is one thing that makes Ciro Santilli learn German, this is it (the Romance language are all the same, so reading them is basically covered for Ciro already).
Human loss of fur by Ciro Santilli 40 Updated 2025-07-16
Video 1.
How Humans Lost Their Fur by PBS Eons (2020)
Source. Says it is linked to bipedalism to help hunting in hot weather. But could only happen fully after the invention of fire, otherwise you'd be too cold at night.
Section type: sh_type == SHT_SYMTAB.
Common name: "symbol table".
First the we note that:
  • sh_link = 5
  • sh_info = 6
For SHT_SYMTAB sections, those numbers mean that:
  • strings that give symbol names are in section 5, .strtab
  • the relocation data is in section 6, .rela.text
A good high level tool to disassemble that section is:
nm hello_world.o
which gives:
0000000000000000 T _start
0000000000000000 d hello_world
000000000000000d a hello_world_len
This is however a high level view that omits some types of symbols and in which the symbol types . A more detailed disassembly can be obtained with:
readelf -s hello_world.o
which gives:
Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello_world.asm
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
     5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start
The binary format of the table is documented at www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html
The data is:
readelf -x .symtab hello_world.o
which gives:
Hex dump of section '.symtab':
  0x00000000 00000000 00000000 00000000 00000000 ................
  0x00000010 00000000 00000000 01000000 0400f1ff ................
  0x00000020 00000000 00000000 00000000 00000000 ................
  0x00000030 00000000 03000100 00000000 00000000 ................
  0x00000040 00000000 00000000 00000000 03000200 ................
  0x00000050 00000000 00000000 00000000 00000000 ................
  0x00000060 11000000 00000100 00000000 00000000 ................
  0x00000070 00000000 00000000 1d000000 0000f1ff ................
  0x00000080 0d000000 00000000 00000000 00000000 ................
  0x00000090 2d000000 10000200 00000000 00000000 -...............
  0x000000a0 00000000 00000000                   ........
The entries are of type:
typedef struct {
    Elf64_Word  st_name;
    unsigned char   st_info;
    unsigned char   st_other;
    Elf64_Half  st_shndx;
    Elf64_Addr  st_value;
    Elf64_Xword st_size;
} Elf64_Sym;
Like in the section table, the first entry is magical and set to a fixed meaningless values.

Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
We have two killer features:
  1. topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
    Articles of different users are sorted by upvote within each article page. This feature is a bit like:
    • a Wikipedia where each user can have their own version of each article
    • a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
    This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
    Figure 1.
    Screenshot of the "Derivative" topic page
    . View it live at: ourbigbook.com/go/topic/derivative
  2. local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
    This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
    Figure 2.
    You can publish local OurBigBook lightweight markup files to either https://OurBigBook.com or as a static website
    .
    Figure 3.
    Visual Studio Code extension installation
    .
    Figure 4.
    Visual Studio Code extension tree navigation
    .
    Figure 5.
    Web editor
    . You can also edit articles on the Web editor without installing anything locally.
    Video 3.
    Edit locally and publish demo
    . Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
    Video 4.
    OurBigBook Visual Studio Code extension editing and navigation demo
    . Source.
  3. https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png
  4. Infinitely deep tables of contents:
    Figure 6.
    Dynamic article tree with infinitely deep table of contents
    .
    Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact