The old lady was arrested in 2015 for a few days for doing Falun Gong, which was an important motivation to Ciro Santilli's campaign for freedom of speech in China.
The first chapter of the New Testament.
The Final Encyclopedia (Paul Allen) by
Ciro Santilli 35 Updated 2025-04-24 +Created 1970-01-01
The Google Story Chapter 21. A Virtual Library mentions that Paul Allen was interested in trying to create something like the "Final Encyclopedia" from this book. This is somewhat the same motivation for Google Books and Google's activities more broadly, as shown in their organise the world's information mission statement.
CIA 2010 covert communication websites Wayback Machine by
Ciro Santilli 35 Updated 2025-04-24 +Created 1970-01-01
D'oh.
But to be serious. The Wayback Machine contains a very large proportion of all sites. It is the most complete database we have found so far. Some archives are very broken. But those are rares.
The only problem with the Wayback Machine is that there is no known efficient way to query its archives across domains. You have to have a domain in hand for CDX queries: Wayback Machine CDX scanning.
The Common Crawl project attempts in part to address this lack of querriability, but we haven't managed to extract any hits from it.
CDX + 2013 DNS Census + heuristics however has been fruitful however.
Ciro's Edict #5 Skip ID extraction and rendering based on database timestamps by
Ciro Santilli 35 Updated 2025-04-24 +Created 1970-01-01
This means while developing a website locally with the OurBigBook CLI, if you have a bunch of files with an error in one of them, your first run will run slowly until the error:but further runs will blast through the files that worked, skipping all files that have sucessfully converted:so you can fix file by file and move on quickly.
extract_ids README.ciro
extract_ids README.ciro finished in 73.82836899906397 ms
extract_ids art.ciro
extract_ids art.ciro finished in 671.1738419979811 ms
extract_ids ciro-santilli.ciro
extract_ids ciro-santilli.ciro finished in 1009.6256089992821 ms
extract_ids science.ciro
error: science.ciro:13686:1: named argument "parent" given multiple times
extract_ids science.ciro finished in 1649.6193730011582 ms
extract_ids README.ciro
extract_ids README.ciro skipped by timestamp
extract_ids art.ciro
extract_ids art.ciro skipped by timestamp
extract_ids ciro-santilli.ciro
extract_ids ciro-santilli.ciro skipped by timestamp
extract_ids science.ciro
More details at: cirosantilli.com/ourbigbook#no-render-timestamp
This was not fully trivial to implement because we had to rework how duplicate IDs are checked. Previously, we just nuked the DB every time on a directory conversion, and then repopulated everything. If a duplicated showed up on a file, it was a duplicate.
But now that we are not necessarily extracing IDs from every file, we can't just nuke the database anymore, otherwise we'd lose the information. Therefore, what we have to do is to convert every file, and only at the end check the duplicates.
Managed to do that with a single query as documented at: stackoverflow.com/questions/71235548/how-to-find-all-rows-that-have-certain-columns-duplicated-in-sequelize/71235550#71235550
CIA 2010 covert communication websites 2013 DNS census MX records by
Ciro Santilli 35 Updated 2025-04-24 +Created 1970-01-01
Let' see if there's anything in records/mx.xz.
mx.csv is 21GB.
They do have
"
in the files to escape commas so:mx.pyWould have been better with csvkit: stackoverflow.com/questions/36287982/bash-parse-csv-with-quotes-commas-and-newlines
import csv
import sys
writer = csv.writer(sys.stdout)
with open('mx.csv', 'r') as f:
reader = csv.reader(f)
for row in reader:
writer.writerow([row[0], row[3]])
then:
# uniq not amazing as there are often two or three slightly different records repeated on multiple timestamps, but down to 11 GB
python3 mx.py | uniq > mx-uniq.csv
sqlite3 mx.sqlite 'create table t(d text, m text)'
# 13 GB
time sqlite3 mx.sqlite ".import --csv --skip 1 'mx-uniq.csv' t"
# 41 GB
time sqlite3 mx.sqlite 'create index td on t(d)'
time sqlite3 mx.sqlite 'create index tm on t(m)'
time sqlite3 mx.sqlite 'create index tdm on t(d, m)'
# Remove dupes.
# Rows: 150m
time sqlite3 mx.sqlite <<EOF
delete from t
where rowid not in (
select min(rowid)
from t
group by d, m
)
EOF
# 15 GB
time sqlite3 mx.sqlite vacuum
Let's see what the hits use:
awk -F, 'NR>1{ print $2 }' ../media/cia-2010-covert-communication-websites/hits.csv | xargs -I{} sqlite3 mx.sqlite "select distinct * from t where d = '{}'"
At around 267 total hits, only 84 have MX records, and from those that do, almost all of them have exactly:with only three exceptions:We need to count out of the totals!which gives, ~18M, so nope, it is too much by itself...
smtp.secureserver.net
mailstore1.secureserver.net
dailynewsandsports.com|dailynewsandsports.com
inews-today.com|mail.inews-today.com
just-kidding-news.com|just-kidding-news.com
sqlite3 mx.sqlite "select count(*) from t where m = 'mailstore1.secureserver.net'"
Let's try to use that to reduce where
av.sqlite
from 2013 DNS Census virtual host cleanup a bit further:time sqlite3 mx.sqlite '.mode csv' "attach 'aiddcu.sqlite' as 'av'" '.load ./ip' "select ipi2s(av.t.i), av.t.d from av.t inner join t as mx on av.t.d = mx.d and mx.m = 'mailstore1.secureserver.net' order by av.t.i asc" > avm.csv
avm
stands for av
with mx
pruning. This leaves us with only ~500k entries left. With one more figerprint we could do a Wayback Machine CDX scanning scan.Let's check that we still have most our hits in there:At 267 hits we got 81, so all are still present.
grep -f <(awk -F, 'NR>1{print $2}' /home/ciro/bak/git/media/cia-2010-covert-communication-websites/hits.csv) avm.csv
secureserver is a hosting provider, we can see their blank page e.g. at: web.archive.org/web/20110128152204/http://emmano.com/. security.stackexchange.com/questions/12610/why-did-secureserver-net-godaddy-access-my-gmail-account/12616#12616 comments:
secureserver.net is the name GoDaddy use as the reverse DNS for IP addresses used for dedicated/virtual server hosting
Pinned article: ourbigbook/introduction-to-the-ourbigbook-project
Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
Intro to OurBigBook
. Source. We have two killer features:
- topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculusArticles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivativeVideo 2. OurBigBook Web topics demo. Source. - local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
Figure 2. You can publish local OurBigBook lightweight markup files to either OurBigBook.com or as a static website.Figure 3. Visual Studio Code extension installation.Figure 5. . You can also edit articles on the Web editor without installing anything locally. Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension. - Infinitely deep tables of contents:
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact