CIA 2010 covert communication websites 2013 DNS census NS records Updated 2025-07-01 +Created 1970-01-01
We can also cut down the data a lot with stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540#76605540 and tld filtering:This brings us down to a much more manageable 3.0 GB, 83 M rows.
awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csv
Let's just scan it once real quick to start with, since likely nothing will come of this venue:As of 267 hits we get:so yeah, most of those are likely going to be humongous just by looking at the names.
grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv
cat nsu-hits.csv | csvcut -c 2 | sort | awk -F. '{OFS="."; print $(NF-1), $(NF)}' | sort | uniq -c | sort -k1 -n
1 a2hosting.com
1 amerinoc.com
1 ayns.net
1 dailyrazor.com
1 domainingdepot.com
1 easydns.com
1 frienddns.ru
1 hostgator.com
1 kolmic.com
1 name-services.com
1 namecity.com
1 netnames.net
1 tonsmovies.net
1 webmailer.de
2 cashparking.com
55 worldnic.com
86 domaincontrol.com
The smallest ones by far from the total are: frienddns.ru with only 487 hits, all others quite large or fake hits due to CSV. Did a quick Wayback Machine CDX scanning there but no luck alas.
Let's check the smaller ones:Doubt anything will come out of this.
inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com
CIA 2010 covert communication websites 2013 DNS Census virtual host cleanup heuristic keyword searches Updated 2025-07-01 +Created 1970-01-01
There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e global
iran + football:
- iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" (news in Portuguese or Spanish"), uncovering two brand new ip ranges:
- 66.45.179.205 noticiasporjanua.com
- 66.237.236.247 comunidaddenoticias.com
- 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:
- 216.97.231.56 nouvelles-d-aujourdhuis.com
news + global:
- 204.176.39.115 globalprovincesnews.com
- 212.209.74.105 globalbaseballnews.com
- 212.209.79.40: hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scanning of
news
... Searching for .JAR
or https.*cgi-bin.*\.cgi
are killers, particularly the .jar hits, here's what came out:- 62.22.60.49 telecom-headlines.com
- 62.22.61.206 worldnewsnetworking.com
- 64.16.204.55 holein1news.com
- 66.104.169.184 bcenews.com
- 69.84.156.90 stickshiftnews.com
- 74.116.72.236 techtopnews.com
- 74.254.12.168 non-stop-news.net
- 193.203.49.212 inews-today.com
- 199.85.212.118 just-kidding-news.com
- 207.210.250.132 aeronet-news.com
- 212.4.18.129 sightseeingnews.com
- 212.209.90.84 thenewseditor.com
- 216.105.98.152 modernarabicnews.com
"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.
CIA 2010 covert communication websites activegameinfo.com Updated 2025-07-01 +Created 1970-01-01
whoisxmlapi WHOIS history March 22, 2011:
- Registrar Name: NETWORK SOLUTIONS, LLC.
- Created Date: January 26, 2010 00:00:00 UTC
- Updated Date: November 27, 2010 00:00:00 UTC
- Expires Date: January 26, 2012 00:00:00 UTC
- Registrant Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
- Registrant Street: PO Box 459
- Registrant City: PA
- Registrant State/Province: US
- Registrant Postal Code: 18222
- Registrant Country: UNITED STATES
- Administrative Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
- Administrative Street: PO Box 459
- Administrative City: Drums
- Administrative State/Province: PA
- Administrative Postal Code: 18222
- Administrative Country: UNITED STATES
- Administrative Email: xc2mv7ur8cw@networksolutionsprivateregistration.com
- Administrative Phone: 5707088780
- Name servers: NS23.DOMAINCONTROL.COM|NS24.DOMAINCONTROL.COM
CIA 2010 covert communication websites Are there .org hits? Updated 2025-07-01 +Created 1970-01-01
Previously it was unclear if there were any .org hits, until we found the first one with clear comms: web.archive.org/web/20110624203548/http://awfaoi.org/hand.jar
Later on, two more clear ones were found with expired domain trackers:further settling their existence. Later on newimages.org also came to light.
Others that had been previously found in IP ranges but without clear comms:
.org is very rare, and has been excluded from some of our search heuristics. That was a shame, but likely not much was missed.
CIA 2010 covert communication websites atomworldnews.com Updated 2025-07-01 +Created 1970-01-01
whoisxmlapi WHOIS record on April 17, 2011
We've come across a few shallow and stylistically similar websites on suspicious ranges with this pattern.
No JS/JAR/SWF comms, but rather a subdomain, and an HTTPS page with .cgi extension that leads to a login page. Some names seen for this subdomain:
The question is, is this part of some legitimate tooling that created such patterns? And if so which? Or are they actual hits with a new comms mechanism not previously seen?
The fact that:suggests to Ciro that they are an actual hit.
- hits of this type are so dense in the suspicious ranges
- they are so stylistically similar between on another
- citizenlabs specifically mentioned a "CGI" comms method
In particular, the
secure
and ssl
ones are overused, and together with some heuristics allowed us to find our first two non Reuters ranges! Section "secure subdomain search on 2013 DNS Census"Some currently known URLsIf we could do a crawl search for
- backstage.musical-fortune.net/cgi-bin/backstage.cgi
- clients.smart-travel-consultant.com/cgi-bin/clients.cgi
- members.it-proonline.com/cgi-bin/members.cgi
- members.metanewsdaily.com/cgi-bin/ABC.cgi
- miembros.todosperuahora.com/cgi-bin/business.cgi
- secure.altworldnews.com/cgi-bin/desk.cgi
- secure.driversinternationalgolf.com/cgi-bin/drivers.cgi
- secure.freshtechonline.com/cgi-bin/tech.cgi
- secure.globalnewsbulletin.com/cgi-bin/index.cgi
- secure.negativeaperture.com/cgi-bin/canon.cgi
- secure.riskandrewardnews.com/cgi-bin/worldwide.cgi
- secure.theworld-news.net/cgi-bin/news.cgi
- secure.topbillingsite.com/cgi-bin/main.cgi
- secure.worldnewsandent.com/cgi-bin/news.cgi
- ssl.beyondnetworknews.com/cgi-bin/local.cgi
- ssl.newtechfrontier.com/cgi-bin/tech.cgi
- www.businessexchangetoday.com/cgi-bin/business.cgi
- heal.conquermstoday.com (path unknown)
secure.*com/cgi-bin/*.cgi
that might be a good enough fingerprint, maybe even *.*com/cgi-bin/*.cgi
. Edit: it is not perfect, but we kind of did it: Section "secure subdomain search on 2013 DNS Census".But not every directed acyclic graph is a tree.
Example of a tree (and therefore also a DAG):Convention in this presentation: arrows implicitly point up, just like in a
5
|
4 7
| |
3 6
|/
2
|
1
git log
, i.e.:and so on. CIA 2010 covert communication websites CGI comms variant Updated 2025-07-01 +Created 1970-01-01
Later on, we've also come across some stylistic hits in IP ranges with apparent slight variations of the CGI comms pattern:
Since these are so rare, it is still a bit hard to classify them for sure, but they are of great interest no doubt, as as we start to notice these patterns more tend to come if it is a thing.
TODO what does this Chinese forum track? New registrations? Their focus seems to be domain name speculation
Some of the threads contain domain dumps. We haven't yet seen a scrapable URL pattern, but their data goes way back and did have various hits. The forum seems to have started in 2006: club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=10127
club.domain.cn/forum.php?mod=viewthread&tid=241704 "【国际域名拟删除列表】2007年06月16日" is the earliest list we could find. It is an expired domain list.
Some hits:
- club.domain.cn/forum.php?mod=viewthread&tid=709388 contains
alljohnny.com
The thread title is "2009.5.04". The post date 2009-04-30Breadcrumb nav: 域名论坛 > 域名增值交易区 > 国际域名专栏 (domain name forum > area for domain names increasing in value > international domais)
So far, no new domains have been found with Common Crawl, nor have any existing known domains been found to be present in Common Crawl. Our working theory is that Common Crawl never reached the domains How did Alexa find the domains?
Let's try and do something with Common Crawl.
Unfortunately there's no IP data apparently: github.com/commoncrawl/cc-index-table/issues/30, so let's focus on the URLs.
Using their Common Crawl Athena method: commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
Sample first output line:So
# 2
url_surtkey org,whwheelers)/robots.txt
url https://whwheelers.org/robots.txt
url_host_name whwheelers.org
url_host_tld org
url_host_2nd_last_part whwheelers
url_host_3rd_last_part
url_host_4th_last_part
url_host_5th_last_part
url_host_registry_suffix org
url_host_registered_domain whwheelers.org
url_host_private_suffix org
url_host_private_domain whwheelers.org
url_host_name_reversed
url_protocol https
url_port
url_path /robots.txt
url_query
fetch_time 2021-06-22 16:36:50.000
fetch_status 301
fetch_redirect https://www.whwheelers.org/robots.txt
content_digest 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
content_mime_type text/html
content_mime_detected text/html
content_charset
content_languages
content_truncated
warc_filename crawl-data/CC-MAIN-2021-25/segments/1623488519183.85/robotstxt/CC-MAIN-20210622155328-20210622185328-00312.warc.gz
warc_record_offset 1854030
warc_record_length 639
warc_segment 1623488519183.85
crawl CC-MAIN-2021-25
subset robotstxt
url_host_3rd_last_part
might be a winner for CGI comms fingerprinting!Naive one for one index:have no results... data scanned: 5.73 GB
select * from "ccindex"."ccindex" where url_host_registered_domain = 'conquermstoday.com' limit 100;
Let's see if they have any of the domain hits. Let's also restrict by date to try and reduce the data scanned:Humm, data scanned: 60.59 GB and no hits... weird.
select * from "ccindex"."ccindex" where
fetch_time < TIMESTAMP '2014-01-01 00:00:00' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'topbillingsite.com',
'worldwildlifeadventure.com'
)
Sanity check:has a bunch of hits of course. Data scanned: 212.88 MB,
select * from "ccindex"."ccindex" WHERE
crawl = 'CC-MAIN-2013-20' AND
subset = 'warc' AND
url_host_registered_domain IN (
'google.com',
'amazon.com'
)
WHERE
crawl
and subset
are a must! Should have read the article first.Let's widen a bit more:Still nothing found... they don't seem to have any of the URLs of interest?
select * from "ccindex"."ccindex" WHERE
crawl IN (
'CC-MAIN-2013-20',
'CC-MAIN-2013-48',
'CC-MAIN-2014-10'
) AND
subset = 'warc' AND
url_host_registered_domain IN (
'activegaminginfo.com',
'altworldnews.com',
...
'worldnewsandent.com',
'worldwildlifeadventure.com'
)
Some people like merges, but they are ugly and stupid. Rebase instead and keep linear history.
Linear history:
5 master
|
4
|
3
|
2
|
1 first commit
Branched history:
7 master
|\
| \
6 \
|\ \
| | |
3 4 5
| | |
| / /
|/ /
2 /
| /
1/ first commit
Which type of tree do you think will be easier to understand and maintain?
????
????????????
You may disconnect now if you still like branched history.
Oh but there are usually 2 trees: local and remote.
So you also have to learn how to observe and modify and sync with the remote tree!
But basically:to update the remote tree. And then you can use it exactly like any other branch, except you prefix them with the remote (usually
git fetch
origin/*
), e.g.:origin/master
is the latest fetch of the remote version ofmaster
origin/my-feature
is the latest fetch of the remote version ofmy-feature
While Ciro Santilli is a big fan of having "one global country" (and language), which is somewhat approximated by globalization, he has come to believe that there is one serious downside to globalization as it stands in 2020: it allows companies to pressure governments to reduce taxes, and thus reduces the power of government, which in turn increases social inequality. This idea is very well highlighted in Can't get you out of my head by Adam Curtis (2021).
The only solution seems to be for governments to get together, and make deals to have fair taxation across each other. Which might never happen.
Ciro Santilli just always feels that what can be classified as "artificial life" simulators have too much focus on beating more continuous population mechanics, and lack the discrete elements which he feels could be important to AGI: Section "The missing link between continuous and discrete AI".
There is great interest in this direction of research however quite clearly.
There are unlisted articles, also show them or only show them.