Ciro Santilli @cirosantilli 37

CIA 2010 covert communication websites / 2013 DNS census NS records  Updated 2025-07-01  +Created 1970-01-01

ns.csv is 57 GB, which is too massive to work with comfortably.

We can cut the data down a lot with the uniq-by-column technique from stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540#76605540 and TLD filtering:

awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csv

This brings us down to a much more manageable 3.0 GB, 83 M rows.
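
For experimenting on a small sample, the same deduplicate-by-first-column and TLD filtering step can be sketched in Python. This is only a rough equivalent of the awk + grep pipeline: the function name uniq_by_domain and the sample rows are made up for illustration, and it assumes rows look like "domain,timestamp,ns" with rows for the same domain adjacent, as the awk one-liner also assumes.

```python
import re

# TLDs kept by the grep filter in the shell pipeline.
TLD_RE = re.compile(r'\.(com|net|info|org|biz)$')

def uniq_by_domain(lines):
    """Yield "domain,ns" for the first row of each run of equal domains.

    Assumes rows look like "domain,timestamp,ns" and that rows for the
    same domain are adjacent.
    """
    last = None
    for line in lines:
        fields = line.rstrip('\n').split(',')
        domain, ns = fields[0], fields[2]
        if domain != last:
            last = domain
            if TLD_RE.search(domain):
                yield f'{domain},{ns}'

# Hypothetical sample rows, not taken from ns.csv.
sample = [
    'example.com,2013-01-01T00:00:00,ns1.example.net',
    'example.com,2013-02-01T00:00:00,ns2.example.net',
    'example.de,2013-01-01T00:00:00,ns1.example.de',
    'example.org,2013-01-01T00:00:00,ns1.example.org',
]
result = list(uniq_by_domain(sample))
```

Only the first name server seen for each domain survives, so this trades completeness for size, just like the shell version.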

Let's just scan it once real quick to start with, since likely nothing will come of this avenue:

grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv
cat nsu-hits.csv | csvcut -c 2 | sort | awk -F. '{OFS="."; print $(NF-1), $(NF)}' | sort | uniq -c | sort -k1 -n

As of 267 hits we get:

      1 a2hosting.com
      1 amerinoc.com
      1 ayns.net
      1 dailyrazor.com
      1 domainingdepot.com
      1 easydns.com
      1 frienddns.ru
      1 hostgator.com
      1 kolmic.com
      1 name-services.com
      1 namecity.com
      1 netnames.net
      1 tonsmovies.net
      1 webmailer.de
      2 cashparking.com
     55 worldnic.com
     86 domaincontrol.com

so yeah, most of those are likely going to be humongous just by looking at the names.
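
The counting step groups name servers by their last two dot-separated labels. A minimal Python sketch of that grouping follows; the helper names ns_provider and count_providers are made up here, and the sample rows are three of the hits listed in this article with the timestamp column dropped. Taking the last two labels is a naive heuristic that would mis-group suffixes such as co.uk.

```python
from collections import Counter

def ns_provider(ns_host):
    """Return the last two dot-separated labels of a name server host."""
    return '.'.join(ns_host.split('.')[-2:])

def count_providers(rows):
    """Count rows per provider; each row is "domain,ns"."""
    return Counter(ns_provider(row.rstrip('\n').split(',')[1]) for row in rows)

rows = [
    'inews-today.com,ns1.frienddns.ru',
    'dailynewsandsports.com,ns3.a2hosting.com',
    'just-kidding-news.com,jns3.dailyrazor.com',
]
counts = count_providers(rows)
```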

By far the smallest one out of the total is frienddns.ru, with only 487 hits; all the others are either quite large or fake hits due to CSV. I did a quick Wayback Machine CDX scan there but no luck, alas.

Let's check the smaller ones:

inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com

Doubt anything will come out of this.
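
The fake source-commodities.net hit comes from substring matching. A small Python sketch, assuming the first CSV field is the domain, shows how exact matching on that field would avoid it. The function name exact_hits is hypothetical; the two rows are copied from the list above.

```python
def exact_hits(known_domains, rows):
    """Return rows whose first CSV field is exactly one of known_domains."""
    known = set(known_domains)
    return [row for row in rows if row.split(',', 1)[0] in known]

rows = [
    'source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com',
    'inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru',
]
known = ['e-commodities.net', 'inews-today.com']

# Substring matching, similar to plain grep -f, also catches
# source-commodities.net because it contains e-commodities.net.
substring = [r for r in rows if any(d in r for d in known)]

# Exact matching on the domain column keeps only the real hit.
exact = exact_hits(known, rows)
```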

Let's do a bit of counting out of the total:

grep domaincontrol.com ns.csv | awk -F, '{print $1}' | uniq | wc

gives ~20M domains using domaincontrol. Let's see how many domains there are in the first place:

awk -F, '{print $1}' ns.csv | uniq | wc

so it accounts for 1/4 of the total.
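
The share computation can be sketched in Python as follows. This is an illustration on made-up rows, not a replacement for the shell commands: count_unique_adjacent and provider_share are hypothetical names, and like uniq it only merges adjacent duplicates, so it assumes rows for the same domain are grouped together.

```python
def count_unique_adjacent(values):
    """Count runs of equal adjacent values, like `uniq | wc -l`."""
    count = 0
    last = object()  # sentinel that compares unequal to every value
    for v in values:
        if v != last:
            count += 1
            last = v
    return count

def provider_share(rows, provider):
    """Fraction of domains with at least one row containing provider."""
    total = count_unique_adjacent(r.split(',')[0] for r in rows)
    using = count_unique_adjacent(
        r.split(',')[0] for r in rows if provider in r
    )
    return using / total

# Hypothetical rows in "domain,timestamp,ns" form.
rows = [
    'a.com,2013-01-01T00:00:00,ns1.domaincontrol.com',
    'a.com,2013-01-01T00:00:00,ns2.domaincontrol.com',
    'b.com,2013-01-01T00:00:00,ns1.example.net',
    'c.net,2013-01-01T00:00:00,ns1.example.net',
    'd.org,2013-01-01T00:00:00,ns1.example.org',
]
share = provider_share(rows, 'domaincontrol.com')
```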

CIA 2010 covert communication websites / 2013 DNS Census virtual host cleanup heuristic keyword searches  Updated 2025-07-01  +Created 1970-01-01

There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:

grep -e news -e noticias -e nouvelles -e world -e global
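
A rough Python equivalent of that keyword grep, assuming input lines of the form "IP domain" as in the listings of this article. The names KEYWORDS and keyword_hits are made up for this sketch; matching is case-sensitive and on substrings, like grep -e without -i.

```python
import re

KEYWORDS = ['news', 'noticias', 'nouvelles', 'world', 'global']
# One alternation of all keywords, escaped so they match literally.
KEYWORD_RE = re.compile('|'.join(map(re.escape, KEYWORDS)))

def keyword_hits(lines):
    """Return the lines that contain at least one keyword."""
    return [line for line in lines if KEYWORD_RE.search(line)]

lines = [
    '66.45.179.205 noticiasporjanua.com',
    '210.80.75.55 philippinenewsonline.net',
    '212.209.79.40 hydradraco.com',
]
hits = keyword_hits(lines)
```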

iran + football:

iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.

3 easy hits with "noticias" ("news" in Portuguese or Spanish), uncovering two brand new IP ranges:

66.45.179.205 noticiasporjanua.com
66.237.236.247 comunidaddenoticias.com
204.176.38.143 noticiassofisticadas.com

Let's see some French "nouvelles/actualites" for those tumultuous Maghrebis:

216.97.231.56 nouvelles-d-aujourdhuis.com

news + world:

210.80.75.55 philippinenewsonline.net

news + global:

204.176.39.115 globalprovincesnews.com
212.209.74.105 globalbaseballnews.com
212.209.79.40 hydradraco.com

OK, I've decided to do a complete Wayback Machine CDX scanning of news... Searching for .JAR or https.*cgi-bin.*\.cgi are killers, particularly the .jar hits, here's what came out:

62.22.60.49 telecom-headlines.com
62.22.61.206 worldnewsnetworking.com
64.16.204.55 holein1news.com
66.104.169.184 bcenews.com
69.84.156.90 stickshiftnews.com
74.116.72.236 techtopnews.com
74.254.12.168 non-stop-news.net
193.203.49.212 inews-today.com
199.85.212.118 just-kidding-news.com
207.210.250.132 aeronet-news.com
212.4.18.129 sightseeingnews.com
212.209.90.84 thenewseditor.com
216.105.98.152 modernarabicnews.com

Wayback Machine CDX scanning of "world":

66.104.173.186 myworldlymusic.com

"headline": only 140 matches in 2013-dns-census-a-novirt.csv and 3 hits out of 269 hits. Full inspection without CDX led to no new hits.

"today": only 3.5k matches in 2013-dns-census-a-novirt.csv and 12 hits out of 269 hits, TODO how many of those are on 2013-dns-census-a-novirt? No new hits.

"world", "global", "international", and Spanish/Portuguese/French versions like "mondo", "mundo", "mondi": 15k matches in 2013-dns-census-a-novirt.csv. No new hits.

CIA 2010 covert communication websites / activegaminginfo.com  Updated 2025-07-01  +Created 1970-01-01

whoisxmlapi WHOIS history March 22, 2011:

Registrar Name: NETWORK SOLUTIONS, LLC.
Created Date: January 26, 2010 00:00:00 UTC
Updated Date: November 27, 2010 00:00:00 UTC
Expires Date: January 26, 2012 00:00:00 UTC
Registrant Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
Registrant Street: PO Box 459
Registrant City: PA
Registrant State/Province: US
Registrant Postal Code: 18222
Registrant Country: UNITED STATES
Administrative Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
Administrative Street: PO Box 459
Administrative City: Drums
Administrative State/Province: PA
Administrative Postal Code: 18222
Administrative Country: UNITED STATES
Administrative Email: xc2mv7ur8cw@networksolutionsprivateregistration.com
Administrative Phone: 5707088780
Name servers: NS23.DOMAINCONTROL.COM|NS24.DOMAINCONTROL.COM

CIA 2010 covert communication websites / Are there .org hits?  Updated 2025-07-01  +Created 1970-01-01

Previously it was unclear if there were any .org hits, until we found the first one with clear comms: web.archive.org/web/20110624203548/http://awfaoi.org/hand.jar

Later on, two more clear ones were found with expired domain trackers:

azerinews.org
autism-news.org

further settling their existence. Later on newimages.org also came to light.

Others that had been previously found in IP ranges but without clear comms:

65.61.127.177: material-science.org
212.4.17.61: tech-stop.org
74.116.72.244: arborstribune.org

.org is very rare, and has been excluded from some of our search heuristics. That was a shame, but likely not much was missed.

CIA 2010 covert communication websites / atomworldnews.com  Updated 2025-07-01  +Created 1970-01-01

whoisxmlapi WHOIS record on April 17, 2011

Created Date: April 9, 2010 00:00:00 UTC
Updated Date: April 9, 2010 00:00:00 UTC
Expires Date: April 9, 2012 00:00:00 UTC
Registrant Name: domainsbyproxy.com
Name servers: NS33.DOMAINCONTROL.COM|NS34.DOMAINCONTROL.COM

CIA 2010 covert communication websites / CGI comms  Updated 2025-07-01  +Created 1970-01-01

We've come across a few shallow and stylistically similar websites on suspicious ranges with this pattern.

No JS/JAR/SWF comms, but rather a subdomain, and an HTTPS page with .cgi extension that leads to a login page. Some names seen for this subdomain:

secure.: most common
ssl.: also common
various other more creative ones linked to the website theme itself, e.g.:
- musical-fortune.net has a backstage.musical-fortune.net

The question is, is this part of some legitimate tooling that created such patterns? And if so which? Or are they actual hits with a new comms mechanism not previously seen?

The fact that:

hits of this type are so dense in the suspicious ranges
they are so stylistically similar to one another
Citizen Lab specifically mentioned a "CGI" comms method

suggests to Ciro that they are an actual hit.

In particular, the secure and ssl ones are overused, and together with some heuristics they allowed us to find our first two non-Reuters ranges! See: Section "secure subdomain search on 2013 DNS Census".

Some currently known URLs

If we could do a crawl search for secure.*com/cgi-bin/*.cgi that might be a good enough fingerprint, maybe even *.*com/cgi-bin/*.cgi. Edit: it is not perfect, but we kind of did it: Section "secure subdomain search on 2013 DNS Census".
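
As a minimal sketch of such a fingerprint, here is one way to test a single URL in Python. It is an illustration under assumptions, not the search that was actually run: the function name looks_like_cgi_comms is made up, only the secure and ssl subdomains are considered, the host is assumed to have exactly three labels, and unlike the glob above the TLD is not restricted to .com. The URLs in the usage lines are hypothetical.

```python
import re
from urllib.parse import urlsplit

# The two most common subdomains noted in this article.
SUBDOMAINS = {'secure', 'ssl'}
CGI_PATH_RE = re.compile(r'^/cgi-bin/.*\.cgi$')

def looks_like_cgi_comms(url):
    """Return True for https://<secure|ssl>.<name>.<tld>/cgi-bin/*.cgi."""
    parts = urlsplit(url)
    labels = (parts.hostname or '').split('.')
    return (
        parts.scheme == 'https'
        and len(labels) == 3
        and labels[0] in SUBDOMAINS
        and CGI_PATH_RE.match(parts.path) is not None
    )

# Hypothetical URLs, for illustration only.
yes = looks_like_cgi_comms('https://secure.example.com/cgi-bin/login.cgi')
no_http = looks_like_cgi_comms('http://secure.example.com/cgi-bin/login.cgi')
no_sub = looks_like_cgi_comms('https://www.example.com/cgi-bin/login.cgi')
no_path = looks_like_cgi_comms('https://secure.example.com/index.html')
```

The more creative theme-specific subdomains such as backstage. would need a separate, looser check.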

Git tips / It's not a tree, it's actually a DAG  Updated 2025-07-01  +Created 1970-01-01

Every tree is a directed acyclic graph.

But not every directed acyclic graph is a tree.

Example of a tree (and therefore also a DAG):

5
|
4 7
| |
3 6
|/
2
|
1

Convention in this presentation: arrows implicitly point up, just like in a git log, i.e.:

1 is parent of 2
2 is parent of 3 and 6
3 is parent of 4

and so on.

Example of a DAG that is not a tree:

7
|\
4 6
| |
3 5
|/
2
|
1

This is not a tree because there are two ways to reach 7:

2, 3, 4, 7
2, 5, 6, 7

But we often say "tree" instead of "DAG" in the context of Git because DAG sounds ugly.
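
The two examples above can be checked mechanically. The sketch below encodes each graph as a mapping from a node to its list of parents (a representation chosen here for illustration) and assumes the input is already a connected DAG with a single root. Under that assumption it is a tree exactly when no node has more than one parent.

```python
def is_tree(parents):
    """True if no node has more than one parent.

    Assumes the graph is already a connected DAG with a single root.
    """
    return all(len(ps) <= 1 for ps in parents.values())

def count_paths(parents, start, end):
    """Number of distinct parent-to-child paths from start to end."""
    if start == end:
        return 1
    return sum(count_paths(parents, start, p) for p in parents[end])

# First example: a tree.
tree = {1: [], 2: [1], 3: [2], 4: [3], 5: [4], 6: [2], 7: [6]}
# Second example: a DAG that is not a tree, because 7 has two parents.
dag = {1: [], 2: [1], 3: [2], 4: [3], 5: [2], 6: [5], 7: [4, 6]}

tree_ok = is_tree(tree)
dag_ok = is_tree(dag)
ways_to_7 = count_paths(dag, 2, 7)
```

Here count_paths(dag, 2, 7) finds the two routes 2, 3, 4, 7 and 2, 5, 6, 7.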

Example of a graph that is not a DAG:

6
^
|
3->4
^  |
|  v
2<-5
^
|
1

This one is not acyclic because there is a cycle 2, 3, 4, 5, 2.
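
Cycle detection itself is a short depth-first search. The following sketch uses the standard three-color marking (unvisited, in progress, done) on a mapping from each node to the nodes it points to; the function name has_cycle and the dictionary encoding of the two graphs are choices made for this illustration.

```python
def has_cycle(edges):
    """Return True if the directed graph contains a cycle.

    edges maps every node to the list of nodes it points to.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, done
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY:
                # Reached a node that is still on the current path.
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in edges)

# The graph with the cycle 2, 3, 4, 5, 2.
cyclic = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [2], 6: []}
# The earlier DAG that is not a tree, with arrows pointing up.
acyclic = {1: [2], 2: [3, 5], 3: [4], 4: [7], 5: [6], 6: [7], 7: []}

cyclic_result = has_cycle(cyclic)
acyclic_result = has_cycle(acyclic)
```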

CIA 2010 covert communication websites / CGI comms variant  Updated 2025-07-01  +Created 1970-01-01

Later on, we've also come across some stylistic hits in IP ranges with apparent slight variations of the CGI comms pattern:

Since these are so rare, it is still a bit hard to classify them for sure, but they are no doubt of great interest: once we start to notice such patterns, more tend to turn up if they are a real thing.

CIA 2010 covert communication websites / club.domain.cn  Updated 2025-07-01  +Created 1970-01-01

TODO what does this Chinese forum track? New registrations? Their focus seems to be domain name speculation

Some of the threads contain domain dumps. We haven't yet seen a scrapable URL pattern, but their data goes way back and did have various hits. The forum seems to have started in 2006: club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=10127

club.domain.cn/forum.php?mod=viewthread&tid=241704 "【国际域名拟删除列表】2007年06月16日" is the earliest list we could find. It is an expired domain list.

Some hits:

club.domain.cn/forum.php?mod=viewthread&tid=709388 contains alljohnny.com. The thread title is "2009.5.04". The post date is 2009-04-30.
Breadcrumb nav: 域名论坛 > 域名增值交易区 > 国际域名专栏 (domain name forum > area for domain names increasing in value > international domains)

CIA 2010 covert communication websites / Common Crawl  Updated 2025-07-01  +Created 1970-01-01

So far, no new domains have been found with Common Crawl, nor have any existing known domains been found to be present in Common Crawl. Our working theory is that Common Crawl never reached the domains; see: How did Alexa find the domains?

Let's try and do something with Common Crawl.

Unfortunately there's no IP data apparently: github.com/commoncrawl/cc-index-table/issues/30, so let's focus on the URLs.

Using their Common Crawl Athena method: commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/

Hello world:

select * from "ccindex"."ccindex" limit 100;

Data scanned: 11.75 MB

Sample first output line:

#                            2
url_surtkey                  org,whwheelers)/robots.txt
url                          https://whwheelers.org/robots.txt
url_host_name                whwheelers.org
url_host_tld                 org
url_host_2nd_last_part       whwheelers
url_host_3rd_last_part
url_host_4th_last_part
url_host_5th_last_part
url_host_registry_suffix     org
url_host_registered_domain   whwheelers.org
url_host_private_suffix      org
url_host_private_domain      whwheelers.org
url_host_name_reversed
url_protocol                 https
url_port
url_path                     /robots.txt
url_query
fetch_time                   2021-06-22 16:36:50.000
fetch_status                 301
fetch_redirect               https://www.whwheelers.org/robots.txt
content_digest               3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
content_mime_type            text/html
content_mime_detected        text/html
content_charset
content_languages
content_truncated
warc_filename                crawl-data/CC-MAIN-2021-25/segments/1623488519183.85/robotstxt/CC-MAIN-20210622155328-20210622185328-00312.warc.gz
warc_record_offset           1854030
warc_record_length           639
warc_segment                 1623488519183.85
crawl                        CC-MAIN-2021-25
subset                       robotstxt

So url_host_3rd_last_part might be a winner for CGI comms fingerprinting!

Naive one for one index:

select * from "ccindex"."ccindex" where url_host_registered_domain = 'conquermstoday.com' limit 100;

has no results... data scanned: 5.73 GB

Let's see if they have any of the domain hits. Let's also restrict by date to try and reduce the data scanned:

select * from "ccindex"."ccindex" where
  fetch_time < TIMESTAMP '2014-01-01 00:00:00' AND
  url_host_registered_domain IN (
   'activegaminginfo.com',
   'altworldnews.com',
   ...
   'topbillingsite.com',
   'worldwildlifeadventure.com'
 )

Humm, data scanned: 60.59 GB and no hits... weird.

Sanity check:

select * from "ccindex"."ccindex" WHERE
  crawl = 'CC-MAIN-2013-20' AND
  subset = 'warc' AND
  url_host_registered_domain IN (
   'google.com',
   'amazon.com'
 )

has a bunch of hits of course. Data scanned: 212.88 MB. The WHERE conditions on crawl and subset are a must! Should have read the article first.
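
To avoid forgetting those two conditions, the query text can be generated. The sketch below is a hypothetical helper (build_query is not part of any Common Crawl or AWS tooling) that only builds the SQL string; it uses the table and column names seen in the queries of this article and always includes the crawl and subset filters. The claim that these filters cut the data scanned rests on the numbers reported above.

```python
def build_query(domains, crawls, subset='warc'):
    """Build an Athena SQL string that always filters on crawl and subset."""
    def quote(s):
        # Minimal SQL string literal quoting: double any single quote.
        return "'" + s.replace("'", "''") + "'"

    crawl_list = ', '.join(quote(c) for c in crawls)
    domain_list = ', '.join(quote(d) for d in domains)
    return (
        'select * from "ccindex"."ccindex" WHERE\n'
        f'  crawl IN ({crawl_list}) AND\n'
        f'  subset = {quote(subset)} AND\n'
        f'  url_host_registered_domain IN ({domain_list})'
    )

query = build_query(['google.com', 'amazon.com'], ['CC-MAIN-2013-20'])
```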

Let's widen a bit more:

select * from "ccindex"."ccindex" WHERE
  crawl IN (
    'CC-MAIN-2013-20',
    'CC-MAIN-2013-48',
    'CC-MAIN-2014-10'
  ) AND
  subset = 'warc' AND
  url_host_registered_domain IN (
    'activegaminginfo.com',
    'altworldnews.com',
    ...
    'worldnewsandent.com',
    'worldwildlifeadventure.com'
 )

Still nothing found... they don't seem to have any of the URLs of interest?

Git tips / Linear history vs branching  Updated 2025-07-01  +Created 1970-01-01

There are two ways to organize a project:

linear history
branched history: history with merge commits

Some people like merges, but they are ugly and stupid. Rebase instead and keep linear history.

Linear history:

5 master
|
4
|
3
|
2
|
1 first commit

Branched history:

7   master
|\
| \
6  \
|\  \
| |  |
3 4  5
| |  |
| /  /
|/  /
2  /
| /
1/  first commit

Here commits 6 and 7 are the so-called "merge commits":

they have multiple parents:
- 6 has parents 3 and 4
- 7 has parents 5 and 6
they are useless and don't contain any real information
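
The "multiple parents" criterion is easy to state in code. As a small sketch, each history above is encoded as a mapping from a commit to its list of parents (an encoding chosen for this illustration, not how Git stores commits), and merge commits are those with more than one parent.

```python
def merge_commits(parents):
    """Return the commits that have more than one parent, sorted."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)

# Linear history: every commit has at most one parent.
linear = {1: [], 2: [1], 3: [2], 4: [3], 5: [4]}
# Branched history: 6 has parents 3 and 4, 7 has parents 5 and 6.
branched = {1: [], 2: [1], 3: [2], 4: [2], 5: [1], 6: [3, 4], 7: [5, 6]}

linear_merges = merge_commits(linear)
branched_merges = merge_commits(branched)
```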

Which type of tree do you think will be easier to understand and maintain?

You may disconnect now if you still like branched history.

Git tips / Oh, but there are 2 trees: local and remote  Updated 2025-07-01  +Created 1970-01-01

Oh but there are usually 2 trees: local and remote.

So you also have to learn how to observe and modify and sync with the remote tree!

But basically:

git fetch

to update the remote tree. You can then use its branches exactly like any other branch, except that you prefix them with the remote name (usually origin/*), e.g.:

origin/master is the latest fetch of the remote version of master
origin/my-feature is the latest fetch of the remote version of my-feature

Globalization reduces the power of governments  Updated 2025-07-01  +Created 1970-01-01

While Ciro Santilli is a big fan of having "one global country" (and language), which is somewhat approximated by globalization, he has come to believe that there is one serious downside to globalization as it stands in 2020: it allows companies to pressure governments to reduce taxes, and thus reduces the power of government, which in turn increases social inequality. This idea is very well highlighted in Can't get you out of my head by Adam Curtis (2021).

The only solution seems to be for governments to get together and make deals to ensure fair taxation among themselves. Which might never happen.

POSIX command line utility  Updated 2025-07-01  +Created 1970-01-01

Listed at: pubs.opengroup.org/onlinepubs/9699919799/utilities/contents.html

wc (unix)  Updated 2025-07-01  +Created 1970-01-01

GNU Core Utils  Updated 2025-07-01  +Created 1970-01-01

pubs.opengroup.org/onlinepubs/9699919799/utilities/contents.html

Imperative programming  Updated 2025-07-01  +Created 1970-01-01

Personal knowledge base software  Updated 2025-07-01  +Created 1970-01-01

Augustin-Jean Fresnel  Updated 2025-07-01  +Created 1970-01-01

Software-based artificial life  Updated 2025-07-01  +Created 1970-01-01

Some of the software-based artificial life simulators can be used as AI training games.

Ciro Santilli just always feels that what can be classified as "artificial life" simulators have too much focus on beating more continuous population mechanics, and lack the discrete elements which he feels could be important to AGI: Section "The missing link between continuous and discrete AI".

There is, however, quite clearly great interest in this direction of research.
