There are two keywords that are killers: "news" and "world" and their translations or closely related words. Everything else is hard. So a good start is:
grep -e news -e noticias -e nouvelles -e world -e global
iran + football:
  • iranfootballsource.com: the third hit for this area after the two given by Reuters! Epic.
3 easy hits with "noticias" ("news" in Spanish and Portuguese), uncovering two brand new IP ranges:
  • 66.45.179.205 noticiasporjanua.com
  • 66.237.236.247 comunidaddenoticias.com
  • 204.176.38.143 noticiassofisticadas.com
Let's see some French "nouvelles/actualités" for those tumultuous Maghrebis:
  • 216.97.231.56 nouvelles-d-aujourdhuis.com
news + world:
  • 210.80.75.55 philippinenewsonline.net
news + global:
  • 204.176.39.115 globalprovincesnews.com
  • 212.209.74.105 globalbaseballnews.com
  • 212.209.79.40 hydradraco.com
OK, I've decided to do a complete Wayback Machine CDX scan of "news"... Searching for .jar or https.*cgi-bin.*\.cgi is a killer, particularly the .jar hits (see the CDX sketch after this list). Here's what came out:
  • 62.22.60.49 telecom-headlines.com
  • 62.22.61.206 worldnewsnetworking.com
  • 64.16.204.55 holein1news.com
  • 66.104.169.184 bcenews.com
  • 69.84.156.90 stickshiftnews.com
  • 74.116.72.236 techtopnews.com
  • 74.254.12.168 non-stop-news.net
  • 193.203.49.212 inews-today.com
  • 199.85.212.118 just-kidding-news.com
  • 207.210.250.132 aeronet-news.com
  • 212.4.18.129 sightseeingnews.com
  • 212.209.90.84 thenewseditor.com
  • 216.105.98.152 modernarabicnews.com
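Such scans can be reproduced with the Wayback Machine CDX API. A minimal sketch, using one of the hits above as the example domain and an illustrative filter (URL-escape the regex as needed):

# List captures under a domain (matchType=domain also includes subdomains)
# and keep only URLs ending in .jar, deduplicated by URL key:
curl 'https://web.archive.org/cdx/search/cdx?url=telecom-headlines.com&matchType=domain&filter=original:.*\.jar&collapse=urlkey'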
Wayback Machine CDX scanning of "world":
  • 66.104.173.186 myworldlymusic.com
"headline": only 140 matches in 2013-dns-census-a-novirt.csv, and it appears in 3 of our 269 hits. Full inspection without CDX led to no new hits.
"today": only 3.5k matches in 2013-dns-census-a-novirt.csv, and it appears in 12 of our 269 hits. TODO: how many of those are in 2013-dns-census-a-novirt? No new hits.
"world", "global", "international", and Spanish/Portuguese/French versions like "mondo", "mundo", "mondi": 15k matches in 2013-dns-census-a-novirt.csv. No new hits.
whoisxmlapi WHOIS history March 22, 2011:
  • Registrar Name: NETWORK SOLUTIONS, LLC.
  • Created Date: January 26, 2010 00:00:00 UTC
  • Updated Date: November 27, 2010 00:00:00 UTC
  • Expires Date: January 26, 2012 00:00:00 UTC
  • Registrant Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
  • Registrant Street: PO Box 459
  • Registrant City: PA
  • Registrant State/Province: US
  • Registrant Postal Code: 18222
  • Registrant Country: UNITED STATES
  • Administrative Name: Corral, Elizabeth|ATTN ACTIVEGAMINGINFO.COM|care of Network Solutions
  • Administrative Street: PO Box 459
  • Administrative City: Drums
  • Administrative State/Province: PA
  • Administrative Postal Code: 18222
  • Administrative Country: UNITED STATES
  • Administrative Email: xc2mv7ur8cw@networksolutionsprivateregistration.com
  • Administrative Phone: 5707088780
  • Name servers: NS23.DOMAINCONTROL.COM|NS24.DOMAINCONTROL.COM
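Records like this one can also be pulled programmatically from the WhoisXMLAPI WHOIS History API. A sketch, hedged since the exact endpoint parameters may have changed (YOUR_KEY is a placeholder):

curl 'https://whois-history.whoisxmlapi.com/api/v1?apiKey=YOUR_KEY&domainName=activegaminginfo.com&mode=purchase&outputFormat=JSON'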
Previously it was unclear if there were any .org hits, until we found the first one with clear comms: web.archive.org/web/20110624203548/http://awfaoi.org/hand.jar
Later on, two more clear ones were found via expired domain trackers, further settling their existence. Later still, newimages.org also came to light.
Others that had been previously found in IP ranges but without clear comms:
  • 65.61.127.177 material-science.org
  • 212.4.17.61 tech-stop.org
  • 74.116.72.244 arborstribune.org
.org is very rare, and has been excluded from some of our search heuristics. That was a shame, but likely not much was missed.
whoisxmlapi WHOIS record on April 17, 2011
  • Created Date: April 9, 2010 00:00:00 UTC
  • Updated Date: April 9, 2010 00:00:00 UTC
  • Expires Date: April 9, 2012 00:00:00 UTC
  • Registrant Name: domainsbyproxy.com
  • Name servers: NS33.DOMAINCONTROL.COM|NS34.DOMAINCONTROL.COM
We've come across a few shallow and stylistically similar websites on suspicious ranges with this pattern.
No JS/JAR/SWF comms, but rather a subdomain, and an HTTPS page with .cgi extension that leads to a login page. Some names seen for this subdomain:
  • secure.: most common
  • ssl.: also common
  • various other more creative ones linked to the website theme itself, e.g.:
    • musical-fortune.net has a backstage.musical-fortune.net
The question is: is this part of some legitimate tooling that created such patterns? And if so, which? Or are they actual hits with a new comms mechanism not previously seen?
The fact that:
  • hits of this type are so dense in the suspicious ranges
  • they are so stylistically similar to one another
  • Citizen Lab specifically mentioned a "CGI" comms method
suggests to Ciro that they are actual hits.
In particular, the secure and ssl ones are overused, and together with some heuristics they allowed us to find our first two non-Reuters ranges! See Section "secure subdomain search on 2013 DNS Census".
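The core of that search is easy to sketch, assuming the census A records are stored one per line as domain,ip (the exact file layout may differ):

# Candidate domains whose hostnames start with the overused subdomains:
grep -E '^(secure|ssl)\.' 2013-dns-census-a-novirt.csv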
But not every directed acyclic graph is a tree.
Example of a tree (and therefore also a DAG):
5
|
4 7
| |
3 6
|/
2
|
1
Convention in this presentation: arrows implicitly point up, just like in a git log, i.e.:
  • 1 is parent of 2
  • 2 is parent of 3 and 6
  • 3 is parent of 4
and so on.
Example of a DAG that is not a tree:
7
|\
4 6
| |
3 5
|/
2
|
1
This is not a tree because there are two ways to reach 7 from 1: one through 3 and 4, the other through 5 and 6.
But we often say "tree" instead of "DAG" in the context of Git because DAG sounds ugly.
Example of a graph that is not a DAG:
6
^
|
3->4
^  |
|  v
2<-5
^
|
1
This one is not acyclic because there is a cycle 2, 3, 4, 5, 2.
TODO: what does this Chinese forum track? New registrations? Their focus seems to be domain name speculation.
Some of the threads contain domain dumps. We haven't yet seen a scrapable URL pattern, but their data goes way back and did have various hits. The forum seems to have started in 2006: club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=10127
club.domain.cn/forum.php?mod=viewthread&tid=241704 "【国际域名拟删除列表】2007年06月16日" ("International domain names pending deletion list, June 16, 2007") is the earliest list we could find. It is an expired domain list.
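Should one want to mirror the listing pages despite the lack of a scrapable dump URL pattern, a brute-force loop over the page parameter of the listing URL above is a starting point (a sketch; the throttling is a guess):

# Fetch thread listing pages one by one (page count taken from the URL above):
for i in $(seq 1 10127); do
  curl -s "https://club.domain.cn/forum.php?mod=forumdisplay&fid=41&page=$i" > "page-$i.html"
  sleep 1    # be gentle to the forum
done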
Some hits:
So far, no new domains have been found with Common Crawl, nor have any of the existing known domains been found to be present in Common Crawl. Our working theory is that Common Crawl never reached the domains. How did Alexa find the domains then?
Let's try and do something with Common Crawl.
Unfortunately there's apparently no IP data (github.com/commoncrawl/cc-index-table/issues/30), so let's focus on the URLs.
Hello world:
select * from "ccindex"."ccindex" limit 100;
Data scanned: 11.75 MB
Sample first output line:
#                            2
url_surtkey                  org,whwheelers)/robots.txt
url                          https://whwheelers.org/robots.txt
url_host_name                whwheelers.org
url_host_tld                 org
url_host_2nd_last_part       whwheelers
url_host_3rd_last_part
url_host_4th_last_part
url_host_5th_last_part
url_host_registry_suffix     org
url_host_registered_domain   whwheelers.org
url_host_private_suffix      org
url_host_private_domain      whwheelers.org
url_host_name_reversed
url_protocol                 https
url_port
url_path                     /robots.txt
url_query
fetch_time                   2021-06-22 16:36:50.000
fetch_status                 301
fetch_redirect               https://www.whwheelers.org/robots.txt
content_digest               3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
content_mime_type            text/html
content_mime_detected        text/html
content_charset
content_languages
content_truncated
warc_filename                crawl-data/CC-MAIN-2021-25/segments/1623488519183.85/robotstxt/CC-MAIN-20210622155328-20210622185328-00312.warc.gz
warc_record_offset           1854030
warc_record_length           639
warc_segment                 1623488519183.85
crawl                        CC-MAIN-2021-25
subset                       robotstxt
So url_host_3rd_last_part might be a winner for CGI comms fingerprinting!
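For example, an untested sketch of such a fingerprint query, restricted to one crawl to keep the scan small (the subdomain list is just the pattern described above):

select url_host_registered_domain, count(*) as n
from "ccindex"."ccindex"
where crawl = 'CC-MAIN-2013-20'
  and subset = 'warc'
  and url_host_3rd_last_part in ('secure', 'ssl')
group by url_host_registered_domain
order by n desc
limit 100;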
A naive query for one domain:
select * from "ccindex"."ccindex" where url_host_registered_domain = 'conquermstoday.com' limit 100;
has no results... Data scanned: 5.73 GB.
Let's see if they have any of the domain hits. Let's also restrict by date to try and reduce the data scanned:
select * from "ccindex"."ccindex" where
  fetch_time < TIMESTAMP '2014-01-01 00:00:00' AND
  url_host_registered_domain IN (
   'activegaminginfo.com',
   'altworldnews.com',
   ...
   'topbillingsite.com',
   'worldwildlifeadventure.com'
 )
Hmm, data scanned: 60.59 GB and no hits... weird.
Sanity check:
select * from "ccindex"."ccindex" WHERE
  crawl = 'CC-MAIN-2013-20' AND
  subset = 'warc' AND
  url_host_registered_domain IN (
   'google.com',
   'amazon.com'
 )
has a bunch of hits, of course. Data scanned: 212.88 MB. The crawl and subset WHERE clauses are a must! Should have read the article first.
Let's widen a bit more:
select * from "ccindex"."ccindex" WHERE
  crawl IN (
    'CC-MAIN-2013-20',
    'CC-MAIN-2013-48',
    'CC-MAIN-2014-10'
  ) AND
  subset = 'warc' AND
  url_host_registered_domain IN (
    'activegaminginfo.com',
    'altworldnews.com',
    ...
    'worldnewsandent.com',
    'worldwildlifeadventure.com'
 )
Still nothing found... they don't seem to have any of the URLs of interest?
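As a cross-check outside Athena, the public Common Crawl index server speaks the same CDX API dialect as the Wayback Machine and can be queried per domain (a sketch):

curl 'https://index.commoncrawl.org/CC-MAIN-2013-20-index?url=altworldnews.com&matchType=domain&output=json'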
There are two ways to organize a project's history: linear or branched.
Some people like merges, but they are ugly and stupid. Rebase instead and keep linear history.
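The day-to-day flow that keeps history linear is just (a sketch, branch names hypothetical):

git checkout my-feature
git rebase master    # replay my-feature's commits on top of master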
Linear history:
5 master
|
4
|
3
|
2
|
1 first commit
Branched history:
7   master
|\
| \
6  \
|\  \
| |  |
3 4  5
| |  |
| /  /
|/  /
2  /
| /
1/  first commit
Here commits 6 and 7 are the so-called "merge commits" (see the git log sketch after this list):
  • they have multiple parents:
    • 6 has parents 3 and 4
    • 7 has parents 5 and 6
  • they are useless and don't contain any real information
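The multiple parents can be seen directly in git (%h is the commit hash, %p its parent hashes):

git log --pretty=format:'%h %p %s'    # merge commits show two parent hashes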
Which type of tree do you think will be easier to understand and maintain?
????
????????????
You may disconnect now if you still like branched history.
Oh but there are usually 2 trees: local and remote.
So you also have to learn how to observe and modify and sync with the remote tree!
But basically:
git fetch
to update the remote tree. And then you can use remote branches exactly like any other branch, except that you prefix them with the remote name (usually origin/*), e.g. (see the sketch after this list):
  • origin/master is the latest fetch of the remote version of master
  • origin/my-feature is the latest fetch of the remote version of my-feature
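A minimal observe-and-sync session could look like this (a sketch, branch names hypothetical):

git fetch                         # update origin/* without touching local branches
git log --oneline origin/master   # observe the remote tree
git diff master origin/master     # compare local vs remote
git rebase origin/master          # replay local commits on top of the remote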
While Ciro Santilli is a big fan of having "one global country" (and language), which is somewhat approximated by globalization, he has come to believe that there is one serious downside to globalization as it stands in 2020: it allows companies to pressure governments to reduce taxes, and thus reduces the power of government, which in turn increases social inequality. This idea is very well highlighted in Can't get you out of my head by Adam Curtis (2021).
The only solution seems to be for governments to get together, and make deals to have fair taxation across each other. Which might never happen.
This is a dark art, and many of the sources are shady as fuck! We often have no idea of their methodology. Also, no source is fully complete. We just piece things together as best we can.
In order to explore IPs in known IP ranges, what we need are good DNS databases.
This data source was very valuable, and led to many hits, and to finding the first non Reuters ranges with Section "secure subdomain search on 2013 DNS Census".
Hit overlap:
jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json | xargs -I{} sqlite3 aiddcu.sqlite "select * from t where d = '{}'"
Domain hit count when we were at 279 hits: 142 were present, so about half of the hits were in the census.
The timing of the database is perfect for this project; it is as if the CIA had planted it themselves!
dnshistory.org contains historical domain -> IP mappings.
We have not managed to extract much from this source, as they don't have much data on the range of interest.
But they do have some unique data at least, so perhaps we should try them a bit more often, e.g. they were the only source we've seen so far that made the association headlines2day.com -> 212.209.74.126, which places it in the more plausible globalbaseballnews.com IP range.
TODO can it do IP to domain? Or just domain to IP? Asked on their Discord: discord.com/channels/698151879166918727/968586102493552731/1124254204257632377. Their banner suggests that yes:
With our new look website you can now find other domains hosted on the same IP address, your website neighbours and more even quicker than before.
Owner replied, you can't:
At the moment you can only do this for current not historical records
This is a shame, reverse IP here could be quite valuable.
In principle, we could obtain this data from search engines, but Google doesn't index that entire website well, e.g. no hits for site:dnshistory.org "62.22.60.48", presumably due to heavy IP throttling.
Homepage dnshistory.org/ gives date starting in 2009:
Here at DNS History we have been crawling DNS records since 2009, our database currently contains over 1 billion domains and over 12 billion DNS records.
and it is true that they do have some hits from that useful era.
whoisxmlapi WHOIS record on April 28, 2011
  • Registrar Name: GODADDY.COM, INC
  • Created Date: February 9, 2010 00:00:00 UTC
  • Updated Date: February 9, 2010 00:00:00 UTC
  • Expires Date: February 9, 2015 00:00:00 UTC
  • Registrant Name: domainsbyproxy.com
  • Name servers: NS55.DOMAINCONTROL.COM|NS56.DOMAINCONTROL.COM
The JavaScript of each website appears to be quite small and similarly sized. The scripts are all minified, but with things reordered a bit between sites.
First we have to know that the Wayback Machine adds some stuff before and after the original code. The actual code there starts at:
ap={fg:['MSXML2.XMLHTTP
and ends in:
ck++;};return fu;};
We can use a JavaScript beautifier such as beautifier.io/ to be able to read the code more easily.
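The same can be scripted with the js-beautify CLI (a sketch; comms.js is a hypothetical name for a saved snapshot of the minified script):

npm install -g js-beautify
js-beautify comms.js > comms-pretty.js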
It is worth noting that there's a lot of <script> tags inline as well, which seem to matter.
Further analysis would be needed.
All IP ranges have some holes in them for which we don't have a domain name.
Is it because there was nothing there, or just because we don't have a good enough reverse IP database?
It is possible that DomainTools could help with a more complete database, but its access is extremely expensive and out of reach at the moment.
Censys is another option that would be good to try.
Putting 140 USD into WhoisXMLAPI to get all the WHOIS histories of interest for possible reverse searches would also be worthwhile.
