@cirosantilli/_file/graphviz/graphviz/quotes.dot by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
@cirosantilli/_file/graphviz/graphviz/node.dot by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
@cirosantilli/_file/graphviz/graphviz/quotes-escape.dot by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
So dominant that it is usualy called just "zip".
Add diretory prefix to ZIP on Linux CLI by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
On the Relative Motion of the Earth and the Luminiferous Ether by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
This paper is in the public domain and people have uploaded it e.g. to glorious Wikisource: en.wikisource.org/wiki/On_the_Relative_Motion_of_the_Earth_and_the_Luminiferous_Ether including its amazing illustrations.
Some cool ones:
- playinside.me
Archive example: web.archive.org/web/20130726224338/http://librarianhelper.com/
@cirosantilli/_file/html/html/details-toc.html by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason, notably e.g.:This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever".
- hupo.com: e.g. static.hupo.com/expdomain_myadmin/2012-03-06(国际域名).txt. Heavily IP throttled. Tor hindered more than helped.Scraping script: cia-2010-covert-communication-websites/hupo.sh. Scraping does about 1 day every 5 minutes relatively reliably, so about 36 hours / year. Not bad.Results are stored under
tmp/humo/<day>
.Check for hit overlap:The hits are very well distributed amongst days and months, at least they did a good job hiding these potential timing fingerprints. This feels very deliberately designed.grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/hupo/*
There are lots of hits. The data set is very inclusive. Also we understand that it must have been obtains through means other than Web crawling, since it contains so many of the hits.Nice output format for scraping as the HTML is very minimalThey randomly changed their URL format to remove the space before the .com after 2012-02-03:Some of their files are simply missing however unfortunately, e.g. neither of the following exist:webmasterhome.cn did contain that one however: domain.webmasterhome.cn/com/2012-07-01.asp. Hmm. we might have better luck over there then?2018-11-19 is corrupt in a new and wonderful way, with a bunch of trailing zeros:ends in:wget -O hupo-2018-11-19 'http://static.hupo.com/expdomain_myadmin/2018-11-19%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt hd hupo-2018-11-19
000ffff0 74 75 64 69 65 73 2e 63 6f 6d 0d 0a 70 31 63 6f |tudies.com..p1co| 00100000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0018a5e0 00 00 00 00 00 00 00 00 00 |.........|
More generally, several files contain invalid domain names with non-ASCII characters, e.g. 2013-01-02 contains365<D3>л<FA><C2><CC>.com
. Domain names can only contain ASCII charters: stackoverflow.com/questions/1133424/what-are-the-valid-characters-that-can-show-up-in-a-url-host Maybe we should get rid of any such lines as noise.Some files around 2011-09-06 start with an empty line. 2014-01-15 starts with about twenty empty lines. Oh and that last one also has some trash bytes the end<B7><B5><BB><D8>
. Beauty. - webmasterhome.cn: e.g. domain.webmasterhome.cn/com/2012-03-06.asp. Appears to contain the exact same data as "static.hupo.com"Also heavily IP throttled, and a bit more than hupo apparently.Also has some randomly missing dates like hupo.com, though different missing ones from hupo, so they complement each other nicely.Some of the URLs are broken and don't inform that with HTTP status code, they just replace the results with some Chinese text 无法找到该页 (The requested page could not be found):Several URLs just return length 0 content, e.g.:It is not fully clear if this is a throttling mechanism, or if the data is just missing entirely.
curl -vvv http://domain.webmasterhome.cn/com/2015-10-31.asp * Trying 125.90.93.11:80... * Connected to domain.webmasterhome.cn (125.90.93.11) port 80 (#0) > GET /com/2015-10-31.asp HTTP/1.1 > Host: domain.webmasterhome.cn > User-Agent: curl/7.88.1 > Accept: */* > < HTTP/1.1 200 OK < Date: Sat, 21 Oct 2023 15:12:23 GMT < Server: Microsoft-IIS/6.0 < X-Powered-By: ASP.NET < Content-Length: 0 < Content-Type: text/html < Set-Cookie: ASPSESSIONIDCSTTTBAD=BGGPAONBOFKMMFIPMOGGHLMJ; path=/ < Cache-control: private < * Connection #0 to host domain.webmasterhome.cn left intact
Starting around 2018, the IP limiting became very intense, 30 mins / 1 hour per URL, so we just gave up. Therefore, data from 2018 onwards does not contain webmasterhome.cn data.Starting from2013-05-10
the format changes randomly. This also shows us that they just have all the HTML pages as static files on their server. E.g. with:we see:grep -a '<pre' * | s
2013-05-09:<pre style='font-family:Verdana, Arial, Helvetica, sans-serif; '><strong>2013<C4><EA>05<D4><C2>09<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong><br>0-3y.com 2013-05-10:<pre><strong>2013<C4><EA>05<D4><C2>10<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong>
- justdropped.com: e.g. www.justdropped.com/drops/030612com.html
- yoid.com: e.g.: yoid.com/bydate.php?d=2016-06-03&a=a
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
wait
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
The extracted data is present at:Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
- archive.org/details/expired-domain-names-by-day
- github.com/cirosantilli/expired-domain-names-by-day-* repos:
- github.com/cirosantilli/expired-domain-names-by-day-2011 (~11M)
- github.com/cirosantilli/expired-domain-names-by-day-2012 (~18M)
- github.com/cirosantilli/expired-domain-names-by-day-2013 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2014 (~29M)
- github.com/cirosantilli/expired-domain-names-by-day-2015 (~28M)
- github.com/cirosantilli/expired-domain-names-by-day-2016
- github.com/cirosantilli/expired-domain-names-by-day-2017
- github.com/cirosantilli/expired-domain-names-by-day-2018
- github.com/cirosantilli/expired-domain-names-by-day-2019
- github.com/cirosantilli/expired-domain-names-by-day-2020
- github.com/cirosantilli/expired-domain-names-by-day-2021
- github.com/cirosantilli/expired-domain-names-by-day-2022
- GitHub trackers:
- admin-monitor.shiyue.com
- anquan.didichuxing.com
- app.cloudsek.com
- app.flare.io
- app.rainforest.tech
- app.shadowmap.com
- bo.serenety.xmco.fr 8 1
- bts.linecorp.com
- burn2give.vercel.app
- cbs.ctm360.com 17 2
- code6.d1m.cn
- code6-ops.juzifenqi.com
- codefend.devops.cndatacom.com
- dlp-code.airudder.com
- easm.atrust.sangfor.com
- ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
- ecall.beygoo.me 2 1
- eos.vip.vip.com 1 1
- foradar.baimaohui.net 2 1
- fty.beygoo.me
- hive.telefonica.com.br 2 1
- hulrud.tistory.com
- kartos.enthec.com
- soc.futuoa.com
- lullar-com-3.appspot.com
- penetration.houtai.io 2 1
- platform.sec.corp.qihoo.net
- plus.k8s.onemt.co 4 1
- pmp.beygoo.me 2 1
- portal.protectorg.com
- qa-boss.amh-group.com
- saicmotor.saas.cubesec.cn
- scan.huoban.com
- sec.welab-inc.com
- security.ctrip.com 10 3
- siem-gs.int.black-unique.com 2 1
- soc-github.daojia-inc.com
- spigotmc.org 2 1
- tcallzgroup.blueliv.com
- tcthreatcompass05.blueliv.com 4 1
- tix.testsite.woa.com 2 1
- toucan.belcy.com 1 1
- turbo.gwmdevops.com 18 2
- urlscan.watcherlab.com
- zelenka.guru. Looks like a Russian hacker forum.
- LinkedIn profile views:
- "Information Security Specialist at Forcepoint"
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
Next, we can start searching by keyword with Wayback Machine CDX scanning with Tor parallelization with out helper cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the term "news":produces per-year results for the regex term OK lets:
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
news|global
between the years under:tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
Other searches that are not dense enough for our patience:
world|global|[^.]info
OMG and a few more. It's amazing.
news
search might be producing some golden, golden new hits!!! Going full into this. Hits:- thepyramidnews.com
- echessnews.com
- tickettonews.com
- airuafricanews.com
- vuvuzelanews.com
- dayenews.com
- newsupdatesite.com
- arabicnewsonline.com
- arabicnewsunfiltered.com
- newsandsportscentral.com
- networkofnews.com
- trekkingtoday.com
- financial-crisis-news.com
This is a good company, first they truly helped reduce international transfer fees. They they continued to morph into a decent challenger bank.
Their Wise Interest account was amazing as of late 2023: wise.com/gb/interest/
Instant access with representative national interests and 0.29% fees.
Brick and mortar banks were way way behind in that regard!
E.g. October 2023, Wise was doing 4.87% interest after fees, while Barclay's best option was 1.16% above 5k pounds on the Rainy Day Saver (5% below). Ridiculous!
Update: On November 2023 unfortunately they more than doubled their fees from 0.19% to 0.46%:but it still was a good option to keep cash in.
Possible hits that are too broken to be sure by Ciro Santilli 35 Updated 2025-01-10 +Created 1970-01-01
Likely hits possible but whose archives is too broken to be easily certain. If:were to ever be found, these would be considered hits.
- nearby IP hits
- proper reverse engineering of their comms if any, or any other page fingerprints
- 216.97.231.56 nouvelles-d-aujourdhuis.com. 2011. Stylistically perfect, but no nearby IP hits. Maybe looking into HTML would help confirm:But wrong IP? likely CGI comms variant under the signup page: web.archive.org/web/20090405045548/http://nouvelles-d-aujourdhuis.com/members.html.
rss-items
Tested viewdns.info range: 216.97.231.46 - 216.97.231.66. Not a single reverse IP hit in there.viewdns.info also assigns it 50.63.202.46, GoDaddy.com, LLC, 2013-11-08 in addition to 216.97.231.56, Canada, IPXO LLC, 2013-09-06. This is very near other iranfootballsource.com flukes, so likely useless.securitytrails.com also gives it one earlier IP 209.200.240.250 last seen 2008-09-20: securitytrails.com/domain/nouvelles-d-aujourdhuis.com/history/a Hydra Communications Ltd before 216.97.231.56 "ASU doctor" first seen 2008-09-20 (15 years)> Tested viewdns.info range: 209.200.240.240 - 209.200.240.260 empty at the time of interest.Marked copyright 2006, so mega early.
africainnews.com
- no archives of the HTML
- SWF. A reverse engineering of the SWF should be able to confirm.
- web.archive.org/web/20111007194814/http://africainnews.com/robots.txt
- dnshistory.org/historical-dns-records/a/africainnews.com
- 2009-12-29 -> 2010-07-28 72.167.232.43. Tested viewdns.info range: 72.167.232.33 - 72.167.232.53. Several virtual hosts there.
- 2011-10-14 -> 2011-10-14 68.178.232.100 virtual
- 2012-08-12 -> 2012-08-12 97.74.42.79. Tested viewdns.info range: 97.74.42.69 - 97.74.42.89
- 97.74.42.74: landtex.net 2023-03-22
- 97.74.42.76: solidasshonky.com 2023-03-07
- 97.74.42.77: solidasshonky.com 2023-03-07
- 97.74.42.78: blakebrothers.co 2018-05-05
- 97.74.42.78: learningjbe.com 2023-02-02
- 97.74.42.78: solidasshonky.com 2023-03-07
- 97.74.42.78: sourceuae.com 2023-03-07
- 97.74.42.78: superiorfoodservicesales.com 2017-09-10
- 97.74.42.79: large virtual
- 97.74.42.80: waiasialtd.com 2016-10-17
- viewdns.info/iphistory/?domain=africainnews.com
- 50.63.202.92 United States AS-26496-GO-DADDY-COM-LLC 2013-06-30. Likely large virtual.
- 97.74.42.79 United States AS-26496-GO-DADDY-COM-LLC 2013-05-20. tested.
- 68.178.232.100 United States AS-26496-GO-DADDY-COM-LLC 2012-06-29 virtual
- 68.178.232.99 United States AS-26496-GO-DADDY-COM-LLC 2011-11-13
- 68.178.232.100 United States AS-26496-GO-DADDY-COM-LLC 2011-10-09 virtual
- 72.167.232.43 United States GO-DADDY-COM-LLC 2011-09-08. Tested.
globalsentinelsite.com
- dnshistory.org/historical-dns-records/a/globalsentinelsite.com 2010-02-13 -> 2010-08-04 74.124.210.249 unknown
- viewdns.info/iphistory/?domain=globalsentinelsite.com 74.124.210.249 United States INMOTION 2011-11-13 unknownviewdns.info/reverseip/?host=74.124.210.249&t=1 has 347 hits
todaysolar.com. This might just be legit, but keeping it around just in case.
2011 | JAR | dnshistory.org/historical-dns-records/a/todaysolar.com 2009-08-11 -> 2011-03-01 74.208.62.112 unknown | viewdns.info/iphistory/?domain=todaysolar.com 74.208.62.112 United States PROFITBRICKS-USA 2012-11-12 |
Unlisted articles are being shown, click here to show only listed articles.