Scraped justdropped data, patched:
+++ b/cia-2010-covert-communication-websites/cdx-post.sh
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # Post process the output of cdx.sh to enrich IDs even further, and reconstruct easier to Web Archive inspect domain names.
-grep -P -e '([^,)]+)\)\/\1\.swf|\)/[^/]+.jar|([^,)]+),([^,)]+),([^,)]+)\)/cgi-bin/[^/]+\.cgi' "$1" |
-  sed -r 's/\).*//' | awk -F, '{ printf("%s.%s\n", $2, $1) }' | uniq -c | awk '$1 == 1{ print $2 }' | tee $1.post
+grep -P -e '([^,)]+)\)\/\1\.swf|\)/[^/]+.jar|([^,)]+),([^,)]+),([^,)]+)\)/cgi-bin/[^/]+\.cgi' "$1"|
+  sed -r 's/\).*//' | awk -F, '{ printf("%s.%s\n", $2, $1) }' | uniq -c | awk '{ print $2 }' | tee $1.post
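To illustrate what the patch changes, here is a minimal sketch that runs both versions of the pipeline on a few made-up CDX keys. The sample lines and the helper names sample_cdx, count_hits, old_filter and new_filter are hypothetical, and real cdx.sh output may carry more fields per line. It assumes GNU grep with -P support.

```shell
#!/usr/bin/env bash
# Hypothetical SURT-style CDX keys, invented for illustration only.
sample_cdx() {
  cat <<'EOF'
com,examplenews)/examplenews.swf
com,examplenews)/index.html
net,sampletoday)/applet.jar
net,sampletoday)/news.jar
org,demo,www)/cgi-bin/form.cgi
EOF
}

# Shared front of the pipeline: keep fingerprint-looking URLs,
# rebuild "domain.tld" from the key, and count adjacent repeats.
count_hits() {
  grep -P -e '([^,)]+)\)\/\1\.swf|\)/[^/]+.jar|([^,)]+),([^,)]+),([^,)]+)\)/cgi-bin/[^/]+\.cgi' |
    sed -r 's/\).*//' | awk -F, '{ printf("%s.%s\n", $2, $1) }' | uniq -c
}

# Before the patch: only domains with exactly one matching line survive.
old_filter() { count_hits | awk '$1 == 1{ print $2 }'; }
# After the patch: every domain with at least one matching line survives.
new_filter() { count_hits | awk '{ print $2 }'; }

echo 'before the patch:'; sample_cdx | old_filter
echo 'after the patch:';  sample_cdx | new_filter
```

In this sample, the domain with two JAR files is dropped by the old filter and kept by the new one, which is the only behavioral difference between the two versions.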
and then:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today' 2006 2022
web.archive.org/web/20110203041325/http://financecentraltoday.com/
web.archive.org/web/20110202221328/http://thenewsofpakistan.com/
web.archive.org/web/20050424123432/http://www.pokernewsweb.com/ likely legit in the intended emulated style
web.archive.org/web/20100923090646/http://mideasttoday.net/
web.archive.org/web/20100206221718/http://euronewsonline.net/
web.archive.org/web/20110208063146/http://news-and-sports.com/ Hit.
web.archive.org/web/20110202054628/http://intoworldnews.com/ hit.
web.archive.org/web/20110207171340/http://mydailynewsreport.com/ hit
web.archive.org/web/20050508220858/http://www.asianewsupdate.com/ this looks like the exact format of the legitimate sites the CIA was emulating. Copyright 2005, with a CGI link at: www.asianewsupdate.com:80/cgi-sys/FormMail.cgi There's a phone number there, 01 647-0910, so a hit seems less likely?
2010. JAR unarchived. rss, split image
2010. JAR. Split header.
2011. JAR unarchived. Split header.
2011. JAR. a.newslink, a.newslinkalt.
2011. Arabic. RSS.
web.archive.org/web/20110129115400/http://kmirano.com/ shallow but off style? Has a kmirano.sfw... viewdns.info/iphistory/?domain=kmirano.com says 211.1.224.71 Japan NTT SmartConnect Corporation 2012-01-11
2011. JAR. Copyright 2008. Split header and other images. They are obsessed with CDMA (2G).
2011. JAR. split header, RSS.
2010. Suspicious. But no clear fingerprint. Also not as shallow as others. Also Joomla based, which would be novel.
2010. JAR.
newspapergateway.com/ web.archive.org/web/20110208070309/http://newspapergateway.com/ hard to tell but generally off. Has both JAR and SWF.
2011 Farsi. JAR. RSS.
2010 JAR. Split header, rss.
2011. English. Split header, RSS.
sandstormnews.com 2011, SWF Arabic. ul.rss-items > li.rss-item, split header
zerosandonesnews.com 2011. SWF Split header, ul.rss-items > li.rss-item
lasthournews.com web.archive.org/web/20100513182623/http://lasthournews.com/. Urdu. JAR at: web.archive.org/web/20100513182724/http://lasthournews.com/recent.jar. Split header images.
mynepalnews.com, split header images, ul.rss-items > li.rss-item, Unarchived jar:
Announcements and updates by self:
Pings by self:
Reactions by others:
Notable reactions to the websites themselves:
AI generated porn by Ciro Santilli 37 Updated 2025-07-16
This is going to be the most important application of generative AI. Especially if we ever achieve good text-to-video.
Image generators plus human ranking:
www.pornhub.com/view_video.php?viewkey=ph63c71351edece: Heavenly Bodies Part 1: Sister's Mary First Act. Pornhub title: "AI generated Hentai Story: Sexy Nun alternative World(Isekai) Stable Diffusion" Interesting concept, slide-narrated over visual novel. The question is how they managed to keep face consistency across images.
Git design rationale by Ciro Santilli 37 Updated 2025-07-16
The fundamental insight of Git design is: a SHA represents not only current state, but also the full history due to the Merkle tree implementation, see notably:
This means that you will always notice if you are overwriting history on the remote, even if you are developing from two separate local computers (or more commonly, two people on two different local computers), and therefore you will never lose any work accidentally.
It is very hard to achieve that without the Merkle tree.
Consider for example the most naive approach possible of marking versions with consecutive numbers:
  • Local 1:
  • Local 2:
    • 0: root commit
    • 1: commit 1
    • 2: commit 2 by local 2
    • 3: commit 3 by local 2
  • Remote
If Local 1 were to push to Remote first, how could Local 2 notice that when it tries to push itself? The naive method of just checking "does Remote have commit 2?" does not work, because Local 2 has a different version of commit 2 than Local 1.
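The contrast with hash-based IDs can be sketched with a toy model in which a commit ID is the SHA-1 of the parent ID plus the content. This is only an approximation of the idea: real Git hashes a full commit object (tree, parents, author, committer, message). The helpers commit_id and is_ancestor are hypothetical, and "commit 2 by local 1" is an assumed entry for Local 1's history.

```shell
#!/usr/bin/env bash
# Toy commit ID: SHA-1 over "parent ID + content", so each ID encodes its full history.
commit_id() {
  local parent="$1" content="$2"
  printf '%s\n%s\n' "$parent" "$content" | sha1sum | cut -d' ' -f1
}

root=$(commit_id '' 'root commit')
c1=$(commit_id "$root" 'commit 1')

# Both locals share history up to commit 1, then diverge.
# "commit 2 by local 1" is an assumption made for this sketch.
local1_c2=$(commit_id "$c1" 'commit 2 by local 1')
local2_c2=$(commit_id "$c1" 'commit 2 by local 2')
local2_c3=$(commit_id "$local2_c2" 'commit 3 by local 2')

# Local 1 pushes first, so the remote tip becomes its commit 2.
remote_tip="$local1_c2"

# Local 2's ancestry, newest first.
local2_history=("$local2_c3" "$local2_c2" "$c1" "$root")

# Succeeds if the first argument appears among the remaining ones.
is_ancestor() {
  local needle="$1"; shift
  local id
  for id in "$@"; do
    [ "$id" = "$needle" ] && return 0
  done
  return 1
}

# A push is a fast-forward only if the remote tip is already part of the pushed history.
if is_ancestor "$remote_tip" "${local2_history[@]}"; then
  echo "fast-forward: push accepted"
else
  echo "rejected: remote has history that Local 2 lacks"
fi
```

Both locals call their third commit "2" under consecutive numbering, but the two hash IDs differ, so Local 2's push is detected as non-fast-forward and the script takes the rejected branch.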
ImageNet subset by Ciro Santilli 37 Updated 2025-07-16
Subset generators:
Unfortunately, since ImageNet is a closed standard, no one can upload such pre-made subsets, forcing everybody to download the full ImageNet1k dataset, which is huge!
