activatedgeek/LeNet-5 use ONNX for inference Updated +Created
Now let's try to use the trained ONNX file for inference on some images manually drawn in GIMP:
Note that the images must be drawn with white on black. If you use black on white, the accuracy becomes terrible. This is a very good example of brittleness in AI systems!
Figure 1.
Number 9 drawn with mouse on GIMP by Ciro Santilli (2023)
We can try the code adapted from thenewstack.io/tutorial-using-a-pre-trained-onnx-model-for-inferencing/ at python/onnx_cheat/infer_mnist.py:
cd python/onnx_cheat
./infer_mnist.py lenet.onnx infer_mnist_9.png
and it works pretty well! The program outputs:
9
as desired.
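For reference, here is a minimal sketch of what such a script could look like, assuming the onnxruntime and Pillow packages are installed, and that the exported model takes a 1x1x32x32 float32 input as in the upstream training code (adjust the size if your export differs):
#!/usr/bin/env python3
# Hypothetical minimal version of infer_mnist.py: predict the digit
# drawn in a PNG with a LeNet-5 ONNX model.
import sys

import numpy as np
import onnxruntime
from PIL import Image

onnx_path, image_path = sys.argv[1], sys.argv[2]

# Load as grayscale, white digit on black background. The upstream
# repository trains on MNIST images resized to 32x32.
image = Image.open(image_path).convert('L').resize((32, 32))
# Shape (1, 1, 32, 32): batch, channel, height, width.
input_data = np.asarray(image, dtype=np.float32)[np.newaxis, np.newaxis]

session = onnxruntime.InferenceSession(onnx_path)
input_name = session.get_inputs()[0].name
logits = session.run(None, {input_name: input_data})[0]

# The prediction is the class with the largest logit.
print(int(np.argmax(logits)))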
We can also try with images directly from Extract MNIST images.
for f in /home/ciro/git/mnist_png/out/testing/1/*.png; do echo $f; ./infer_mnist.py lenet.onnx "$f"; done
and the accuracy is great as expected.
Cat qubit Updated +Created
Amazon EC2 HOWTO Updated +Created
vCPU Updated +Created
AMDGPU Updated +Created
Abelian and non-abelian anyons Updated +Created
apport-cli Updated +Created
Astera Institute person Updated +Created
Expired domain trackers Updated +Created
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all Chinese expired domain trackers for some reason, notably e.g.:
  • hupo.com: e.g. static.hupo.com/expdomain_myadmin/2012-03-06(国际域名).txt. Heavily IP throttled. Tor hindered more than helped.
    Scraping script: cia-2010-covert-communication-websites/hupo.sh. Scraping does about 1 day of data every 5 minutes relatively reliably, so about 30 hours per year of data. Not bad.
    Results are stored under tmp/hupo/<day>.
    Check for hit overlap:
    grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/hupo/*
    The hits are very well distributed amongst days and months; at least they did a good job hiding these potential timing fingerprints. This feels very deliberately designed.
    There are lots of hits. The data set is very inclusive. Also we understand that it must have been obtained through means other than Web crawling, since it contains so many of the hits.
    Nice output format for scraping, as the HTML is very minimal.
    They randomly changed their URL format to remove the space before the .com after 2012-02-03.
    Some of their files are unfortunately simply missing, e.g. the one for 2012-07-01. webmasterhome.cn did contain that one however: domain.webmasterhome.cn/com/2012-07-01.asp. Hmm, we might have better luck over there then?
    2018-11-19 is corrupt in a new and wonderful way, with a bunch of trailing zeros:
    wget -O hupo-2018-11-19 'http://static.hupo.com/expdomain_myadmin/2018-11-19%EF%BC%88%E5%9B%BD%E9%99%85%E5%9F%9F%E5%90%8D%EF%BC%89.txt'
    hd hupo-2018-11-19
    ends in:
    000ffff0  74 75 64 69 65 73 2e 63  6f 6d 0d 0a 70 31 63 6f  |tudies.com..p1co|
    00100000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    0018a5e0  00 00 00 00 00 00 00 00  00                       |.........|
    More generally, several files contain invalid domain names with non-ASCII characters, e.g. 2013-01-02 contains 365<D3>л<FA><C2><CC>.com. Domain names can only contain ASCII characters: stackoverflow.com/questions/1133424/what-are-the-valid-characters-that-can-show-up-in-a-url-host. Maybe we should get rid of any such lines as noise; see the filtering sketch below this list.
    Some files around 2011-09-06 start with an empty line. 2014-01-15 starts with about twenty empty lines. Oh, and that last one also has some trash bytes at the end: <B7><B5><BB><D8>. Beauty.
  • webmasterhome.cn: e.g. domain.webmasterhome.cn/com/2012-03-06.asp. Appears to contain the exact same data as static.hupo.com.
    Also heavily IP throttled, and a bit more than hupo apparently.
    Also has some randomly missing dates like hupo.com, though different missing ones from hupo, so they complement each other nicely.
    Some of the URLs are broken and don't indicate it via the HTTP status code; they just replace the results with some Chinese text, 无法找到该页 (The requested page could not be found).
    Several URLs just return length 0 content, e.g.:
    curl -vvv http://domain.webmasterhome.cn/com/2015-10-31.asp
    *   Trying 125.90.93.11:80...
    * Connected to domain.webmasterhome.cn (125.90.93.11) port 80 (#0)
    > GET /com/2015-10-31.asp HTTP/1.1
    > Host: domain.webmasterhome.cn
    > User-Agent: curl/7.88.1
    > Accept: */*
    > 
    < HTTP/1.1 200 OK
    < Date: Sat, 21 Oct 2023 15:12:23 GMT
    < Server: Microsoft-IIS/6.0
    < X-Powered-By: ASP.NET
    < Content-Length: 0
    < Content-Type: text/html
    < Set-Cookie: ASPSESSIONIDCSTTTBAD=BGGPAONBOFKMMFIPMOGGHLMJ; path=/
    < Cache-control: private
    < 
    * Connection #0 to host domain.webmasterhome.cn left intact
    It is not fully clear if this is a throttling mechanism, or if the data is just missing entirely.
    Starting around 2018, the IP limiting became very intense, about one URL every 30 to 60 minutes, so we just gave up. Therefore, data from 2018 onwards does not contain webmasterhome.cn data.
    Starting from 2013-05-10 the format changes randomly. This also shows us that they just have all the HTML pages as static files on their server. E.g. with:
    grep -a '<pre' * | s
    we see:
    2013-05-09:<pre style='font-family:Verdana, Arial, Helvetica, sans-serif; '><strong>2013<C4><EA>05<D4><C2>09<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong><br>0-3y.com
    2013-05-10:<pre><strong>2013<C4><EA>05<D4><C2>10<C8>յ<BD><C6>ڹ<FA><BC><CA><D3><F2><C3><FB></strong>
  • justdropped.com: e.g. www.justdropped.com/drops/030612com.html
  • yoid.com: e.g.: yoid.com/bydate.php?d=2016-06-03&a=a
This suggests that scraping these lists might be a good starting point to obtaining "all expired domains ever".
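If we do want to filter out the noise lines with invalid non-ASCII bytes mentioned above, a hypothetical cleanup filter could be as simple as keeping only lines that look like plausible ASCII domain names, e.g.:
#!/usr/bin/env python3
# filter_domains.py (hypothetical helper): read domain candidates from
# stdin and keep only lines made of ASCII letters, digits, hyphens and
# dots, dropping the mis-encoded noise.
import re
import sys

DOMAIN_RE = re.compile(r'^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$')
for line in sys.stdin:
    line = line.strip()
    if DOMAIN_RE.match(line):
        print(line)
which could then run as e.g. ./filter_domains.py < tmp/hupo/2013-01-02 > tmp/hupo/2013-01-02.clean.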
We've made the following pipeline for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
wait
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
The extracted data is present in the repositories produced by the above pipeline. Soon after uploading, these repos started getting some interesting traffic, presumably triggered by security trackers going "bling bling" on certain malicious domain names in their databases:
  • GitHub trackers:
    • admin-monitor.shiyue.com
    • anquan.didichuxing.com
    • app.cloudsek.com
    • app.flare.io
    • app.rainforest.tech
    • app.shadowmap.com
    • bo.serenety.xmco.fr 8 1
    • bts.linecorp.com
    • burn2give.vercel.app
    • cbs.ctm360.com 17 2
    • code6.d1m.cn
    • code6-ops.juzifenqi.com
    • codefend.devops.cndatacom.com
    • dlp-code.airudder.com
    • easm.atrust.sangfor.com
    • ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
    • ecall.beygoo.me 2 1
    • eos.vip.vip.com 1 1
    • foradar.baimaohui.net 2 1
    • fty.beygoo.me
    • hive.telefonica.com.br 2 1
    • hulrud.tistory.com
    • kartos.enthec.com
    • soc.futuoa.com
    • lullar-com-3.appspot.com
    • penetration.houtai.io 2 1
    • platform.sec.corp.qihoo.net
    • plus.k8s.onemt.co 4 1
    • pmp.beygoo.me 2 1
    • portal.protectorg.com
    • qa-boss.amh-group.com
    • saicmotor.saas.cubesec.cn
    • scan.huoban.com
    • sec.welab-inc.com
    • security.ctrip.com 10 3
    • siem-gs.int.black-unique.com 2 1
    • soc-github.daojia-inc.com
    • spigotmc.org 2 1
    • tcallzgroup.blueliv.com
    • tcthreatcompass05.blueliv.com 4 1
    • tix.testsite.woa.com 2 1
    • toucan.belcy.com 1 1
    • turbo.gwmdevops.com 18 2
    • urlscan.watcherlab.com
    • zelenka.guru. Looks like a Russian hacker forum.
  • LinkedIn profile views:
    • "Information Security Specialist at Forcepoint"
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
Next, we can start searching by keyword with Wayback Machine CDX scanning, parallelized over Tor, using our helper cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the term "news":
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
produces per-year results for the regex news|global between those years under:
tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
OK, let's:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
Other searches that are not dense enough for our patience:
world|global|[^.]info
OMG, the news search might be producing some golden, golden new hits!!! Going all-in on this. Hits:
  • thepyramidnews.com
  • echessnews.com
  • tickettonews.com
  • airuafricanews.com
  • vuvuzelanews.com
  • dayenews.com
  • newsupdatesite.com
  • arabicnewsonline.com
  • arabicnewsunfiltered.com
  • newsandsportscentral.com
  • networkofnews.com
  • trekkingtoday.com
  • financial-crisis-news.com
and a few more. It's amazing.
Duke ARTIQ extensions Updated +Created
Electron configuration Updated +Created
Bitcoin daemon Updated +Created
Runs just a headless Bitcoin server.
You can then interact with it via the Bitcoin CLI client.
On Bitcoin Core snap 26.0, the executable is called bitcoin-core.daemon rather than bitcoind.
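Under the hood, the CLI client just talks to bitcoind over its JSON-RPC interface, so we can also script it directly. A minimal sketch, assuming rpcuser=myuser and rpcpassword=mypass were set in bitcoin.conf (both made-up values for this example):
#!/usr/bin/env python3
# Call getblockchaininfo on a local mainnet bitcoind via JSON-RPC.
import base64
import json
import urllib.request

auth = base64.b64encode(b'myuser:mypass').decode()
request = urllib.request.Request(
    'http://127.0.0.1:8332/',  # default mainnet RPC port
    data=json.dumps({
        'jsonrpc': '1.0',
        'id': 'test',
        'method': 'getblockchaininfo',
        'params': [],
    }).encode(),
    headers={
        'Authorization': 'Basic ' + auth,
        'Content-Type': 'application/json',
    },
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())['result']['blocks'])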
OpenWorm Updated +Created
High level simulation only, no way to get from DNA to worm! :-) Includes:
  • 3D body viewer at browser.openworm.org/. TODO: can you click on a cell to get its name?
Video 1.
OpenWorm Sibernetic demo by Mike Vella (2013)
Source. Sibernetic adds a fluid dynamics solver for brain-in-the-loop simulation of C. elegans.
Molecule Updated +Created
Covalent bond Updated +Created
Octet rule Updated +Created
Reuters article Updated +Created
This is our primary data source, the first article that pointed out a few specific CIA websites which then served as the basis for all of our research.
We take the truth of this article as an axiom. Then all we claim is that all the other websites we found were made by the same people, due to the strong shared design principles of such websites.
Wayback Machine Updated +Created
D'oh.
But to be serious: the Wayback Machine contains a very large proportion of all sites. It is the most complete database we have found so far. Some archives are very broken, but those are rare.
The only problem with the Wayback Machine is that there is no known efficient way to query its archives across domains. You have to have a domain in hand for CDX queries: Wayback Machine CDX scanning.
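Once you do have a domain in hand, a single CDX query is easy to do by hand, e.g. this Python standard library sketch lists a few captures of a known domain (parameters as per the public wayback-cdx-server documentation):
#!/usr/bin/env python3
# Minimal Wayback Machine CDX query for one domain.
import urllib.request

url = ('https://web.archive.org/cdx/search/cdx'
       '?url=example.com'        # the domain we already have in hand
       '&fl=timestamp,original'  # fields to return
       '&collapse=urlkey'        # deduplicate captures of the same URL
       '&limit=10')
with urllib.request.urlopen(url) as response:
    print(response.read().decode())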
The Common Crawl project attempts in part to address this lack of queryability, but we haven't managed to extract any hits from it.
CDX + 2013 DNS Census + heuristics has however been fruitful.
File signature Updated +Created
Electron configuration notation Updated +Created
We will sometimes just write them without the superscripts, e.g. carbon as 1s2 2s2 2p2 rather than 1s² 2s² 2p², as that saves typing and loses no information.
