Abelian and non abelian anyons
Amazon EC2 HOWTO
apport-cli
Astera Institute person
Cat qubit
CIA 2010 covert communication websites / Expired domain trackers
When you Google most of the hit domains, many of them show up on "expired domain trackers", and above all on Chinese expired domain trackers for some reason, e.g.:
This suggests that scraping these lists might be a good starting point for obtaining "all expired domains ever".
Data comparison:
We've made the following pipelines for hupo.com + webmasterhome.cn merging:
./hupo.sh &
./webmastercn.sh &
./justdropped.sh &
wait
./justdropped-post.sh
./hupo-merge.sh
# Export as small Google indexable files in a Git repository.
./hupo-repo.sh
# Export as per year zips for Internet Archive.
./hupo-zip.sh
# Obtain count statistics:
./hupo-wc.sh
Count unique domains in the repos:
( echo */*/*/* | xargs cat ) | sort -u | wc
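A variant of the count above, as a minimal sketch. It assumes the repos hold plain text files with one newline-terminated domain per line, and the function name count_unique_domains is ours for illustration only. It uses find rather than a fixed-depth glob so that it does not depend on the exact directory depth, skips .git explicitly (the glob skips hidden directories implicitly), and uses wc -l so that only the line count is printed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count unique lines (domains) across all files under a directory.
# Assumes one newline-terminated domain per line in every file.
count_unique_domains() {
  local dir="$1"
  # Prune .git so repository metadata is not counted as domain data.
  find "$dir" -name .git -prune -o -type f -print0 |
    xargs -0 cat |
    sort -u |
    wc -l
}
```

Usage would be something like count_unique_domains path/to/repo, where the path is whatever directory the export scripts wrote to.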
The extracted data is present at:
Soon after uploading, these repos started getting some interesting traffic, presumably started by security trackers going "bling bling" on certain malicious domain names in their databases:
  • GitHub trackers:
    • admin-monitor.shiyue.com
    • anquan.didichuxing.com
    • app.cloudsek.com
    • app.flare.io
    • app.rainforest.tech
    • app.shadowmap.com
    • bo.serenety.xmco.fr 8 1
    • bts.linecorp.com
    • burn2give.vercel.app
    • cbs.ctm360.com 17 2
    • code6.d1m.cn
    • code6-ops.juzifenqi.com
    • codefend.devops.cndatacom.com
    • dlp-code.airudder.com
    • easm.atrust.sangfor.com
    • ec2-34-248-93-242.eu-west-1.compute.amazonaws.com
    • ecall.beygoo.me 2 1
    • eos.vip.vip.com 1 1
    • foradar.baimaohui.net 2 1
    • fty.beygoo.me
    • hive.telefonica.com.br 2 1
    • hulrud.tistory.com
    • kartos.enthec.com
    • soc.futuoa.com
    • lullar-com-3.appspot.com
    • penetration.houtai.io 2 1
    • platform.sec.corp.qihoo.net
    • plus.k8s.onemt.co 4 1
    • pmp.beygoo.me 2 1
    • portal.protectorg.com
    • qa-boss.amh-group.com
    • saicmotor.saas.cubesec.cn
    • scan.huoban.com
    • sec.welab-inc.com
    • security.ctrip.com 10 3
    • siem-gs.int.black-unique.com 2 1
    • soc-github.daojia-inc.com
    • spigotmc.org 2 1
    • tcallzgroup.blueliv.com
    • tcthreatcompass05.blueliv.com 4 1
    • tix.testsite.woa.com 2 1
    • toucan.belcy.com 1 1
    • turbo.gwmdevops.com 18 2
    • urlscan.watcherlab.com
    • zelenka.guru. Looks like a Russian hacker forum.
  • LinkedIn profile views:
Check for overlap of the merge:
grep -Fx -f <( jq -r '.[].host' ../media/cia-2010-covert-communication-websites/hits.json ) cia-2010-covert-communication-websites/tmp/merge/*
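The overlap check above hinges on three grep flags: -F treats the patterns as fixed strings, -x requires a whole-line match, and -f reads the patterns from a file. A minimal sketch on toy data follows. The function name overlap and the plain text host list are assumptions for illustration (the real command extracts the hosts from hits.json with jq), and -h is added here only to drop the file name prefix that grep prints when given several files:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of the merged lists that exactly match a known host.
# $1: file with one known host per line; remaining args: merged domain list files.
overlap() {
  local hosts="$1"
  shift
  # -F: fixed strings, not regexes, so dots in domain names stay literal.
  # -x: whole-line match only, so "news.com" does not match "mynews.com".
  # -h: omit file name prefixes so the output is just the domains.
  grep -Fxh -f "$hosts" "$@"
}
```

Without -x, any known host that is a substring of a longer domain would produce a false overlap, which is why the exact-line match matters here.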
Next, we can start searching by keyword, using Wayback Machine CDX scanning with Tor parallelization via our helper cia-2010-covert-communication-websites/hupo-cdx-tor.sh, e.g. to check domains that contain the terms "news" or "global":
./hupo-cdx-tor.sh mydir 'news|global' 2011 2019
produces per-year results for the regex news|global, for each year in the given range, under:
tmp/hupo-cdx-tor/mydir/2011
tmp/hupo-cdx-tor/mydir/2012
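To illustrate the per-year layout, here is a minimal sketch of the keyword filtering step only. It is not the helper script itself: it does no CDX querying and no Tor parallelization. The function name filter_by_year and the input layout (one file per year, named after the year, with one domain per line) are assumptions made for this example:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the keyword filtering step only (no CDX or Tor involved).
# Assumed layout: "$in/<year>" holds one expired domain per line.
# $1: input dir, $2: output dir, $3: extended regex, $4: first year, $5: last year
filter_by_year() {
  local in="$1" out="$2" regex="$3" from="$4" to="$5" year
  mkdir -p "$out"
  for year in $(seq "$from" "$to"); do
    # Skip years for which there is no input list.
    [ -f "$in/$year" ] || continue
    # grep exits 1 when nothing matches; "|| true" keeps that from aborting under "set -e".
    grep -E "$regex" "$in/$year" > "$out/$year" || true
  done
}
```

A call such as filter_by_year lists out 'news|global' 2011 2019 would then leave one file per year under out, each holding only the matching domains.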
OK, let's run:
./hupo-cdx-tor.sh out 'news|headline|internationali|mondo|mundo|mondi|iran|today'
Other searches that are not dense enough for our patience:
world|global|[^.]info
OMG news search might be producing some golden, golden new hits!!! Going full into this. Hits:
  • thepyramidnews.com
  • echessnews.com
  • tickettonews.com
  • airuafricanews.com
  • vuvuzelanews.com
  • dayenews.com
  • newsupdatesite.com
  • arabicnewsonline.com
  • arabicnewsunfiltered.com
  • newsandsportscentral.com
  • networkofnews.com
  • trekkingtoday.com
  • financial-crisis-news.com
and a few more. It's amazing.
Football simulation
CIA 2010 covert communication websites / Wayback Machine
D'oh.
But to be serious. The Wayback Machine contains a very large proportion of all sites. It does sometimes happen that a Wayback Machine archive is missing or broken and cqcounter has the screenshot. But the Wayback Machine is still the most complete database we have found so far. Some archives are very broken. But those are rare.
The only problem with the Wayback Machine is that there is no known efficient way to query its archives across domains. You have to have a domain in hand for CDX queries: Wayback Machine CDX scanning.
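To make the "domain in hand" constraint concrete, here is a minimal sketch of what a single-domain CDX query looks like. The function name cdx_url and the chosen field list and limit are illustrative assumptions, not taken from our scripts. The url, from, to, fl and limit parameters are part of the public Wayback Machine CDX server API:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Wayback Machine CDX query URL for one domain and year range.
# $1: domain, $2: first year, $3: last year
cdx_url() {
  local domain="$1" from="$2" to="$3"
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&from=%s&to=%s&fl=timestamp,original,statuscode&limit=10\n' \
    "$domain" "$from" "$to"
}
# Usage (needs network access):
#   curl -s "$(cdx_url example.com 2011 2019)"
```

The query is keyed on the url parameter, so there is no way to ask for "all domains matching a keyword" directly. That is why a candidate domain list has to come from elsewhere first.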
The Common Crawl project attempts in part to address this lack of queryability, but we haven't managed to extract any hits from it.
CDX + 2013 DNS Census + heuristics, however, has been fruitful.
We have dumped all Wayback Machine archives of known websites to: github.com/cirosantilli/cia-2010-websites-dump using cia-2010-covert-communication-websites/download-websites.sh. This allows for better grepping and serves as a backup in case they ever go down.
Bitcoin daemon
Runs just a headless Bitcoin server.
You can then interact with it via the Bitcoin CLI client.
On Bitcoin Core snap 26.0, the executable is called bitcoin-core.daemon rather than bitcoind.
CIA 2010 covert communication websites / Reuters article
This is our primary data source, the first article that pointed out a few specific CIA websites which then served as the basis for all of our research.
We take the truth of this article as an axiom. And then all we claim is that all other websites found were made by the same people, due to the strong shared design principles of such websites.
General-purpose computing on graphics processing units
Covalent bond
Girdler sulfide process
Git tips / Merge conflicts
Google Books
They scanned a bunch of books, and then allowed search results to hit them. They then only show a small context around the hit to avoid copyright infringement.
Bibliography:
Human Connectome Project
Nuclear graphite
Pro Git book
