ns.csv is 57 GB. This file is too massive; working with it is a pain.
We can also cut down the data a lot with stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column/76605540#76605540 and TLD filtering:

awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1; } }' ns.csv | grep -E '\.(com|net|info|org|biz),' > nsu.csv

This brings us down to a much more manageable 3.0 GB, 83 M rows.
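As a sanity check, the uniq-by-first-column awk keeps only the first NS row per domain (toy input, made-up values):

printf 'a.com,x,ns1\na.com,x,ns2\nb.net,y,ns1\n' | awk -F, 'BEGIN{OFS=","} { if ($1 != last) { print $1, $3; last = $1 } }'
# a.com,ns1
# b.net,ns1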
Let's just scan it once real quick to start with, since likely nothing will come of this avenue:

grep -f <(awk -F, 'NR>1{print $2}' ../media/cia-2010-covert-communication-websites/hits.csv) nsu.csv | tee nsu-hits.csv

From the 267 known hits we get:

cat nsu-hits.csv | csvcut -c 2 | sort | awk -F. '{OFS="."; print $(NF-1), $(NF)}' | sort | uniq -c | sort -k1 -n
1 a2hosting.com
1 amerinoc.com
1 ayns.net
1 dailyrazor.com
1 domainingdepot.com
1 easydns.com
1 frienddns.ru
1 hostgator.com
1 kolmic.com
1 name-services.com
1 namecity.com
1 netnames.net
1 tonsmovies.net
1 webmailer.de
2 cashparking.com
55 worldnic.com
86 domaincontrol.com
So yeah, most of those are likely going to be humongous just by looking at the names. The smallest one by far in the total census is frienddns.ru with only 487 hits; all the others are quite large, or are fake hits due to CSV substring matching. Did a quick Wayback Machine CDX scan there but no luck, alas.
Let's check the smaller ones; doubt anything will come out of this:
inews-today.com,2013-08-12T03:14:01,ns1.frienddns.ru
source-commodities.net,2012-12-13T20:58:28,ns1.namecity.com -> fake hit due to grep e-commodities.net
dailynewsandsports.com,2013-08-13T08:36:28,ns3.a2hosting.com
just-kidding-news.com,2012-02-04T07:40:50,jns3.dailyrazor.com
fightwithoutrules.com,2012-11-09T01:17:40,sk.s2.ns1.ns92.kolmic.com
fightwithoutrules.com,2013-07-01T22:46:23,ns1625.ztomy.com
half-court.net,2012-09-10T09:49:15,sk.s2.ns1.ns92.kolmic.com
half-court.net,2013-07-07T00:31:12,ns1621.ztomy.com
Let's do a bit of counting out of the total:

grep domaincontrol.com ns.csv | awk -F, '{print $1}' | uniq | wc

gives ~20M domains using domaincontrol.com. Let's see how many domains there are in the first place:

awk -F, '{print $1}' ns.csv | uniq | wc

so domaincontrol.com accounts for about 1/4 of the total.
Same as the 2013 DNS census NS records basically: nothing came out.
Hostprobes quick look on two ranges:
208.254.40:
... similar down
208.254.40.95 1334668500 down no-response
208.254.40.95 1338270300 down no-response
208.254.40.95 1338839100 down no-response
208.254.40.95 1339361100 down no-response
208.254.40.95 1346391900 down no-response
208.254.40.96 1335806100 up unknown
208.254.40.96 1336979700 up unknown
208.254.40.96 1338840900 up unknown
208.254.40.96 1339454700 up unknown
208.254.40.96 1346778900 up echo-reply (0.34s latency).
208.254.40.96 1346838300 up echo-reply (0.30s latency).
208.254.40.97 1335840300 up unknown
208.254.40.97 1338446700 up unknown
208.254.40.97 1339334100 up unknown
208.254.40.97 1346658300 up echo-reply (0.26s latency).
... similar up
208.254.40.126 1335708900 up unknown
208.254.40.126 1338446700 up unknown
208.254.40.126 1339330500 up unknown
208.254.40.126 1346494500 up echo-reply (0.24s latency).
208.254.40.127 1335840300 up unknown
208.254.40.127 1337793300 up unknown
208.254.40.127 1338853500 up unknown
208.254.40.127 1346454900 up echo-reply (0.23s latency).
208.254.40.128 1335856500 up unknown
208.254.40.128 1338200100 down no-response
208.254.40.128 1338749100 down no-response
208.254.40.128 1339334100 down no-response
208.254.40.128 1346607900 down net-unreach
208.254.40.129 1335699900 up unknown
... similar down
Suggests exactly 127 - 96 + 1 = 32 IPs.
208.254.42:
... similar down
208.254.42.191 1334522700 down no-response
208.254.42.191 1335276900 down no-response
208.254.42.191 1335784500 down no-response
208.254.42.191 1337845500 down no-response
208.254.42.191 1338752700 down no-response
208.254.42.191 1339332300 down no-response
208.254.42.191 1346499900 down net-unreach
208.254.42.192 1334668500 up unknown
208.254.42.192 1336808700 up unknown
208.254.42.192 1339334100 up unknown
208.254.42.192 1346766300 up echo-reply (0.40s latency).
208.254.42.193 1335770100 up unknown
208.254.42.193 1338444900 up unknown
208.254.42.193 1339334100 up unknown
... similar up
208.254.42.221 1346517900 up echo-reply (0.19s latency).
208.254.42.222 1335708900 up unknown
208.254.42.222 1335708900 up unknown
208.254.42.222 1338066900 up unknown
208.254.42.222 1338747300 up unknown
208.254.42.222 1346872500 up echo-reply (0.27s latency).
208.254.42.223 1335773700 up unknown
208.254.42.223 1336949100 up unknown
208.254.42.223 1338750900 up unknown
208.254.42.223 1339334100 up unknown
208.254.42.223 1346854500 up echo-reply (0.13s latency).
208.254.42.224 1335665700 down no-response
208.254.42.224 1336567500 down no-response
208.254.42.224 1338840900 down no-response
208.254.42.224 1339425900 down no-response
208.254.42.224 1346494500 down time-exceeded
... similar down
Suggests exactly 223 - 192 + 1 = 32 IPs.
Let's have a look at the file 68: outcome: no clear hits like on 208. One wonders why. It does appear that long sequences of up ranges are a sort of fingerprint. The question is how unique it would be.
First, deduplicate the up IPs and load them into SQLite:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq
t=$n-up-uniq.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq tmp"
time sqlite3 $t 'create table t (i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'

This reduces us to 2 million IP rows from the total possible 16 million IPs.
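The ./ip loadable extension provides str2ipv4, presumably converting a dotted quad into a 32-bit integer so that consecutive IPs become consecutive integers. A hypothetical plain-awk equivalent:

# hypothetical equivalent of str2ipv4: dotted quad -> 32-bit integer
echo 208.254.40.96 | awk -F. '{ print $1*2^24 + $2*2^16 + $3*2^8 + $4 }'
# 3506317408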
OK now just counting hits on fixed windows has way too many results:
sqlite3 208-up-uniq.sqlite "\
SELECT * FROM (
SELECT min(i), COUNT(*) OVER (
ORDER BY i RANGE BETWEEN 15 PRECEDING AND 15 FOLLOWING
) as c FROM t
) WHERE c > 20 and c < 30
"
Let's instead try consecutive runs where max - min is exactly 31, i.e. 32 consecutive up IPs, matching the range sizes above:
sqlite3 208-up-uniq.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t)
GROUP BY grp
ORDER BY i
) where c = 31
EOF

This gives 271 ranges. Hmm, a bit more than we'd like... (The ROW_NUMBER() OVER (ORDER BY i) - i trick assigns the same grp to every run of consecutive integers, so each group is one gap-free range.)
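To see the grouping trick in isolation, a toy run with made-up values: consecutive i share a grp, and grp jumps at each gap:

sqlite3 :memory: "
CREATE TABLE t(i INTEGER);
INSERT INTO t VALUES (10),(11),(12),(20),(21);
SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i AS grp FROM t;
"
# 10|-9
# 11|-9
# 12|-9
# 20|-16
# 21|-16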
Another route is to also count the ups:
n=208
time awk '$3=="up"{ print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-up-uniq-cnt
t=$n-up-uniq-cnt.sqlite
rm -f $t
time sqlite3 $t 'create table tmp(cnt text, i text)'
time sqlite3 $t ".import --csv $n-up-uniq-cnt tmp"
time sqlite3 $t 'create table t (cnt integer, i integer)'
time sqlite3 $t '.load ./ip' 'insert into t select cast(cnt as integer), str2ipv4(i) from tmp'
time sqlite3 $t 'drop table tmp'
time sqlite3 $t 'create index ti on t(i)'
Let's see how many consecutive runs we get when we also require each IP to have been up at least 3 times:
sqlite3 208-up-uniq-cnt.sqlite <<EOF
SELECT f, t - f as c FROM (
SELECT min(i) as f, max(i) as t
FROM (SELECT i, ROW_NUMBER() OVER (ORDER BY i) - i as grp FROM t WHERE cnt >= 3)
GROUP BY grp
ORDER BY i
) where c > 28 and c < 32
EOF
Let's check on 66: not representative at all... e.g. several confirmed hits are down:
grep -e '66.45.179' 66
66.45.179.215 1335305700 down no-response
66.45.179.215 1337579100 down no-response
66.45.179.215 1338765300 down no-response
66.45.179.215 1340271900 down no-response
66.45.179.215 1346813100 down no-response
Let's check the relevance of known hits:
grep -e '208.254.40' -e '208.254.42' 208 | tee 208hits

Output:
208.254.40.95 1355564700 unreachable
208.254.40.95 1355622300 unreachable
208.254.40.96 1334537100 alive, 36342
208.254.40.96 1335269700 alive, 17586
..
208.254.40.127 1355562900 alive, 35023
208.254.40.127 1355593500 alive, 59866
208.254.40.128 1334609100 unreachable
208.254.40.128 1334708100 alive from 208.254.32.214, 43358
208.254.40.128 1336596300 unreachable
The rest of 208.254.40 is mostly unreachable.
208.254.42.191 1335294900 unreachable
...
208.254.42.191 1344737700 unreachable
208.254.42.191 1345574700 Icmp Error: 0,ICMP Network Unreachable, from 63.111.123.26
208.254.42.191 1346166900 unreachable
...
208.254.42.191 1355665500 unreachable
208.254.42.192 1334625300 alive, 6672
...
208.254.42.192 1355658300 alive, 57412
208.254.42.193 1334677500 alive, 28985
208.254.42.193 1336524300 unreachable
208.254.42.193 1344447900 alive, 8934
208.254.42.193 1344613500 alive, 24037
208.254.42.193 1344806100 alive, 20410
208.254.42.193 1345162500 alive, 10177
...
208.254.42.223 1336590900 alive, 23284
...
208.254.42.223 1355555700 alive, 58841
208.254.42.224 1334607300 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1334681100 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1336563900 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.142
208.254.42.224 1344451500 Icmp Type: 11,ICMP Time Exceeded, from 65.214.56.138
208.254.42.224 1344566700 unreachable
208.254.42.224 1344762900 unreachable
Let's try with 66. First, there is way too much data (9 GB), so let's cut it down:
n=66
time awk '$3~/^alive,/ { print $1 }' $n | uniq -c | sed -r 's/^ +//;s/ /,/' | tee $n-alive-uniq-c
OK down to 45 MB, now we can work.
grep -e '66.45.179' -e '66.104.169' -e '66.104.173' -e '66.104.175' -e '66.175.106' '66-alive-uniq-c' | tee 66hits
Nah, it's full of holes:
4,66.45.179.187
12,66.45.179.188
2,66.45.179.197
1,66.45.179.202
2,66.45.179.205
2,66.45.179.206
1,66.45.179.207

We won't be able to find new ranges here.
There are two types of JavaScript found so far: the ones with SHA and the ones without. There are only 2 examples of JS with SHA:
- iraniangoals.com: web.archive.org/web/20110202091909/http://iraniangoals.com/journal.js Commented at: iraniangoals.com JavaScript reverse engineering
- iranfootballsource.com: web.archive.org/web/20110202091901/http://iranfootballsource.com/futbol.js

Examples without SHA:
- kukrinews.com: web.archive.org/web/20100513094909/http://kukrinews.com/news.js
- todaysnewsandweather-ru.com: web.archive.org/web/20110207094735/http://todaysnewsandweather-ru.com/blacksea.js

Both SHA files start with precisely the same string:
var ms="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",lc="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",mn="\u0628\u0631\u062F\u0627\u0632\u0634 \u062F\u0631 \u062C\u0631\u064A\u0627\u0646 \u0627\u0633\u062A...\u0644\u0637\u0641\u0627 \u0635\u0628\u0631 \u0643\u0646\u064A\u062F",lt="\u062A\u0647\u064A\u0647 \u0645\u062A\u0646",ne="\u067E\u0627\u0633\u062E",kf="\u062E\u0631\u0648\u062C",mb="\u062D\u0630\u0641",mv="\u062F\u0631\u064A\u0627\u0641\u062A\u06CC",nt="\u0627\u0631\u0633\u0627\u0644",ig="\u062B\u0628\u062A \u063A\u0644\u0637. \u062C\u0647\u062A \u062A\u062C\u062F\u064A\u062F \u062B\u0628\u062A \u0635\u0641\u062D\u0647 \u0631\u0627 \u0628\u0627\u0632\u0622\u0648\u0631\u06CC \u06A9\u0646\u064A\u062F",hs="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ji="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",ie="\u063A\u064A\u0631 \u0642\u0627\u0628\u0644 \u0627\u062C\u0631\u0627. \u062E\u0637\u0627 \u062F\u0631 \u0627\u062A\u0651\u0635\u0627\u0644",gc="\u0633\u0648\u0627\u0631 \u06A9\u0631\u062F\u0646 \u062A\u06A9\u0645\u064A\u0644 \u0634\u062F",gz="\u0645\u0637\u0645\u0626\u0646\u064A\u062F \u06A9\u0647 \u0645\u064A\u062E\u0648\u0627\u0647\u064A\u062F \u067E\u064A\u0627\u0645 \u0631\u0627 \u062D\u0630\u0641 \u06A9\u0646\u064A\u062F\u061F"
A good fingerprint is present in all of them:
throw new Error("B64 D.1");};if(at[1]==-1){throw new Error("B64 D.2");};if(at[2]==-1){if(f<ay.length){throw new Error("B64 D.3");};dg=2;}else if(at[3]==-1){if(f<ay.length){throw new Error("B64 D.4")
feedsdemexicoyelmundo.com JavaScript reverse engineering
The JavaScript of each website appears to be quite small and similarly sized. They are all minified, but with things reordered around a bit.
For example consider: web.archive.org/web/20110202190932/http://feedsdemexicoyelmundo.com/mundo.js
First we have to know that the Wayback Machine adds some stuff before and after the original code. The actual code there starts at:

ap={fg:['MSXML2.XMLHTTP

and ends in:

ck++;};return fu;};
We can use a JavaScript beautifier such as beautifier.io/ to be able to read the code more easily.
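For example, with the js-beautify CLI (the npm package behind that site; assuming Node.js is available):

npm install -g js-beautify
js-beautify mundo.js > mundo-pretty.js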
It is worth noting that there are also a lot of inline <script> tags, which seem to matter. Further analysis would be needed.
Previously it was unclear if there were any .org hits, until we found the first one with clear comms: web.archive.org/web/20110624203548/http://awfaoi.org/hand.jar
Later on, two more clear ones were found with expired domain trackers, further settling their existence:
- azerinews.org
- autism-news.org

Later still, newimages.org also came to light.
Others that had been previously found in IP ranges but without clear comms:
- 65.61.127.177: material-science.org
- 212.4.17.61: tech-stop.org
Others found in IP ranges but unarchived:
- 74.116.72.244 arborstribune.org
.org is very rare, and has been excluded from some of our search heuristics. That was a shame, but likely not much was missed.
The Wayback Machine has an endpoint to query crawled pages called the CDX server. It is documented at: github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md.
This allows filtering down tens of thousands of possible domains in a few hours, but hundreds of thousands would be too much. This is because you have to query exactly one URL at a time, and they possibly rate limit IPs. No IP blacklisting so far after several hours though, so it's not that bad.
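For reference, a single CDX query looks something like this (example.com as a stand-in domain):

curl 'https://web.archive.org/cdx/search/cdx?url=example.com&output=json&limit=5'

which returns one row per capture, including the timestamp and HTTP status code.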
Once you have a heuristic to narrow down some domains, you can use this helper: cia-2010-covert-communication-websites/cdx.sh to drill them down from tens of thousands to hundreds or thousands.
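The actual helper is in the repository; the gist of such a helper (a hypothetical sketch, not the real script) is one polite query per candidate domain, keeping only domains that have at least one capture:

while read -r domain; do
  # keep the domain if the CDX server knows at least one capture of it
  if [ -n "$(curl -s "https://web.archive.org/cdx/search/cdx?url=$domain&limit=1")" ]; then
    echo "$domain"
  fi
  sleep 1  # they possibly rate limit IPs, so be gentle
done < candidates.txt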
We then post-process the results of cdx.sh with cia-2010-covert-communication-websites/cdx-post.sh to drill them down from thousands to dozens, and manually inspect everything.
From then on, you can just manually inspect for hits in your browser.
Many hits appear to happen on the same days, and per-day data does exist: archive.org/details/widecrawl, but apparently it cannot be publicly downloaded, unfortunately. But maybe there's another way? TODO: select candidates.
At twitter.com/togelius/status/1328404390114435072 they called out DeepMind Lab2D for not giving them credit for prior work! As seen from web.archive.org/web/20220331022932/http://gvgai.net/ though, DeepMind sponsored them at some point.
This very much looks like GVGAI, which was first released in 2014, has been used in dozens (maybe hundreds) of papers, and for which one of the original developers was Tom Schaul at DeepMind...
- SQLite with rowid: stackoverflow.com/questions/8190541/deleting-duplicate-rows-from-sqlite-database
- SQL Server has a crazy extension where CTEs can modify the backing table: stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server
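For instance, a minimal sketch of the SQLite rowid approach (hypothetical table t with a duplicated column i):

sqlite3 :memory: "
CREATE TABLE t(i INTEGER);
INSERT INTO t VALUES (1),(1),(2);
-- keep only the first rowid of each duplicate group
DELETE FROM t WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY i);
SELECT * FROM t;
"
# 1
# 2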
The video from futureai.guru/technologies/brian-simulator-ii-open-source-agi-toolkit/ shows a demo of the possibly non-open-source version. They have a GUI neuron viewer and editor, which is kind of cool.
Principal investigator: Simon M. Lucas.
A diff3 conflict is basically what you always want to see. Either set it as the default as per stackoverflow.com/questions/27417656/should-diff3-be-default-conflictstyle-on-git:

git config --global merge.conflictstyle diff3

or enable it for an ongoing conflict with:

git checkout --conflict=diff3
With this, conflicts now show up as:
++<<<<<<< HEAD
+5
++||||||| parent of 7b0f59d (6)
++3
++=======
+ 6
++>>>>>>> 7b0f59d (6)
7b0f59d is the abbreviated SHA of commit 6. This is much better than the inferior default:
++<<<<<<< ours
+5
++=======
+ 6
++>>>>>>> theirs
We can also observe the current tree state during resolution:
* b4ec057 (HEAD, master) 5
* 0b37c1b 4
| * fbfbfe8 (my-feature) 7
| * 7b0f59d 6
|/
* 661cfab 3
* 6d748a9 2
* c5f8a2c 1
so we understand that we are now at 5 and that we are trying to apply our commit 6.
So it is much clearer what is happening:
- master changed the code from 3 to 5
- our feature changed the code from 3 to 6

and so now we have to decide what new code puts both of these together.
Let's say we decide it is 5 + 6 = 11, and continue rebasing:

git add .
git rebase --continue
We now reach:
++<<<<<<< HEAD
+11
++||||||| parent of fbfbfe8 (7)
++6
++=======
+ 7
++>>>>>>> fbfbfe8 (7)

and the tree looks like:
* ca7f7ff (HEAD) 6
* b4ec057 (master) 5
* 0b37c1b 4
| * fbfbfe8 (my-feature) 7
| * 7b0f59d 6
|/
* 661cfab 3
* 6d748a9 2
* c5f8a2c 1

So we understand that:
- after the previous step we added commit 6 on top of 5
- now we are adding 7 on top of the new 6 (which we decided would contain 11)

and after resolving that one we now reach:
* e1aaf20 (HEAD -> my-feature) 7
* ca7f7ff 6
* b4ec057 (master) 5
* 0b37c1b 4
* 661cfab 3
* 6d748a9 2
* c5f8a2c 1
These are good free GUI options for newbies:
sudo apt install meld
git mergetool --tool meld
sudo apt install kdiff3
git mergetool --tool kdiff3
Let's make a more interesting conflict:
git-tips-2.sh
#!/usr/bin/env bash
set -eux
add() (
rm -f f
for i in `seq 10`; do
printf "before $i\n\n" >> f
done
printf "conflict 1 $1\n\n" >> f
for i in `seq 10`; do
printf "middle $i\n\n" >> f
done
printf "conflict 2 $2\n\n" >> f
for i in `seq 10`; do
printf "after $i\n\n" >> f
done
git add f
)
rm -rf git-tips-2
mkdir git-tips-2
cd git-tips-2
git init
for i in 1 2 3; do
add $i $i
git commit -m $i
done
add 3 4
git commit -m 4
add 5 4
git commit -m 5
git checkout HEAD~2
git checkout -b my-feature
add 3 6
git commit -m 6
add 7 6
git commit -m 7
But which commit from master did we conflict with exactly?
git rebase does not tell you that, and that sucks. We only know which commit from the feature branch caused the problem. Generally we can guess, or it is not needed, but imerge does look promising: stackoverflow.com/questions/18162930/how-can-i-find-out-which-git-commits-cause-conflicts

Move your branch on top of newest master
Before:
5 master
|
4 7 my-feature HEAD
| |
3 6
|/
2
|
1
Action:
git rebase master

After:
7 my-feature HEAD
|
6
|
5 master
|
4
|
3
|
2
|
1

Ready to push with linear history!
Modify contents of an old commit in your branch
Before:
7 my-feature HEAD
|
6
|
5 master
|
4
|
3
|
2
|
1
Oh, commit 6 was crap:
git rebase -i HEAD~2
Mark 6 to be modified. After:
7 my-feature HEAD
|
6v2
|
5 master
|
4
|
3
|
2
|
1

Better now, ready to push.
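In case the marking step is unclear: in the todo list that git rebase -i opens, change pick to edit on the line for commit 6 (SHAs illustrative):

edit ca7f7ff 6
pick e1aaf20 7

Then, when the rebase stops at 6, modify the files, stage them, amend and continue:

git add .
git commit --amend
git rebase --continue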
Note: history changes change all commit SHAs. All parents are considered. Even time is considered. So are the commit message and author. And obviously the file contents. So now commit "7" will actually have a different SHA.
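You can see everything that goes into a commit's SHA with:

git cat-file commit HEAD

which prints the tree hash, parent hashes, author and committer lines with timestamps, and the message.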