TODO: it would be cool to have something like bitcoinstrings.com but including the actual transactions:
TODO who owns it? Are they reliable?
This helper dumps a transaction JSON to a binary:
bitcoin-tx-out-scripts() (
    # Dump data contained in out scripts. Remove first 3 last 2 bytes of
    # standard transaction boilerplate.
    h="$1"
    echo curl "https://blockchain.info/tx/${h}?format=json" |
    jq '.out[].script' tmp.json |
    sed 's/"76a914//;s/88ac"//' |
    xxd -r -p > "${h}.bin"
)
Previously called "bitcoin-strings-with-txids" since text was the initial focus, but Ciro Santilli decided to go for the more general name once images became more and more important to the project.
Set of scripts b Ciro Santilli, primarily created while researching Cool data embedded in the Bitcoin blockchain.
bitcoinstrings.com has all strings -n20 strings, we can obtain the whole thing and clean it up a bit with:
wget -O all.html https://bitcoinstrings.com/all
cp all.html all-recode.html
recode html..ascii all-recode.html
awk '!seen[$0]++' all-recode.html > all-uniq.html
awk to skip the gazillion "mined by message" repeats.
A lot of in that website stuff appears to be cut up at the 20 mark. As shown in Force of Will, this is possibly because they didn't use -w in strings -n20, and the text after the newlines was less than 20 characters.
That website can be replicated by downloading the Bitcoin blockchain locally, then:
cd .bitcoin/blocks
for f in blk*.dat; do strings -n20 -w $f | awk '!seen[$0]++' > ${f%.dat}.txt; done
tail +n1 *.txt
Remove most of the binary crap:
head -n-1 *.txt | grep -e '[. ]' | grep -iv 'mined by' | less
By "Satoshi uploader" we mean the data upload script present in tx 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17 of the Bitcoin blockchain.
The uploader, and its accompanying downloader, are Python programs stored in the blockchain itself. They are made to upload and download arbitrary data into the blockchain via RPC.
These scripts were notably used for: illegal content of block 229k. The script did not maintain its popularity much after this initial surge up loads, likely all done by the same user: there are very very few uploads done after block 229k with the Satoshi uploader.
Our choice of name as "Satoshi uploader" is copied from A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin by Matzutt et al. (2018) because the scripts are Copyrighted Satoshi Nakamoto on the header comment, although as mentioned at Hidden surprises in the Bitcoin blockchain by Ken Shirriff (2014) this feels very unlikely to be true.
A more convenient version of those scripts that can download directly from blockchain.info without the need for a full local node can be found at: github.com/cirosantilli/bitcoin-inscription-indexer/blob/master/download_tx_consts.py by using the --satoshi option. E.g. with it you can download the uploader script with:
./download_tx_consts.py --satoshi 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17
mv 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17.bin uploader.py
The scripts can be found in the blockchain at:
The uploader script uses its own cumbersome data encoding format, which we call the "Satoshi uploader format". The is as follows:
  • ignore all script operands and constants less than 20 bytes (40 hex characters). And there are a lot of small operands, e.g. the uploader itself uses format www.blockchain.com/btc/tx/4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17 has a OP_1, data, OP_3, OP_CHECKMULTISIG pattern on every output script, so the OP_1 and OP_3 are ignored. I.e., it is P2FMS.
  • ignore the last output, which contains a real change transaction instead of arbitrary data. TODO why not just do what with the length instead?
  • the first 4 bytes are the payload length, the next 4 bytes a CRC-32 signature. The payload length is in particular useful because of possible granularity of transactions. But it is hard to understand why a CRC-32 is needed in the middle of the largest hash tree ever created by human kind!!! It does however have the adavantage that it allows us to more uniquely identify which transactions use the format or not.
This means that if we want to index certain file types encoded in this format, a good heuristic is to skip the first 9 bytes (4 size, 4 CRC, 1 OP_1) and look for file signatures.
Let's try out the downloader to download itself. First you have to be running a Bitcoin Core server locally. Then, supposing .bitcon/bitoin.conf containing:
rpcuser=asdf
rpcpassword=qwer
server=1
txindex=1
we run:
git clone git://github.com/jgarzik/python-bitcoinrpc.git
git -C python-bitcoinrpc checkout cdf43b41f982b4f811cd4ebfbc787ab2abf5c94a
wget https://gist.githubusercontent.com/shirriff/64f48fa09a61b56ffcf9/raw/ad1d2e041edc0fb7ef23402e64eeb92c045b5ef7/bitcoin-file-downloader.py
pip install python-bitcoinrpc==1.0
BTCRPCURL=http://asdf:qwer@127.0.0.1:8332 \
  PYTHONPATH="$(pwd)/python-bitcoinrpc:$PYTHONPATH" \
  python3 bitcoin-file-downloader.py \
  6c53cd987119ef797d5adccd76241247988a0a5ef783572a9972e7371c5fb0cc
worked! The source of the downloader script is visible! Note that we had to wait for the sync of the entire blockchain to be fully finished for some reason for that to work.
Other known uploads in Satoshi format except from the first few:
  • tx 89248ecadd51ada613cf8bdf46c174c57842e51de4f99f4bbd8b8b34d3cb7792 block 344068 see ASCII art
  • tx 1ff17021495e4afb27f2f55cc1ef487c48e33bd5a472a4a68c56a84fc38871ec contains the ASCII text e5a6f30ff7d43f96f61af05efaf96f869aa072b5a071f32a24b03702d1dcd2a6. This number however is not a known transaction ID in the blockchain, and has no Google hits.
tx 243dea31863e94dc2f293489db02452e9bde279df1ab7feb6e456a4af672156a contains another upload script. The help reads:
Publish text in the blockchain, suitably padded for easy recovery with strings
This is likely a system that uploads text to the blockchain.
One example can be seen on the marijuana plant.
Messages are uploaded one line per transaction, and thus may be cut up on the blk.txt, and possibly even out of order.
But because each line starts with j( you can generally piece things up regardless.
TODO identify. The first occurrence seems to be in tx e8c61e29c6b829e289f8d0fc95f9eb2eb00c89c85cfa3a9c700b15805451ae6a:
j(DOCPROOF@?pnvf=!;AG