If Ciro Santilli weren't a natural born activist, he could have made an excellent intelligence analyst! See also: Section "Being naughty and creative are correlated".
- Stack Overflow Vote Fraud Script
- GitHub makes Ciro feel especially naughty:
- All GitHub Commit Emails: he extracted (almost) all Git commit emails from GitHub with Google BigQuery
- A repository with 1 million commits: likely the live repo with the most commits as of 2017
- A 100 year GitHub streak, likely the longest ever when streaks existed. It was consuming too many server resources however, which led to GitHub admins manually turning off his contribution history.
- A repository with a 100k commit Git octopus merge. Now that is a true Cthulhu merge.
- 500 on adoc infinite header xref recursion: that was fun while it lasted
Outside this website:
Because when this gets converted to an OurBigBook.com page, it will be easier for people to copy paragraphs/fork and write a canonical page about Ciro.
What do you do when creating a pull request? Do you say "I", which is not true because Ciro did not say that, or do you say "John Doe thinks" bla bla?
And because his name is awesome! :-) Just kidding.
This became a micro-meme in 4chan:
- 2020-09-21 archive.vn/wip/Zz7fx (original) "ITT: weird sites you found by accident" a comment reads:
cirosantilli.com/ this is some guys resume who repeats his own name well over 1,000 times.
- 2020-04-30 archive.is/LgDbK (original) "Interesting Website thread" a comment reads:
cirosantilli.com/ What is even this?
"a guy who says his name over 500 times in his resume."
Correction: cirosantilli.com is not Ciro Santilli's resume. It is your life.
Ciro was trying to make his face fit on the banner. But it is hard because faces are square and text is long.
Then at one point, the CSS was a bit broken and the eye stuck out just left of Ciro Santilli.
At this moment, Ciro knew what to do.
This produced a "continuous image symbol to text" effect that felt so right.
The concept, like any other, is not in itself new and has been used by others. Ciro just independently rediscovered it:
The Hundred Greatest Theorems by Paul and Jack Abad (1999) Updated 2025-01-10 +Created 1970-01-01
This is a well known thought experiment, which Richard Feynman used to emphasize
- infinite wire with balanced positive and negative charges, so no net charge, but a net magnetic field
- a single charge moves parallel to wire at the same speed as the electrons
In the above experiment:
- from the wire frame, the charge feels electromagnetic force, because it is moving and there is a magnetic field
- from the single charge frame, there is still a magnetic field (positive charges are moving), but the body itself is not moving, so there is no magnetic force!
The solution to this problem is length contraction: in the single charge frame, the moving positive charges are length contracted, while the electrons, which are at rest in that frame, aren't. The positive charges are therefore denser than the electrons, and so the wire carries an effective net charge in that frame, which exerts an electric force on the single charge.
This is also mentioned at David Tong www.damtp.cam.ac.uk/user/tong/em/el4.pdf (archive) "David Tong: Lectures on Electromagnetism - 5. Electromagnetism and Relativity" "5.2.1 Magnetism and Relativity".
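The argument above can be made quantitative with a short sketch. The symbols are assumptions introduced only for this illustration: in the wire frame the positive charges have linear density $\lambda$ and are at rest, the electrons have linear density $-\lambda$ and move at speed $v$, the single charge $q$ sits at distance $r$ from the wire, and $\gamma = 1/\sqrt{1 - v^2/c^2}$. Only magnitudes are tracked.

```latex
\begin{align}
  % Charge frame: the positive charges now move, so they are contracted;
  % the electrons are at rest, so they lose the contraction they had in the wire frame.
  \lambda_+' &= \gamma \lambda, &
  \lambda_-' &= -\frac{\lambda}{\gamma} \\
  % Net linear charge density seen from the charge frame
  \lambda' &= \lambda_+' + \lambda_-'
            = \gamma \lambda \left(1 - \frac{1}{\gamma^2}\right)
            = \gamma \lambda \frac{v^2}{c^2} \\
  % Electric force on the charge in its own frame
  F' &= q E' = \frac{q \lambda'}{2 \pi \varepsilon_0 r}
      = \gamma \, \frac{q \lambda v^2}{2 \pi \varepsilon_0 c^2 r} \\
  % Magnetic force in the wire frame, with I = \lambda v and \mu_0 = 1 / (\varepsilon_0 c^2)
  F &= q v B = q v \, \frac{\mu_0 \lambda v}{2 \pi r}
     = \frac{q \lambda v^2}{2 \pi \varepsilon_0 c^2 r}
\end{align}
```

So $F' = \gamma F$, which is what the relativistic transformation of a force transverse to the motion predicts: the "missing" magnetic force in the charge frame shows up as an electric one.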
Design software for synthetic biological circuits.
The input is in Verilog! Overkill?
Then it essentially maps to a standard cell library of biological primitives!
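To give a feeling for what "mapping to a standard cell library" means, here is a toy sketch. It assumes, purely for illustration, that the library only offers a NOR primitive; the tuple-based expression format and the function names are hypothetical and not taken from the tool.

```python
from itertools import product

# Expressions are nested tuples:
# ("var", name), ("not", e), ("and", a, b), ("or", a, b), ("nor", a, b)


def to_nor(expr):
    """Rewrite an AND/OR/NOT expression so that it only uses NOR gates."""
    op = expr[0]
    if op == "var":
        return expr
    if op == "not":
        a = to_nor(expr[1])
        return ("nor", a, a)  # NOT a == a NOR a
    a, b = to_nor(expr[1]), to_nor(expr[2])
    if op == "or":
        n = ("nor", a, b)
        return ("nor", n, n)  # a OR b == NOT (a NOR b)
    if op == "and":
        # De Morgan: a AND b == (NOT a) NOR (NOT b)
        return ("nor", ("nor", a, a), ("nor", b, b))
    if op == "nor":
        return ("nor", a, b)
    raise ValueError(f"unknown operator: {op}")


def evaluate(expr, env):
    """Evaluate an expression given a dict mapping variable names to booleans."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "not":
        return not evaluate(expr[1], env)
    a, b = evaluate(expr[1], env), evaluate(expr[2], env)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "nor":
        return not (a or b)
    raise ValueError(f"unknown operator: {op}")


def gates_used(expr):
    """Return the set of gate types that appear in an expression."""
    if expr[0] == "var":
        return set()
    return {expr[0]}.union(*(gates_used(e) for e in expr[1:]))


# XOR written with AND/OR/NOT, as a high level description might express it.
A, B = ("var", "a"), ("var", "b")
XOR = ("or", ("and", A, ("not", B)), ("and", ("not", A), B))
XOR_NOR = to_nor(XOR)

for a, b in product([False, True], repeat=2):
    env = {"a": a, "b": b}
    assert evaluate(XOR, env) == evaluate(XOR_NOR, env)
print(gates_used(XOR_NOR))
```

A real mapper would also minimize the number of gates and pick concrete biological parts for each one; this sketch only shows the rewriting step.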
Study of the metabolome.
Technique widely used to measure the size of DNA strands, most often PCR output of a region of interest.
A simple sample application is gel electrophoresis allele determination.
Amazing project, that basically makes a more searchable Wayback Machine.
A bit hard to use their data though, partly due to size, but also due to the lack of free-to-use querying mechanisms, and how obtuse Amazon S3 is to use.
Notably, aws-cli with an account is the only reliable way, everything else is way too broken, e.g. trying to check an index at index.commoncrawl.org/CC-MAIN-2023-06/ very often 500s.
But still, their project is amazing.
The only out-of-the-box search they seem to have is: urlsearch.commoncrawl.org/ for domains/URLs. It is good, but there could be so much more... notably IPs.
They also should document the data shape a bit better.
Sample sizes can be found at: commoncrawl.org/2023/04/mar-apr-2023-crawl-archive-now-available/
To explore the data, after login:
aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
Copy the toplevel directory only:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/ . --recursive --exclude "*/*"
Copy some wet/wat files:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz .
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz .
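The object keys used in the commands above follow a regular layout: crawl-data, then crawl ID, then segments, then segment ID, then file kind, then file name. The following minimal sketch splits such a key into its parts and rebuilds the S3 URI. `CrawlPath`, `parse_crawl_path` and `to_s3_uri` are hypothetical helper names, and the six-component layout is an assumption inferred only from the paths shown here.

```python
from typing import NamedTuple

BUCKET = "commoncrawl"  # bucket name as used in the aws commands above


class CrawlPath(NamedTuple):
    crawl: str     # e.g. CC-MAIN-2013-20
    segment: str   # e.g. 1368696381249
    kind: str      # warc, wat or wet
    filename: str  # e.g. CC-MAIN-...warc.wat.gz


def parse_crawl_path(path: str) -> CrawlPath:
    """Split a key like crawl-data/<crawl>/segments/<segment>/<kind>/<file>."""
    prefix = f"s3://{BUCKET}/"
    path = path.strip()
    if path.startswith(prefix):
        path = path[len(prefix):]
    parts = path.split("/")
    if len(parts) != 6 or parts[0] != "crawl-data" or parts[2] != "segments":
        raise ValueError(f"unexpected path layout: {path}")
    _, crawl, _, segment, kind, filename = parts
    return CrawlPath(crawl, segment, kind, filename)


def to_s3_uri(p: CrawlPath) -> str:
    """Rebuild the full S3 URI that can be passed to aws s3 cp."""
    return f"s3://{BUCKET}/crawl-data/{p.crawl}/segments/{p.segment}/{p.kind}/{p.filename}"


example = parse_crawl_path(
    "crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/"
    "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz"
)
print(example.crawl, example.segment, example.kind)
print(to_s3_uri(example))
```

This kind of helper is handy when iterating over the lines of a paths listing and only fetching one kind of file.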
Directory structure:
- cc-index.paths.gz (1K)
- cc-index-table.paths.gz (1K)
- segment.paths.gz (1.7K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/
crawl-data/CC-MAIN-2013-20/segments/1368696381630/
- index.html (2.3K)
- wat.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wat.gz
- wet.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wet.gz
- warc.paths.gz (99K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.gz
- segments: directory with actual data
- 1368696381249: one of many segments, any meaning of name?
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz (142M, 334M unzipped) A tiny bit of metadata, and then plaintext content from the website, e.g. the second one:
WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://004eeb5.netsolhost.com/stephensilver.htm
WARC-Date: 2013-05-18T08:11:02Z
WARC-Record-ID: <urn:uuid:773b31ba-ddc6-47a5-ae24-d08141b9944d>
WARC-Refers-To: <urn:uuid:4b1bdbff-4926-4ced-86f6-072f5bb3837a>
WARC-Block-Digest: sha1:LQFSCR2LIJQYMPTXRHWU7HAPQTVSYS3A
Content-Type: text/plain
Content-Length: 12046

Stephen Silver is a journalist and editor who specializes in the areas of politics, pop culture, film and sports. He works as an editor with the North American Publishing Co. and as a film critic with The Trend, a local newspaper in the Philadelphia area.
No IP unfortunately.
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz (329M, 1.4G unzipped) A lot of JSON metadata and no contents as desired. Contains IP! Some entries however are humongous with a ton of useless data, that's what bloats these so much:
WARC/1.0
WARC-Type: metadata
WARC-Target-URI: CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
WARC-Date: 2013-11-22T14:51:12Z
WARC-Record-ID: <urn:uuid:ec54e493-8965-41be-b344-07596cc30b3a>
WARC-Refers-To: <urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>
Content-Type: application/json
Content-Length: 1180

{"Envelope":{"Format":"WARC","WARC-Header-Length":"274","Block-Digest":"sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR","Actual-Content-Length":"372","WARC-Header-Metadata":{"WARC-Type":"warcinfo","WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz","WARC-Date":"2013-11-22T14:51:12Z","Content-Length":"372","WARC-Record-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Type":"application/warc-fields"},"Payload-Metadata":{"Trailing-Slop-Length":"0","Actual-Content-Type":"application/warc-fields","Actual-Content-Length":"372","Headers-Corrupt":true,"WARC-Info-Metadata":{"robots":"classic","software":"Nutch 1.6 (CC)/CC WarcExport 1.0","description":"Wide crawl of the web with URLs provided by Blekko for Spring 2013","hostname":"ip-10-60-113-184.ec2.internal","format":"WARC File Format 1.0","isPartOf":"CC-MAIN-2013-20","operator":"CommonCrawl Admin","publisher":"CommonCrawl"}}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"453","Header-Length":"10","Inflated-CRC":"866052549","Inflated-Length":"650"},"Offset":"0","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}

WARC/1.0
WARC-Type: metadata
WARC-Target-URI: http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions
WARC-Date: 2013-05-18T05:48:54Z
WARC-Record-ID: <urn:uuid:d519658f-7a63-46c1-849b-4cd92332ddb8>
WARC-Refers-To: <urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>
Content-Type: application/json
Content-Length: 1501

{"Envelope":{"Format":"WARC","WARC-Header-Length":"433","Block-Digest":"sha1:B2B6JDSGWCUQIIUGV54SXEE25RX4SANS","Actual-Content-Length":"302","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2013-05-18T05:48:54Z","WARC-Warcinfo-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Length":"302","WARC-Record-ID":"<urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>","WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions","WARC-IP-Address":"165.1.125.44","Content-Type":"application/http; msgtype=request"},"Payload-Metadata":{"Trailing-Slop-Length":"4","HTTP-Request-Metadata":{"Headers":{"Accept-Language":"en-us,en-gb,en;q=0.7,*;q=0.3","Host":"ap.org","Accept-Encoding":"x-gzip, gzip, deflate","User-Agent":"CCBot/2.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},"Headers-Length":"300","Entity-Length":"0","Entity-Trailing-Slop-Bytes":"0","Request-Message":{"Method":"GET","Version":"HTTP/1.0","Path":"/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"},"Actual-Content-Type":"application/http; msgtype=request"}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"455","Header-Length":"10","Inflated-CRC":"453539965","Inflated-Length":"739"},"Offset":"453","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
Let's beautify one of them to see it better:
{
  "Envelope": {
    "Format": "WARC",
    "WARC-Header-Length": "274",
    "Block-Digest": "sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR",
    "Actual-Content-Length": "372",
    "WARC-Header-Metadata": {
      "WARC-Type": "warcinfo",
      "WARC-Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz",
      "WARC-Date": "2013-11-22T14:51:12Z",
      "Content-Length": "372",
      "WARC-Record-ID": "<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>",
      "Content-Type": "application/warc-fields"
    },
    "Payload-Metadata": {
      "Trailing-Slop-Length": "0",
      "Actual-Content-Type": "application/warc-fields",
      "Actual-Content-Length": "372",
      "Headers-Corrupt": true,
      "WARC-Info-Metadata": {
        "robots": "classic",
        "software": "Nutch 1.6 (CC)/CC WarcExport 1.0",
        "description": "Wide crawl of the web with URLs provided by Blekko for Spring 2013",
        "hostname": "ip-10-60-113-184.ec2.internal",
        "format": "WARC File Format 1.0",
        "isPartOf": "CC-MAIN-2013-20",
        "operator": "CommonCrawl Admin",
        "publisher": "CommonCrawl"
      }
    }
  },
  "Container": {
    "Compressed": true,
    "Gzip-Metadata": {
      "Footer-Length": "8",
      "Deflate-Length": "453",
      "Header-Length": "10",
      "Inflated-CRC": "866052549",
      "Inflated-Length": "650"
    },
    "Offset": "0",
    "Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"
  }
}
Fuck, no IP addresses either. But other entries do have it, why not this one? Presumably because this one is a warcinfo record that describes the file itself rather than a fetched page: the second record above, of type request, does carry WARC-IP-Address (165.1.125.44).
The reason these can be huge is the HTML-Metadata section, which contains all outlinks! gist.github.com/Smerity/e750f0ef0ab9aa366558#file-bbc-pretty-wat-L34
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz Obtain:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz .
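Since the IP lives inside the JSON payload of the WAT records, pulling it out needs a tiny bit of parsing. The following is a minimal sketch, assuming a record has already been isolated as text (WARC headers, a blank line, then the JSON payload). `parse_warc_record` and `wat_ip_address` are hypothetical helper names, and the sample records are heavily abbreviated versions of the ones shown above.

```python
import json


def parse_warc_record(record: str):
    """Split one WARC record into (version line, headers dict, payload string)."""
    text = record.replace("\r\n", "\n").lstrip("\n")
    head, _, payload = text.partition("\n\n")
    lines = head.split("\n")
    version = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, header values such as URLs contain colons too.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return version, headers, payload.strip()


def wat_ip_address(record: str):
    """Return the WARC-IP-Address stored in a WAT record, or None if absent."""
    _, headers, payload = parse_warc_record(record)
    if headers.get("Content-Type") != "application/json":
        return None
    envelope = json.loads(payload).get("Envelope", {})
    return envelope.get("WARC-Header-Metadata", {}).get("WARC-IP-Address")


# Abbreviated request-type record: it carries the IP.
REQUEST_RECORD = (
    "WARC/1.0\r\n"
    "WARC-Type: metadata\r\n"
    "WARC-Date: 2013-05-18T05:48:54Z\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"Envelope":{"Format":"WARC","WARC-Header-Metadata":'
    '{"WARC-Type":"request","WARC-IP-Address":"165.1.125.44"}}}'
    "\r\n\r\n"
)

# Abbreviated warcinfo-type record: no IP field at all.
WARCINFO_RECORD = (
    "WARC/1.0\r\n"
    "WARC-Type: metadata\r\n"
    "WARC-Date: 2013-11-22T14:51:12Z\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"Envelope":{"Format":"WARC","WARC-Header-Metadata":{"WARC-Type":"warcinfo"}}}'
    "\r\n\r\n"
)

print(wat_ip_address(REQUEST_RECORD))
print(wat_ip_address(WARCINFO_RECORD))
```

For real multi-gigabyte files, a streaming WARC reader that honors Content-Length would be the safer choice; this sketch only shows where the field sits in the JSON.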
OK, now we're talking, two liner and you get a window showing bounding box object detection from your webcam feed! The accuracy is crap for anything but people. But still. Well done. Tested on Ubuntu 22.10, P51.
python -m pip install -U yolov5==7.0.9
yolov5 detect --source 0
As a Brazilian, Ciro Santilli used to really love playing soccer (but not watching it), until he hurt his knee.
Playing soccer just feels amazing, because you are constantly running around, but with a more specific goal in mind: to get that ball into that goal!
Playing soccer was especially amazing on the flat wet sand beach of Santos. The weekend, the sea, feet touching the sand, the sun going down, and your school mates next to you. Nirvana.
It is also true that under those conditions, the skin of your feet will get ripped off due to running on the slightly wet and flat sand no matter how thick it has become. But it is worth it.
Teams would often be split into "the team with shirts vs the team without shirts", with the latter just taking off their shirts. The two best players would take turns picking players for their teams, and the first one to pick would be decided by odds and evens (par ou ímpar).
A pair of Havaianas, or Havaianas rip-offs, stuck into the sand, or even just some school bags, would do as goal posts. More organized people, especially adults, would have their own water pipe goal with a proper net and all. But doing so would spoil the fun of endless discussions about whether a non-flat ball had gone into an imaginary rectangle or not.
That's how soccer was meant to be played.
Ciro however became disillusioned with soccer after his injury. It is a shame.
And so after that, Ciro decided to dedicate himself to sports where you can't hurt your knee.
Ciro hates water, so swimming is out of the question. What could be more boring than going back and forth on a fixed location a million times to gain some milliseconds?
And so Ciro has been left with the gym as the only main option for a while.
Running would have been a consideration, but Ciro Santilli's legs sometimes itch when he runs.
This was until he ended up living in a place with decent roads for cycling in the late 2010s, which led to Ciro Santilli's cycling.
First install NVM/NPM as shown at and then:
git clone https://github.com/cirosantilli/cirosantilli.github.io
cd cirosantilli.github.io
npm install
ourbigbook .
xdg-open index.html
How to develop Ciro Santilli's website before the OurBigBook migration Updated 2025-01-10 +Created 1970-01-01
The website moved from AsciiDoctor to OurBigBook Markup in 2020, making this section mostly useless. But hey, history!
Ciro's website is powered by GitHub Pages and Jekyll Asciidoc.
The source code is located at: github.com/cirosantilli/cirosantilli.github.io
Build locally, watch for changes and rebuild automatically, and start a local server with:
git clone --recursive https://github.com/cirosantilli/cirosantilli.github.io
cd cirosantilli.github.io
bundle install
npm install
./run
Source: ./run. The website will be visible at: localhost:4000.
Tested on the latest Ubuntu.
Publish changes to GitHub Pages:
git add -u
git commit -m 'make yourself look sillier'
./publish
Source: ./publish.
GitHub forces us to use the master branch for the build output... so the actual source is in the branch dev.
Update the gems with:
bundle update
git add Gemfile.lock
git commit -m 'update gems'
His website was originally written in Markdown, however that was deprecated in favour of AsciiDoctor when Ciro saw the light, rationale shown at: markdown-style-guideuse-asciidoc
GitHub Pages is chosen instead of a single page GitHub README.adoc for the following reasons:
- Ciro will want some unsupported extensions, notably mathematics, likely with KaTeX server side:
- github.com/asciidoctor/asciidoctor/pull/3338
- stackoverflow.com/questions/11256433/how-to-show-math-equations-in-general-githubs-markdownnot-githubs-blog
- g14n.info/2014/09/math-on-github-pages/
- www.quora.com/How-can-I-combine-latex-and-markdown-in-GitHub
- when GitHub dies, Ciro's website URL still lives and retains the PageRank!