Never trust an experiment that is not supported by a good theory Updated 2025-04-24 +Created 1970-01-01
Not the usual bullshit you were expecting from the philosophy of Science, right?
Some notable quoters:
- Jacques Monod has the exact quote as presented here: pubmed.ncbi.nlm.nih.gov/22042272/, though presumably it was in French, TODO find the French version
- youtu.be/AYC5lE0b8os?t=41 A Computational Whole-Cell Model Predicts Genotype From Phenotype- Markus Covert by "Calit2ube" (2013), see also: Section "Whole cell simulation"
- the book Genius: Richard Feynman and Modern Physics by James Gleick (1994) mentions a few incidents of this involving Feynman, see e.g. chapter "New Particles, New Language" where he and fellow theorist Hans Bethe immediately spot problems with experimentalists' data in suspicious results
It was mind blowing when in 2022, after several years of selection, one of the 7 finalists was broken on a classical computer, not even in a quantum computer! news.ycombinator.com/item?id=30466063 | eprint.iacr.org/2022/214 Breaking Rainbow Takes a Weekend on a Laptop by Ward Beullens. Dude announced he had a break a few days before submission: twitter.com/WardBeullens/status/1492780462028300290 On Twitter. He's so young. Epic.
Edit: and then, after the third round, things were a bit unclear, so they made a fourth round with 4 choices out of the 7 from round 3, and in August 2022 one of the four was broken again on a classic CPU!!! OMG: arstechnica.com/information-technology/2022/08/sike-once-a-post-quantum-encryption-contender-is-koed-in-nist-smackdown/
Amazing project, that basically makes a more searchable Wayback Machine.
A bit hard to use their data though, partly due to size, but also lack of free to use querrying mechanisms, and how obtuse Amazon S3 is to use.
Notably, aws-cli with an account is the only reliable way, everything else is way too broken, e.g. trying the to check the an index index.commoncrawl.org/CC-MAIN-2023-06/ very often 500s.
But still, their projct is amazing.
The only out-of-the-box search they seem to have is: urlsearch.commoncrawl.org/ for domains/URLs. It is good, but there could be so much more... notably IPs.
Sample sizes can be found at: commoncrawl.org/2023/04/mar-apr-2023-crawl-archive-now-available/
To explore the data, after login:
aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
Copy the toplevel directory only:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/ . --recursive --exclude "*/*"
Copy some wet/wat files:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz .
aws s3 sync s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz .
Directory structrure:
- cc-index.paths.gz (1K)
- cc-index-table.paths.gz (1K)
- segment.paths.gz (1.7K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/ crawl-data/CC-MAIN-2013-20/segments/1368696381630/
- index.html (2.3K)
- wat.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wat.gz
- wet.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wet.gz
- warc.paths.gz (99K)
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.gz
- segments: directgory with actual data
- 1368696381249: one of many segments, any meaning of name?
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz (142M, 334M unzipped)A tiny bit of metadata, and then plaintext content from the website, e.g. the second one:No IP unfortunately.
WARC/1.0 WARC-Type: conversion WARC-Target-URI: http://004eeb5.netsolhost.com/stephensilver.htm WARC-Date: 2013-05-18T08:11:02Z WARC-Record-ID: <urn:uuid:773b31ba-ddc6-47a5-ae24-d08141b9944d> WARC-Refers-To: <urn:uuid:4b1bdbff-4926-4ced-86f6-072f5bb3837a> WARC-Block-Digest: sha1:LQFSCR2LIJQYMPTXRHWU7HAPQTVSYS3A Content-Type: text/plain Content-Length: 12046 Stephen Silver is a journalist and editor who specializes in the areas of politics, pop culture, film and sports. He works as an editor with the North American Publishing Co. and as a film critic with The Trend, a local newspaper in the Philadelphia area.
- A lot of JSON metadata and no contents as desired. Contains IP! Some entries however are humongous with a ton of useless data, that's what bloats these so much:Let's beautify one of them to see it better:
WARC/1.0 WARC-Type: metadata WARC-Target-URI: CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz WARC-Date: 2013-11-22T14:51:12Z WARC-Record-ID: <urn:uuid:ec54e493-8965-41be-b344-07596cc30b3a> WARC-Refers-To: <urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1> Content-Type: application/json Content-Length: 1180 {"Envelope":{"Format":"WARC","WARC-Header-Length":"274","Block-Digest":"sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR","Actual-Content-Length":"372","WARC-Header-Metadata":{"WARC-Type":"warcinfo","WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz","WARC-Date":"2013-11-22T14:51:12Z","Content-Length":"372","WARC-Record-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Type":"application/warc-fields"},"Payload-Metadata":{"Trailing-Slop-Length":"0","Actual-Content-Type":"application/warc-fields","Actual-Content-Length":"372","Headers-Corrupt":true,"WARC-Info-Metadata":{"robots":"classic","software":"Nutch 1.6 (CC)/CC WarcExport 1.0","description":"Wide crawl of the web with URLs provided by Blekko for Spring 2013","hostname":"ip-10-60-113-184.ec2.internal","format":"WARC File Format 1.0","isPartOf":"CC-MAIN-2013-20","operator":"CommonCrawl Admin","publisher":"CommonCrawl"}}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"453","Header-Length":"10","Inflated-CRC":"866052549","Inflated-Length":"650"},"Offset":"0","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}} WARC/1.0 WARC-Type: metadata WARC-Target-URI: http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions WARC-Date: 2013-05-18T05:48:54Z WARC-Record-ID: <urn:uuid:d519658f-7a63-46c1-849b-4cd92332ddb8> WARC-Refers-To: <urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f> Content-Type: application/json Content-Length: 1501 {"Envelope":{"Format":"WARC","WARC-Header-Length":"433","Block-Digest":"sha1:B2B6JDSGWCUQIIUGV54SXEE25RX4SANS","Actual-Content-Length":"302","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2013-05-18T05:48:54Z","WARC-Warcinfo-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Length":"302","WARC-Record-ID":"<urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>","WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions","WARC-IP-Address":"165.1.125.44","Content-Type":"application/http; msgtype=request"},"Payload-Metadata":{"Trailing-Slop-Length":"4","HTTP-Request-Metadata":{"Headers":{"Accept-Language":"en-us,en-gb,en;q=0.7,*;q=0.3","Host":"ap.org","Accept-Encoding":"x-gzip, gzip, deflate","User-Agent":"CCBot/2.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},"Headers-Length":"300","Entity-Length":"0","Entity-Trailing-Slop-Bytes":"0","Request-Message":{"Method":"GET","Version":"HTTP/1.0","Path":"/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"},"Actual-Content-Type":"application/http; msgtype=request"}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"455","Header-Length":"10","Inflated-CRC":"453539965","Inflated-Length":"739"},"Offset":"453","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
Fuck no IP addresses either. But other entries do have it, why not this one?{ "Envelope": { "Format": "WARC", "WARC-Header-Length": "274", "Block-Digest": "sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR", "Actual-Content-Length": "372", "WARC-Header-Metadata": { "WARC-Type": "warcinfo", "WARC-Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz", "WARC-Date": "2013-11-22T14:51:12Z", "Content-Length": "372", "WARC-Record-ID": "<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>", "Content-Type": "application/warc-fields" }, "Payload-Metadata": { "Trailing-Slop-Length": "0", "Actual-Content-Type": "application/warc-fields", "Actual-Content-Length": "372", "Headers-Corrupt": true, "WARC-Info-Metadata": { "robots": "classic", "software": "Nutch 1.6 (CC)/CC WarcExport 1.0", "description": "Wide crawl of the web with URLs provided by Blekko for Spring 2013", "hostname": "ip-10-60-113-184.ec2.internal", "format": "WARC File Format 1.0", "isPartOf": "CC-MAIN-2013-20", "operator": "CommonCrawl Admin", "publisher": "CommonCrawl" } } }, "Container": { "Compressed": true, "Gzip-Metadata": { "Footer-Length": "8", "Deflate-Length": "453", "Header-Length": "10", "Inflated-CRC": "866052549", "Inflated-Length": "650" }, "Offset": "0", "Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz" } }
The reason these can be huge is theHTML-Metadata
section which contain all outlinks! gist.github.com/Smerity/e750f0ef0ab9aa366558#file-bbc-pretty-wat-L34 CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
()Obtain:aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz .
- 1368696381249: one of many segments, any meaning of name?
In plain English: the space has no visible holes. If you start walking less and less on each step, you always converge to something that also falls in the space.
One notable example where completeness matters: Lebesgue integral of is complete but Riemann isn't.
Web of Stories contains amazing interviews with many (mostly American) winners.
See Surely You're Joking, Mr. Feynman chapter Alfred Nobel's Other Mistake's amazing comments about the Nobel Prize.
TODO who is the digital switch person he mentions?
- www.quora.com/unanswered/Who-was-Richard-Feynman-referring-to-in-the-book-Surely-Youre-Joking-Mr-Feynman-chapter-Alfred-Nobels-Other-Mistake-when-he-talks-about-A-friend-of-mine-whos-a-rich-man-he-invented-some-kind-of-simple-digital-switch on Quora
- github.com/cirosantilli/cirosantilli.github.io/issues/72
OK, now we're talking, two liner and you get a window showing bounding box object detection from your webcam feed!The accuracy is crap for anything but people. But still. Well done. Tested on Ubuntu 22.10, P51.
python -m pip install -U yolov5==7.0.9
yolov5 detect --source 0
There are infinitely many primes with a neighbor not further apart than 70 million. This was the first such finite bound to be proven, and therefore a major breakthrough.
This implies that for at least one value (or more) below 70 million there are infinitely many repetitions, but we don't know which e.g. we could have infinitely many:or infinitely many:or infinitely many:or infinitely many:but we don't know which of those.
The Prime k-tuple conjecture conjectures that it is all of them.
Public landing page: www.ox.ac.uk/admissions/undergraduate/courses/course-listing/computer-science
Course lists: www.cs.ox.ac.uk/teaching/courses/ True to form, courses appear to have identifiers, e.g. The "course materials" section of each course leads to courses.cs.ox.ac.uk/ which is paywalled by IP (accessible via Eduroam): TODO which system does it use? Some courses place their materials directly on "www.cs.ox.ac.uk", and when that is the case they are publicly accessible. So it is very much hit and miss. E.g. www.cs.ox.ac.uk/teaching/courses/2022-2023/quantum/index.html from Quantum Processes and Computation course of the University of Oxford has the assignments such as www.cs.ox.ac.uk/people/aleks.kissinger/courses/qpc2022/assignment1.pdf publicly visible, but e.g. www.cs.ox.ac.uk/teaching/courses/2022-2023/modelsofcomputation/ has nothing.
qi
for the Quantum Information course of the University of Oxford rather than more arbitrary A1/A2/A3, B1/B2/B3, naming convention used by the Mathematics course of the University of Oxford and the Physics course of the University of Oxford, and URLs can either have years or not:- www.cs.ox.ac.uk/teaching/courses/qi/: no year: goes to latest
- www.cs.ox.ac.uk/teaching/courses/2023-2024/qi/: has year, fixed year. Disgraceful repetition of redundant 2023-2024, but OK.
Handbook:
- 2022:
- general www.cs.ox.ac.uk/files/13731/CS%20Handbook%20final.pdf
- Year 1 (Prelims): www.cs.ox.ac.uk/files/13794/Handbook%202022%20Part%20C%20-%20V1.3.pdf
- Year 2/3 (Parts A/B): www.cs.ox.ac.uk/files/13793/Handbook%202022%20Parts%20A%20&%20B%20V1.3.pdf There is some mixture on which courses can be taken on year 2 or 3. This also implies that they cannot have the usual A2/B2 naming scheme. They just don't have names instead mostly. It is also the most beautiful illustration of why you shouldn't do Compute Science at university: there's no depth to the subject. You can just take random courses and you learn it all quickly. Section "The only reason for universities to exist should be the laboratories".
- Year 2 has four mandatory core courses:
- Models of Computation
- Algorithms and Data Structures
- Compilers (mandatory for compsi, but not mathematics and computer science)
- Concurrent programming
- A only:
- Hilary term
- Concurrent Programming (mandatory for compsi, but not mathematics and computer science)
- Quantum information
- Year 2 has four mandatory core courses:
- Year 4 (Part C): www.cs.ox.ac.uk/files/13794/Handbook%202022%20Part%20C%20-%20V1.3.pdf
- Michaelmas term
- Bayesian Statistical Probabilistic Programming
- Concurrent Algorithms and Data Structures
- Quantum Processes and Computation
- Computational Learning Theory
- Computational Biology
- Advanced Complexity Theory
- Graph Representation Learning
- Hilary term
- Advanced Security
- Database Systems Implementation
- Ethical Computing in Practice
- Law and Computer Science
- Quantum Software course of the University of Oxford
- Geometric Deep Learning
- Foundations of Self-Programming Agents
- Deep Learning in Healthcare
- Michaelmas term
Power, Sex, Suicide by Nick Lane (2006) part 5 "Murder or suicide" mentions that the key events that leads to apoptosis is when certain proteins normally present in the inner mitochondrial membrane spill out, and that this often happens when free radicals are produced in excess: the cell is really not doing well in those cases. This point suggests that the initial mitochondrial endosymbiosis happened due to a parasite that lived inside another cell. It mentions that even today we see parasites kill the host cell when they feel that the cell does not have many nutrients. This frees the parasites to then infect other cells.
There are unlisted articles, also show them or only show them.