It was mind blowing when in 2022, after several years of selection, one of the 7 finalists was broken on a classical computer, not even in a quantum computer! news.ycombinator.com/item?id=30466063 | eprint.iacr.org/2022/214 Breaking Rainbow Takes a Weekend on a Laptop by Ward Beullens. Dude announced he had a break a few days before submission: twitter.com/WardBeullens/status/1492780462028300290 On Twitter. He's so young. Epic.
Edit: and then, after the third round, things were a bit unclear, so they made a fourth round with 4 choices out of the 7 from round 3, and in August 2022 one of the four was broken again on a classic CPU!!! OMG: arstechnica.com/information-technology/2022/08/sike-once-a-post-quantum-encryption-contender-is-koed-in-nist-smackdown/
OK, now we're talking, two liner and you get a window showing bounding box object detection from your webcam feed!The accuracy is crap for anything but people. But still. Well done. Tested on Ubuntu 22.10, P51.
python -m pip install -U yolov5==7.0.9
yolov5 detect --source 0
Amazing project, that basically makes a more searchable Wayback Machine.
A bit hard to use their data though, partly due to size, but also lack of free to use querrying mechanisms, and how obtuse Amazon S3 is to use.
Notably, aws-cli with an account is the only reliable way, everything else is way too broken, e.g. trying the to check the an index index.commoncrawl.org/CC-MAIN-2023-06/ very often 500s.
But still, their projct is amazing.
The only out-of-the-box search they seem to have is: urlsearch.commoncrawl.org/ for domains/URLs. It is good, but there could be so much more... notably IPs.
Sample sizes can be found at: commoncrawl.org/2023/04/mar-apr-2023-crawl-archive-now-available/
To explore the data, after login:
aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
Copy the toplevel directory only:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/ . --recursive --exclude "*/*"
Copy some wet/wat files:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz .
aws s3 sync s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz .
Directory structrure:
- cc-index.paths.gz (1K)
- cc-index-table.paths.gz (1K)
- segment.paths.gz (1.7K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/ crawl-data/CC-MAIN-2013-20/segments/1368696381630/
- index.html (2.3K)
- wat.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wat.gz
- wet.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wet.gz
- warc.paths.gz (99K)
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.gz
- segments: directgory with actual data
- 1368696381249: one of many segments, any meaning of name?
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz (142M, 334M unzipped)A tiny bit of metadata, and then plaintext content from the website, e.g. the second one:No IP unfortunately.
WARC/1.0 WARC-Type: conversion WARC-Target-URI: http://004eeb5.netsolhost.com/stephensilver.htm WARC-Date: 2013-05-18T08:11:02Z WARC-Record-ID: <urn:uuid:773b31ba-ddc6-47a5-ae24-d08141b9944d> WARC-Refers-To: <urn:uuid:4b1bdbff-4926-4ced-86f6-072f5bb3837a> WARC-Block-Digest: sha1:LQFSCR2LIJQYMPTXRHWU7HAPQTVSYS3A Content-Type: text/plain Content-Length: 12046 Stephen Silver is a journalist and editor who specializes in the areas of politics, pop culture, film and sports. He works as an editor with the North American Publishing Co. and as a film critic with The Trend, a local newspaper in the Philadelphia area.
- A lot of JSON metadata and no contents as desired. Contains IP! Some entries however are humongous with a ton of useless data, that's what bloats these so much:Let's beautify one of them to see it better:
WARC/1.0 WARC-Type: metadata WARC-Target-URI: CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz WARC-Date: 2013-11-22T14:51:12Z WARC-Record-ID: <urn:uuid:ec54e493-8965-41be-b344-07596cc30b3a> WARC-Refers-To: <urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1> Content-Type: application/json Content-Length: 1180 {"Envelope":{"Format":"WARC","WARC-Header-Length":"274","Block-Digest":"sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR","Actual-Content-Length":"372","WARC-Header-Metadata":{"WARC-Type":"warcinfo","WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz","WARC-Date":"2013-11-22T14:51:12Z","Content-Length":"372","WARC-Record-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Type":"application/warc-fields"},"Payload-Metadata":{"Trailing-Slop-Length":"0","Actual-Content-Type":"application/warc-fields","Actual-Content-Length":"372","Headers-Corrupt":true,"WARC-Info-Metadata":{"robots":"classic","software":"Nutch 1.6 (CC)/CC WarcExport 1.0","description":"Wide crawl of the web with URLs provided by Blekko for Spring 2013","hostname":"ip-10-60-113-184.ec2.internal","format":"WARC File Format 1.0","isPartOf":"CC-MAIN-2013-20","operator":"CommonCrawl Admin","publisher":"CommonCrawl"}}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"453","Header-Length":"10","Inflated-CRC":"866052549","Inflated-Length":"650"},"Offset":"0","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}} WARC/1.0 WARC-Type: metadata WARC-Target-URI: http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions WARC-Date: 2013-05-18T05:48:54Z WARC-Record-ID: <urn:uuid:d519658f-7a63-46c1-849b-4cd92332ddb8> WARC-Refers-To: <urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f> Content-Type: application/json Content-Length: 1501 {"Envelope":{"Format":"WARC","WARC-Header-Length":"433","Block-Digest":"sha1:B2B6JDSGWCUQIIUGV54SXEE25RX4SANS","Actual-Content-Length":"302","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2013-05-18T05:48:54Z","WARC-Warcinfo-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Length":"302","WARC-Record-ID":"<urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>","WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions","WARC-IP-Address":"165.1.125.44","Content-Type":"application/http; msgtype=request"},"Payload-Metadata":{"Trailing-Slop-Length":"4","HTTP-Request-Metadata":{"Headers":{"Accept-Language":"en-us,en-gb,en;q=0.7,*;q=0.3","Host":"ap.org","Accept-Encoding":"x-gzip, gzip, deflate","User-Agent":"CCBot/2.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},"Headers-Length":"300","Entity-Length":"0","Entity-Trailing-Slop-Bytes":"0","Request-Message":{"Method":"GET","Version":"HTTP/1.0","Path":"/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"},"Actual-Content-Type":"application/http; msgtype=request"}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"455","Header-Length":"10","Inflated-CRC":"453539965","Inflated-Length":"739"},"Offset":"453","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
Fuck no IP addresses either. But other entries do have it, why not this one?{ "Envelope": { "Format": "WARC", "WARC-Header-Length": "274", "Block-Digest": "sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR", "Actual-Content-Length": "372", "WARC-Header-Metadata": { "WARC-Type": "warcinfo", "WARC-Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz", "WARC-Date": "2013-11-22T14:51:12Z", "Content-Length": "372", "WARC-Record-ID": "<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>", "Content-Type": "application/warc-fields" }, "Payload-Metadata": { "Trailing-Slop-Length": "0", "Actual-Content-Type": "application/warc-fields", "Actual-Content-Length": "372", "Headers-Corrupt": true, "WARC-Info-Metadata": { "robots": "classic", "software": "Nutch 1.6 (CC)/CC WarcExport 1.0", "description": "Wide crawl of the web with URLs provided by Blekko for Spring 2013", "hostname": "ip-10-60-113-184.ec2.internal", "format": "WARC File Format 1.0", "isPartOf": "CC-MAIN-2013-20", "operator": "CommonCrawl Admin", "publisher": "CommonCrawl" } } }, "Container": { "Compressed": true, "Gzip-Metadata": { "Footer-Length": "8", "Deflate-Length": "453", "Header-Length": "10", "Inflated-CRC": "866052549", "Inflated-Length": "650" }, "Offset": "0", "Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz" } }
The reason these can be huge is theHTML-Metadata
section which contain all outlinks! gist.github.com/Smerity/e750f0ef0ab9aa366558#file-bbc-pretty-wat-L34 CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
()Obtain:aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz .
- 1368696381249: one of many segments, any meaning of name?
In plain English: the space has no visible holes. If you start walking less and less on each step, you always converge to something that also falls in the space.
One notable example where completeness matters: Lebesgue integral of is complete but Riemann isn't.
In order to create a test user with password instead of peer authentication, let's create test user:
createuser -P user0
createdb user0
-P
makes it prompt for the users password.Alternatively, to create the password non-interactively stackoverflow.com/questions/42419559/postgres-createuser-with-password-from-terminal:Can't find a way using the
psql -c "create role NewRole with login password 'secret'"
createuser
helper.We can then login with that password with:which asks for the password we've just set, because the
psql -U user0 -h localhost
-h
option turns off peer authentication, and turns off password authentication.The password can be given non-interactively as shown at stackoverflow.com/questions/6405127/how-do-i-specify-a-password-to-psql-non-interactively with the
PGPASSWORD
environment variable:PGPASSWORD=a psql -U user0 -h localhost
Now let's create a test database which
user0
can access with an existing superuser account:createdb user0db0
psql -c 'GRANT ALL PRIVILEGES ON DATABASE user0db0 TO user0'
We can check this permission with:which now contains:The permission letters are explained at:
psql -c '\l'
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
user0db0 | ciro | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/ciro +
| | | | | ciro=CTc/ciro +
| | | | | user0=CTc/ciro
user0
can now do the usual table operations on that table:PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'SELECT * FROM table0;'
This technique has managed to determine protein 3D structures for proteins that people were not able to crystallize for X-ray crystallography.
It is said however that cryoEM is even fiddlier than X-ray crystallography, so it is mostly attempted if crystallization attempts fail.
We just put a gazillion copies of our molecule of interest in a solution, and then image all of them in the frozen water.
Each one of them appears in the image in a random rotated view, so given enough of those point of view images, we can deduce the entire 3D structure of the molecule.
Ciro Santilli once watched a talk by Richard Henderson about cryoEM circa 2020, where he mentioned that he witnessed some students in the 1980's going to Germany, and coming into contact with early cryoEM. And when they came back, they just told their principal investigator: "I'm going to drop my PhD theme and focus exclusively on cryoEM". That's how hot the cryo thing was! So cool.
There are infinitely many primes with a neighbor not further apart than 70 million. This was the first such finite bound to be proven, and therefore a major breakthrough.
This implies that for at least one value (or more) below 70 million there are infinitely many repetitions, but we don't know which e.g. we could have infinitely many:or infinitely many:or infinitely many:or infinitely many:but we don't know which of those.
The Prime k-tuple conjecture conjectures that it is all of them.
Public landing page: www.ox.ac.uk/admissions/undergraduate/courses/course-listing/computer-science
Course lists: www.cs.ox.ac.uk/teaching/courses/ True to form, courses appear to have identifiers, e.g. The "course materials" section of each course leads to courses.cs.ox.ac.uk/ which is paywalled by IP (accessible via Eduroam): TODO which system does it use? Some courses place their materials directly on "www.cs.ox.ac.uk", and when that is the case they are publicly accessible. So it is very much hit and miss. E.g. www.cs.ox.ac.uk/teaching/courses/2022-2023/quantum/index.html from Quantum Processes and Computation course of the University of Oxford has the assignments such as www.cs.ox.ac.uk/people/aleks.kissinger/courses/qpc2022/assignment1.pdf publicly visible, but e.g. www.cs.ox.ac.uk/teaching/courses/2022-2023/modelsofcomputation/ has nothing.
qi
for the Quantum Information course of the University of Oxford rather than more arbitrary A1/A2/A3, B1/B2/B3, naming convention used by the Mathematics course of the University of Oxford and the Physics course of the University of Oxford, and URLs can either have years or not:- www.cs.ox.ac.uk/teaching/courses/qi/: no year: goes to latest
- www.cs.ox.ac.uk/teaching/courses/2023-2024/qi/: has year, fixed year. Disgraceful repetition of redundant 2023-2024, but OK.
Handbook:
- 2022:
- general www.cs.ox.ac.uk/files/13731/CS%20Handbook%20final.pdf
- Year 1 (Prelims): www.cs.ox.ac.uk/files/13794/Handbook%202022%20Part%20C%20-%20V1.3.pdf
- Year 2/3 (Parts A/B): www.cs.ox.ac.uk/files/13793/Handbook%202022%20Parts%20A%20&%20B%20V1.3.pdf There is some mixture on which courses can be taken on year 2 or 3. This also implies that they cannot have the usual A2/B2 naming scheme. They just don't have names instead mostly. It is also the most beautiful illustration of why you shouldn't do Compute Science at university: there's no depth to the subject. You can just take random courses and you learn it all quickly. Section "The only reason for universities to exist should be the laboratories".
- Year 2 has four mandatory core courses:
- Models of Computation
- Algorithms and Data Structures
- Compilers (mandatory for compsi, but not mathematics and computer science)
- Concurrent programming
- A only:
- Hilary term
- Concurrent Programming (mandatory for compsi, but not mathematics and computer science)
- Quantum information
- Year 2 has four mandatory core courses:
- Year 4 (Part C): www.cs.ox.ac.uk/files/13794/Handbook%202022%20Part%20C%20-%20V1.3.pdf
- Michaelmas term
- Bayesian Statistical Probabilistic Programming
- Concurrent Algorithms and Data Structures
- Quantum Processes and Computation
- Computational Learning Theory
- Computational Biology
- Advanced Complexity Theory
- Graph Representation Learning
- Hilary term
- Advanced Security
- Database Systems Implementation
- Ethical Computing in Practice
- Law and Computer Science
- Quantum Software course of the University of Oxford
- Geometric Deep Learning
- Foundations of Self-Programming Agents
- Deep Learning in Healthcare
- Michaelmas term
This is a term invented by Ciro Santilli, and refers to a loose set of uncommon Bitcoin inscription methods that involve inscribing one or a small number of payloads per Bitcoin transaction.
These methods are both inefficient and hard to detect and decode, partly because Bitcoin Core does not index spending transactions: bitcoin.stackexchange.com/questions/61794/bitcoin-rpc-how-to-find-the-transaction-that-spends-a-txo. This makes finding them all that more rewarding however.
On the other hand, they do have the advantage of not depending on any block size limits, as their individual transactions are very small.
There are unlisted articles, also show them or only show them.