users.ox.ac.uk/~corp0014/B6-lectures.html gives a syllabus:
Problem set dated 2015: users.ox.ac.uk/~corp0014/B6-materials/B6_Problems.pdf Marked by: A. Ardavan and T. Hesjedal. Some more stuff under: users.ox.ac.uk/~corp0014/B6-materials/
The book is the fully commercial The Oxford Solid State Basics.
numpy/fft.py by Ciro Santilli 37 Updated 2025-07-16
Output:
sin(t)
fft
real 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
imag 0 -10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
rfft
real 0 0 0 0 0 0 0 0 0 0 0
imag 0 -10 0 0 0 0 0 0 0 0 0

sin(t) + sin(4t)
fft
real 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
imag 0 -10 0 0 -10 0 0 0 0 0 0 0 0 0 0 0 10 0 0 10
rfft
real 0 0 0 0 0 0 0 0 0 0 0
imag 0 -10 0 0 -10 0 0 0 0 0 0
With our understanding of the discrete Fourier transform we see clearly that:
Common Crawl by Ciro Santilli 37 Updated 2025-07-16
Amazing project, that basically makes a more searchable Wayback Machine.
A bit hard to use their data though, partly due to size, but also lack of free to use querrying mechanisms, and how obtuse Amazon S3 is to use.
Notably, aws-cli with an account is the only reliable way, everything else is way too broken, e.g. trying the to check the an index index.commoncrawl.org/CC-MAIN-2023-06/ very often 500s.
But still, their projct is amazing.
The only out-of-the-box search they seem to have is: urlsearch.commoncrawl.org/ for domains/URLs. It is good, but there could be so much more... notably IPs.
Also could should document the data shape a bit better.
To explore the data, after login:
aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
Copy the toplevel directory only:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/ . --recursive --exclude "*/*"
Copy some wet/wat files:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz .
aws s3 sync s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz .
Directory structrure:
  • cc-index.paths.gz (1K)
  • cc-index-table.paths.gz (1K)
  • segment.paths.gz (1.7K) Sample lines:
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/
    crawl-data/CC-MAIN-2013-20/segments/1368696381630/
  • index.html (2.3K)
  • wat.paths.gz (98K) Sample lines:
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wat.gz
  • wet.paths.gz (98K) Sample lines:
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wet.gz
  • warc.paths.gz (99K)
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
    crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.gz
  • segments: directgory with actual data
    • 1368696381249: one of many segments, any meaning of name?
      • CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz (142M, 334M unzipped)
        A tiny bit of metadata, and then plaintext content from the website, e.g. the second one:
        WARC/1.0
        WARC-Type: conversion
        WARC-Target-URI: http://004eeb5.netsolhost.com/stephensilver.htm
        WARC-Date: 2013-05-18T08:11:02Z
        WARC-Record-ID: <urn:uuid:773b31ba-ddc6-47a5-ae24-d08141b9944d>
        WARC-Refers-To: <urn:uuid:4b1bdbff-4926-4ced-86f6-072f5bb3837a>
        WARC-Block-Digest: sha1:LQFSCR2LIJQYMPTXRHWU7HAPQTVSYS3A
        Content-Type: text/plain
        Content-Length: 12046
        
        Stephen Silver is a journalist and editor who specializes in the areas of politics, pop culture, film and sports. He works as an editor with the North American Publishing Co. and as a film critic with The Trend, a local newspaper in the Philadelphia area.
        No IP unfortunately.
      • CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz (329M, 1.4G unzipped)
        A lot of JSON metadata and no contents as desired. Contains IP! Some entries however are humongous with a ton of useless data, that's what bloats these so much:
        WARC/1.0
        WARC-Type: metadata
        WARC-Target-URI: CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
        WARC-Date: 2013-11-22T14:51:12Z
        WARC-Record-ID: <urn:uuid:ec54e493-8965-41be-b344-07596cc30b3a>
        WARC-Refers-To: <urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>
        Content-Type: application/json
        Content-Length: 1180
        
        {"Envelope":{"Format":"WARC","WARC-Header-Length":"274","Block-Digest":"sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR","Actual-Content-Length":"372","WARC-Header-Metadata":{"WARC-Type":"warcinfo","WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz","WARC-Date":"2013-11-22T14:51:12Z","Content-Length":"372","WARC-Record-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Type":"application/warc-fields"},"Payload-Metadata":{"Trailing-Slop-Length":"0","Actual-Content-Type":"application/warc-fields","Actual-Content-Length":"372","Headers-Corrupt":true,"WARC-Info-Metadata":{"robots":"classic","software":"Nutch 1.6 (CC)/CC WarcExport 1.0","description":"Wide crawl of the web with URLs provided by Blekko for Spring 2013","hostname":"ip-10-60-113-184.ec2.internal","format":"WARC File Format 1.0","isPartOf":"CC-MAIN-2013-20","operator":"CommonCrawl Admin","publisher":"CommonCrawl"}}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"453","Header-Length":"10","Inflated-CRC":"866052549","Inflated-Length":"650"},"Offset":"0","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
        
        WARC/1.0
        WARC-Type: metadata
        WARC-Target-URI: http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions
        WARC-Date: 2013-05-18T05:48:54Z
        WARC-Record-ID: <urn:uuid:d519658f-7a63-46c1-849b-4cd92332ddb8>
        WARC-Refers-To: <urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>
        Content-Type: application/json
        Content-Length: 1501
        
        {"Envelope":{"Format":"WARC","WARC-Header-Length":"433","Block-Digest":"sha1:B2B6JDSGWCUQIIUGV54SXEE25RX4SANS","Actual-Content-Length":"302","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2013-05-18T05:48:54Z","WARC-Warcinfo-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Length":"302","WARC-Record-ID":"<urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>","WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions","WARC-IP-Address":"165.1.125.44","Content-Type":"application/http; msgtype=request"},"Payload-Metadata":{"Trailing-Slop-Length":"4","HTTP-Request-Metadata":{"Headers":{"Accept-Language":"en-us,en-gb,en;q=0.7,*;q=0.3","Host":"ap.org","Accept-Encoding":"x-gzip, gzip, deflate","User-Agent":"CCBot/2.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},"Headers-Length":"300","Entity-Length":"0","Entity-Trailing-Slop-Bytes":"0","Request-Message":{"Method":"GET","Version":"HTTP/1.0","Path":"/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"},"Actual-Content-Type":"application/http; msgtype=request"}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"455","Header-Length":"10","Inflated-CRC":"453539965","Inflated-Length":"739"},"Offset":"453","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
        Let's beautify one of them to see it better:
        
        {
          "Envelope": {
            "Format": "WARC",
            "WARC-Header-Length": "274",
            "Block-Digest": "sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR",
            "Actual-Content-Length": "372",
            "WARC-Header-Metadata": {
              "WARC-Type": "warcinfo",
              "WARC-Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz",
              "WARC-Date": "2013-11-22T14:51:12Z",
              "Content-Length": "372",
              "WARC-Record-ID": "<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>",
              "Content-Type": "application/warc-fields"
            },
            "Payload-Metadata": {
              "Trailing-Slop-Length": "0",
              "Actual-Content-Type": "application/warc-fields",
              "Actual-Content-Length": "372",
              "Headers-Corrupt": true,
              "WARC-Info-Metadata": {
                "robots": "classic",
                "software": "Nutch 1.6 (CC)/CC WarcExport 1.0",
                "description": "Wide crawl of the web with URLs provided by Blekko for Spring 2013",
                "hostname": "ip-10-60-113-184.ec2.internal",
                "format": "WARC File Format 1.0",
                "isPartOf": "CC-MAIN-2013-20",
                "operator": "CommonCrawl Admin",
                "publisher": "CommonCrawl"
              }
            }
          },
          "Container": {
            "Compressed": true,
            "Gzip-Metadata": {
              "Footer-Length": "8",
              "Deflate-Length": "453",
              "Header-Length": "10",
              "Inflated-CRC": "866052549",
              "Inflated-Length": "650"
            },
            "Offset": "0",
            "Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"
          }
        }
        Fuck no IP addresses either. But other entries do have it, why not this one?
        The reason these can be huge is the HTML-Metadata section which contain all outlinks! gist.github.com/Smerity/e750f0ef0ab9aa366558#file-bbc-pretty-wat-L34
      • CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz ()
        Obtain:
        aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz .
Urban Dictionary by Ciro Santilli 37 Updated 2025-07-16
What you Google-into when trying to understand English slangs as of 2020.
ImageNet subset by Ciro Santilli 37 Updated 2025-07-16
Subset generators:
Unfortunately, since ImageNet is a closed standard no one can upload such pre-made subsets, forcing everybody to download the full dataset, in ImageNet1k, which is huge!
Debugging by Ciro Santilli 37 Updated 2025-07-16
Debugging sucks. But there's also nothing quite that "oh fuck, that's why it doesn't work" moment, which happens after you have examined and placed everything that is relevant to the problem into your brain. You just can't see it coming. It just happens. You just learn what you generally have to look at so it happens faster.

Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
We have two killer features:
  1. topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
    Articles of different users are sorted by upvote within each article page. This feature is a bit like:
    • a Wikipedia where each user can have their own version of each article
    • a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
    This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
    Figure 1.
    Screenshot of the "Derivative" topic page
    . View it live at: ourbigbook.com/go/topic/derivative
  2. local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
    This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
    Figure 5. . You can also edit articles on the Web editor without installing anything locally.
    Video 3.
    Edit locally and publish demo
    . Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
  3. https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png
  4. Infinitely deep tables of contents:
    Figure 6.
    Dynamic article tree with infinitely deep table of contents
    .
    Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact