Expectation value
How the K-ary tree is used in x86
x86's multi-level paging scheme uses a 2-level K-ary tree with K = 2^10: each level is indexed by 10 bits of the address.
Addresses are now split as:
| directory (10 bits) | table (10 bits) | offset (12 bits) |
Then:
  • the top 10 bits are used to walk the top level of the K-ary tree (level0)
    The top table is called a "directory of page tables".
    cr3 now points to the location in RAM of the page directory of the current process instead of a page table.
    Page directory entries are very similar to page table entries except that they point to the physical addresses of page tables instead of physical addresses of pages.
    Each directory entry also takes up 4 bytes, just like page entries, so that makes 4 KiB per process minimum.
    Page directory entries also contain a valid flag: if invalid, the OS does not allocate a page table for that entry, and saves memory.
    Each process has one and only one page directory associated to it (and pointed to by cr3), so it will contain at least 2^10 = 1K page directory entries, much better than the minimum 1M entries required on a single-level scheme.
  • the next 10 bits are used to walk the second level of the K-ary tree (level1)
    Second level entries are also called page tables like the single level scheme.
    Page tables are allocated only as needed by the OS.
    Each page table has only 2^10 = 1K page table entries instead of 2^20 for the single paging scheme.
    Each process can now have up to 2^10 page tables, each allocated only when needed, instead of the single mandatory table with 2^20 entries of the single-level scheme.
  • the offset is again not used for translation, it only gives the offset within a page
One reason for using 10 bits on the first two levels (and not, say, 12 | 8 | 12) is that each page table entry is 4 bytes long. The 2^10 entries of a page directory or page table then take exactly 2^10 * 4 bytes = 4 KiB, so each one fits nicely into a single 4 KiB page. This means that it is faster and simpler to allocate and deallocate pages for that purpose.
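The 10 | 10 | 12 split and the two-level walk can be sketched in Python as follows. This is only an illustration under stated assumptions: nested dicts stand in for the page directory and page tables, a missing key models a cleared valid flag, and all names (split_linear_address, translate, page_directory) are made up for this example.

```python
# Sketch of the 10 | 10 | 12 split and a two-level walk.
# The dict-based structures and all names are illustrative assumptions,
# not how hardware or any OS actually stores these tables.

OFFSET_BITS = 12
INDEX_BITS = 10
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_linear_address(linear):
    """Return (directory index, table index, offset) of a 32-bit address."""
    directory = (linear >> (OFFSET_BITS + INDEX_BITS)) & INDEX_MASK
    table = (linear >> OFFSET_BITS) & INDEX_MASK
    offset = linear & OFFSET_MASK
    return directory, table, offset


def translate(page_directory, linear):
    """Walk both levels; a missing key models an invalid entry."""
    directory, table, offset = split_linear_address(linear)
    page_table = page_directory.get(directory)
    if page_table is None:
        raise LookupError("page fault: no page table for this directory entry")
    page_base = page_table.get(table)
    if page_base is None:
        raise LookupError("page fault: page not present")
    # The offset is copied unchanged into the physical address.
    return page_base | offset


# Only one page table is allocated, and it maps a single page.
page_directory = {1: {2: 0x00ABC000}}
linear = (1 << 22) | (2 << 12) | 0x345
print(split_linear_address(linear))
print(hex(translate(page_directory, linear)))
```

Note how the directory here holds a single page table: every other directory entry is simply absent, which is the memory saving that the valid flag makes possible.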
Single level paging scheme visualization
This is what the memory could look like in a single level paging scheme:
Links   Data                    Physical address

      +-----------------------+ 2^32 - 1
      |                       |
      .                       .
      |                       |
      +-----------------------+ page0 + 4k
      | data of page 0        |
+---->+-----------------------+ page0
|     |                       |
|     .                       .
|     |                       |
|     +-----------------------+ pageN + 4k
|     | data of page N        |
|  +->+-----------------------+ pageN
|  |  |                       |
|  |  .                       .
|  |  |                       |
|  |  +-----------------------+ CR3 + 2^20 * 4
|  +--| entry[2^20-1] = pageN |
|     +-----------------------+ CR3 + (2^20 - 1) * 4
|     |                       |
|     .    many entries       .
|     |                       |
|     +-----------------------+ CR3 + 2 * 4
|  +--| entry[1] = page1      |
|  |  +-----------------------+ CR3 + 1 * 4
+-----| entry[0] = page0      |
   |  +-----------------------+ <--- CR3
   |  |                       |
   |  .                       .
   |  |                       |
   |  +-----------------------+ page1 + 4k
   |  | data of page 1        |
   +->+-----------------------+ page1
      |                       |
      .                       .
      |                       |
      +-----------------------+  0
Notice that:
  • the CR3 register points to the first entry of the page table
  • the page table is just a large array with 2^20 page table entries
  • each entry is 4 bytes big, so the array takes up 4 MiB
  • each page table entry contains the physical address of a page
  • each page is a 4 KiB aligned 4 KiB chunk of memory that user processes may use
  • we have 2^20 table entries. Since each page is 4KiB == 2^12, this covers the whole 4GiB (2^32) of 32-bit memory
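The arithmetic in the list above, and the lookup itself, can be checked with a short Python sketch. It assumes a sparse dict in place of the real 2^20-entry array, and the mapped addresses are arbitrary example values.

```python
# Single-level scheme: sizes and lookup.
# The sparse dict and the example mappings are assumptions for illustration.

PAGE_SIZE = 4 * 1024                 # 4 KiB == 2^12 bytes
ENTRY_SIZE = 4                       # bytes per page table entry
NUM_ENTRIES = 2**32 // PAGE_SIZE     # entries needed to cover 4 GiB
TABLE_BYTES = NUM_ENTRIES * ENTRY_SIZE


def translate_single_level(page_table, linear):
    """Use the top 20 bits as the table index and keep the low 12 bits."""
    page_number = linear >> 12
    offset = linear & 0xFFF
    # A missing key raises KeyError, standing in for a page fault.
    return page_table[page_number] | offset


page_table = {0: 0x00005000, 1: 0x00009000}
print(NUM_ENTRIES == 2**20, TABLE_BYTES // 2**20, "MiB")
print(hex(translate_single_level(page_table, 0x00001234)))
```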
Wayback Machine CDX scanning with Tor parallelization
Dire times require dire methods: cia-2010-covert-communication-websites/cdx-tor.sh.
First we must start the tor servers with the tor-army command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
tor-army 100
and then use it on a newline-separated list of domain names to check:
./cdx-tor.sh infile.txt
This creates a directory infile.txt.cdx/ containing:
  • infile.txt.cdx/out00, out01, etc.: the suspected CDX lines from domains from each tor instance based on the simple criteria that the CDX can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.
  • infile.txt.cdx/out: the final combined CDX output of out00, out01, ...
  • infile.txt.cdx/out.post: the final output containing only domain names that match further CLI criteria that cannot be easily encoded on the CDX query. This is the cleanest domain name list you should look into at the end basically.
Since archive is so abysmal in its data access, e.g. a Google BigQuery would solve our issues in seconds, we have to come up with creative ways of getting around their IP throttling.
The CIA doesn't play fair. They're actually the exact opposite of fair. So neither shall we.
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, so a reduction factor of about 800 (4.5M / 5.8k) with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits to the domains and remove anything with too many hits, since the CIA websites basically have very few archives:
cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count
This gives us something like:
12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1
sorted by increasing hit counts, so we can go down as far as patience allows for!
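For readers who prefer not to rely on the three-argument match() of GNU awk, a rough Python equivalent of the counting step might look like this. It is a sketch under assumptions: the first space-separated field of each CDX line is taken to be a key of the form "tld,name)/...", the sample lines are invented for the demonstration, and hits are aggregated per domain rather than per consecutive identical key as uniq -c does.

```python
# Count CDX hits per domain, sorted by increasing hit count.
# count_hits and the sample lines are hypothetical, for illustration only.
import re
from collections import Counter

# Same pattern as the awk one-liner: "tld,name" before the closing ")".
KEY_RE = re.compile(r"([^,]+),([^)]+)")


def count_hits(cdx_lines):
    counts = Counter()
    for line in cdx_lines:
        key = line.split(" ", 1)[0]
        match = KEY_RE.search(key)
        if match:
            tld, name = match.groups()
            counts[f"{name}.{tld}"] += 1
    # Fewest hits first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


sample = [
    "com,example)/ 20110101000000 http://example.com/ text/html 200 AAAA 100",
    "com,example)/ 20120101000000 http://example.com/ text/html 200 BBBB 100",
    "com,atohms)/ 20110101000000 http://atohms.com/ text/html 200 CCCC 100",
]
for domain, hits in count_hits(sample):
    print(domain, hits)
```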
New results from a full CDX scan of 2013-dns-census-a-novirt.csv:
  • 219.90.61.123 journeystravelled.com
Page table entries
The exact format of table entries is fixed _by the hardware_.
Each page entry can be seen as a struct with many fields.
The page table is then an array of structs.
On this simplified example, the page table entries contain only two fields:
bits   function
-----  -----------------------------------------
20     physical address of the start of the page
1      present flag
so in this example the hardware designers could have chosen the size of each page table entry to be 21 bits instead of the 32 we've used so far.
All real page table entries have other fields, notably fields to set pages to read-only for Copy-on-write. This will be explained elsewhere.
It would be impractical to align things at 21 bits since memory is addressable by bytes and not bits. Therefore, even if only 21 bits are needed in this case, hardware designers would probably choose 32 to make access faster, and just reserve the remaining bits for later usage. The actual value on x86 is 32 bits.
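The simplified two-field entry can be sketched as a 32-bit word in Python. The layout used here (page address in the top 20 bits, present flag in bit 0, everything else reserved) is an assumption chosen to resemble x86 32-bit paging, and make_entry and decode_entry are hypothetical helper names.

```python
# Simplified page table entry: 20-bit page address + 1 present flag,
# stored in 32 bits with the remaining bits left reserved (zero).
# Layout and helper names are assumptions for this sketch.
import struct

PRESENT = 1 << 0
ADDRESS_MASK = 0xFFFFF000  # top 20 bits: start of a 4 KiB aligned page


def make_entry(page_address, present):
    if page_address & ~ADDRESS_MASK:
        raise ValueError("page address must be 4 KiB aligned and fit in 32 bits")
    return page_address | (PRESENT if present else 0)


def decode_entry(entry):
    """Return (page address, present flag)."""
    return entry & ADDRESS_MASK, bool(entry & PRESENT)


entry = make_entry(0x00ABC000, present=True)
raw = struct.pack("<I", entry)  # 4 bytes, little-endian as on x86
print(hex(entry), len(raw), decode_entry(entry))
```

Because the page address is 4 KiB aligned, its low 12 bits are always zero, which is what leaves room for flags in the same 32-bit word.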
Here is a screenshot from the Intel manual image "Formats of CR3 and Paging-Structure Entries with 32-Bit Paging" showing the structure of a page table in all its glory: Figure 1. "x86 page entry format".
Figure 1. x86 page entry format.
The fields are explained in the manual just after.
Page size choice
Why are pages 4KiB anyways?
There is a trade-off between memory wasted in:
  • page tables
  • extra padding memory within pages
This can be seen with the extreme cases:
  • if the page size were 1 byte:
    • granularity would be great, and the OS would never have to allocate unneeded padding memory
    • but the page table would have 2^32 entries, and take up the entire memory!
  • if the page size were 4GiB:
    • we would need to swap 4GiB to disk every time a new process becomes active
    • the page table would be a single entry, so it would take almost no memory at all
x86 designers have found that 4KiB pages are a good middle ground.
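The two extremes and the middle ground can be put into numbers with a small Python sketch. It assumes a single-level table with 4-byte entries over a 32-bit address space, and it measures padding as the worst case for one allocation; both helper names are invented for the example.

```python
# Page size trade-off: page table size versus worst-case padding.
# Assumes a single-level table, 4-byte entries, 32-bit address space.

ADDRESS_SPACE = 2**32
ENTRY_SIZE = 4


def page_table_bytes(page_size):
    """Bytes needed for one entry per page of the address space."""
    return (ADDRESS_SPACE // page_size) * ENTRY_SIZE


def worst_case_padding(page_size):
    """Bytes wasted when an allocation needs just 1 byte of its last page."""
    return page_size - 1


for page_size in (1, 4 * 1024, 2**32):
    print(page_size, page_table_bytes(page_size), worst_case_padding(page_size))
```

With 1-byte pages the table alone would need 2^34 bytes, more than the whole address space, while with a single 4 GiB page the table shrinks to 4 bytes but almost 4 GiB may be wasted as padding.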
Cambridgeshire
Video 1.
Being a Dickhead's Cool by Reuben Dangoor (2010)
Source.
Copy-on-write
Besides a missing page, a very common source of page faults is copy-on-write (COW).
Page tables have extra flags that allow the OS to mark a page as read-only.
Those page faults only happen when a process tries to write to the page, and not read from it.
When Linux forks a process:
  • instead of copying all the pages, which is unnecessarily costly, it makes the page tables of the two processes point to the same physical address.
  • it marks those linear addresses as read-only
  • whenever one of the processes tries to write to a page, the OS makes a copy of the physical memory, and updates the page tables of the two processes to point to the two different physical addresses
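The fork steps above can be mimicked with a toy Python model. This is a sketch under heavy assumptions, not how Linux implements it: frames are bytearrays in a dict, each page table entry is a small dict with a frame number and a writable flag, and only the writing process gets a fresh copy while the other keeps the original frame read-only until its own first write.

```python
# Toy copy-on-write model. All classes and names are illustrative
# assumptions; a real kernel also tracks how many processes share a frame.

class Memory:
    def __init__(self):
        self.frames = {}      # frame number -> bytearray
        self.next_frame = 0

    def alloc(self, data):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = bytearray(data)
        return frame


class Process:
    def __init__(self, page_table):
        # page number -> {"frame": int, "writable": bool}
        self.page_table = page_table


def fork(parent):
    """Share every frame with the child and mark it read-only in both."""
    for entry in parent.page_table.values():
        entry["writable"] = False
    child_table = {page: dict(entry) for page, entry in parent.page_table.items()}
    return Process(child_table)


def write(memory, process, page, offset, value):
    entry = process.page_table[page]
    if not entry["writable"]:
        # Models the page fault: copy the frame, then remap it as writable.
        entry["frame"] = memory.alloc(memory.frames[entry["frame"]])
        entry["writable"] = True
    memory.frames[entry["frame"]][offset] = value


memory = Memory()
parent = Process({0: {"frame": memory.alloc(b"abcd"), "writable": True}})
child = fork(parent)
write(memory, child, 0, 0, ord("X"))
print(memory.frames[parent.page_table[0]["frame"]])
print(memory.frames[child.page_table[0]["frame"]])
```

Reads never go through the copy path, which matches the point that these page faults happen only on writes.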
ARM
Information about ARM paging can be found at: cirosantilli.com/linux-kernel-module-cheat#arm-paging
Basic TLB operation
After a translation between linear and physical address happens, it is stored on the TLB. For example, a 4 entry TLB starts in the following state:
  valid  linear  physical
  -----  ------  --------
> 0      00000   00000
  0      00000   00000
  0      00000   00000
  0      00000   00000
The > indicates the current entry to be replaced.
And after the page linear address 00003 is translated to the physical address 00005, the TLB becomes:
  valid  linear  physical
  -----  ------  --------
  1      00003   00005
> 0      00000   00000
  0      00000   00000
  0      00000   00000
and after a second translation of 00007 to 00009 it becomes:
  valid  linear  physical
  -----  ------  --------
  1      00003   00005
  1      00007   00009
> 0      00000   00000
  0      00000   00000
Now if 00003 needs to be translated again, hardware first looks up the TLB and finds its physical address with a single TLB lookup, 00003 --> 00005, without having to walk the page tables in RAM.
Of course, 00000 is not on the TLB since no valid entry contains 00000 as a key.
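The tables above can be reproduced with a small Python model of the TLB. It is a sketch with assumptions: next_slot plays the role of the > marker, an empty slot (None) stands for valid = 0, and wrapping around to the first slot once the TLB is full is assumed here rather than taken from the text.

```python
# Toy 4-entry TLB. The class, its method names and the wrap-around
# replacement policy are assumptions for illustration.

class TLB:
    def __init__(self, size=4):
        # Each slot is None (invalid) or a (linear page, physical page) pair.
        self.entries = [None] * size
        self.next_slot = 0  # the ">" marker: next entry to be replaced

    def lookup(self, linear_page):
        """Return the cached physical page, or None on a TLB miss."""
        for entry in self.entries:
            if entry is not None and entry[0] == linear_page:
                return entry[1]
        return None

    def insert(self, linear_page, physical_page):
        self.entries[self.next_slot] = (linear_page, physical_page)
        self.next_slot = (self.next_slot + 1) % len(self.entries)


tlb = TLB()
tlb.insert(0x00003, 0x00005)
tlb.insert(0x00007, 0x00009)
print(tlb.lookup(0x00003))  # hit
print(tlb.lookup(0x00000))  # miss: no valid entry has 00000 as its key
```

Checking that a slot is valid before comparing keys is what keeps the zero-filled invalid entries from matching linear address 00000.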
Jena SPARQL hello world
They have a tutorial at: jena.apache.org/tutorials/sparql.html
Once you've done the Apache Jena CLI tools setup, we can query all users with Full Name (FN) "John Smith" directly from the rdf/vcard.ttl Turtle RDF file with the rdf/vcard.rq SPARQL query:
sparql --data=rdf/vcard.ttl --query=rdf/vcard.rq
and that outputs:
---------------------------------
| x                             |
=================================
| <http://somewhere/JohnSmith/> |
---------------------------------
Brewster's angle
Hund's first rule
Higher spin multiplicity means lower energy. I.e.: you want to keep all spins pointing in the same direction.
Enantiomer
Mirror images.
Key example: D and L amino acids. Enantiomers have identical physico-chemical properties. But their biological roles can be very different, because an enzyme might only be able to act on one of them.
Polymorphism (materials science)
TODO definition. Appears to be isomers
Example:
AWS Deep Learning Base GPU AMI (Ubuntu 20.04)
These come with pre-installed drivers, so e.g. nvidia-smi just works on them out of the box, tested on g5.xlarge which has an Nvidia A10G GPU. Good choice as a starting point for deep learning experiments.
Condensed matter physics course of the University of Oxford
This could refer to several more specific courses, see the tagged articles for a list.
Neil Fernandez
“Especially my father. He was doing most of it and he is a savoury, strong character. He has strong beliefs about the world and in himself, and he was helping me a lot, even when I was at university as an undergraduate.”
An only child, Arran was born in 1995 in Glasgow, where his parents were studying at the time. His father has Spanish lineage, having a great grandfather who was a sailor who moved from Spain to St Vincent in the Carribean. A son later left the islands for the UK where he married an English woman. Arran’s mother is Norwegian.
“My father was writing and my mother is an economist. They both worked from home which also made things easier,” Arran says.
A bit like what Ciro Santilli feels about himself!
One of the articles says his father has a PhD. TODO where did he work? What's his PhD on? Photo: www.topfoto.co.uk/asset/1357880/
www.thetimes.co.uk/article/the-everyday-genius-pxsq5c50kt9:
Neil, a political economist, attended state and private schools in Hampshire but was also taught for a period at home by his mother.
It’s strange because for most people maths is a real turn-off, yet maths is all about patterns and children of two or three love patterns. It just shows that schools are doing something seriously wrong.”
