This was Ciro Santilli's main study/work music for several years around 2020. Tabla rules.
Just like a classic programmer does not need to understand the intricacies of how transistors are implemented and CMOS semiconductors, the quantum programmer does not understand physical intricacies of the underlying physical implementation.
The main difference to keep in mind is that quantum computers cannot save and observe intermediate quantum state, so programming a quantum computer is basically like programming a combinatorial-like circuit with gates that operate on (qu)bits:
For this reason programming a quantum computer is much like programming a classical combinatorial circuit as you would do with SPICE, verilog-or-vhdl, in which you are basically describing a graph of gates that goes from the input to the output
For this reason, we can use the words "program" and "circuit" interchangeably to refer to a quantum program
Also remember that and there is no no clocks in combinatorial circuits because there are no registers to drive; and so there is no analogue of clock in the quantum system either,
Another consequence of this is that programming quantum computers does not look like programming the more "common" procedural programming languages such as C or Python, since those fundamentally rely on processor register / memory state all the time.
Quantum programmers can however use classic languages to help describe their quantum programs more easily, for example this is what happens in Qiskit, where you write a Python program that makes Qiskit library calls that describe the quantum program.
Quantum logic gates are needed for physical implementation Updated 2025-01-10 +Created 1970-01-01
One direct practical reason is that we need to map the matrix to real quantum hardware somehow, and all quantum hardware designs so far and likely in the future are gate-based: you manipulate a small number of qubits at a time (2) and add more and more of such operations.
While there are "quantum compilers" to increase the portability of quantum programs, it is to be expected that programs manually crafted for a specific hardware will be more efficient just like in classic computers.
TODO: is there any clear reason why computers can't beat humans in approximating any unitary matrix with a gate set?
This is analogous to what classic circuit programmers will do, by using smaller logic gates to create complex circuits, rather than directly creating one huge truth table.
The most commonly considered quantum gates take 1, 2, or 3 qubits as input.
The gates themselves are just unitary matrices that operate on the input qubits and produce the same number of output qubits.
For example, the matrix for the CNOT gate, which takes 2 qubits as input is:
1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
The final question is then: if I have a 2 qubit gate but an input with more qubits, say 3 qubits, then what does the 2 qubit gate (4x4 matrix) do for the final big 3 qubit matrix (8x8)? In order words, how do we scale quantum gates up to match the total number of qubits?
The intuitive answer is simple: we "just" extend the small matrix with a larger identity matrix so that the sum of the probabilities third bit is unaffected.
More precisely, we likely have to extend the matrix in a way such that the partial measurement of the original small gate qubits leaves all other qubits unaffected.
For example, if the circuit were made up of a CNOT gate operating on the first and second qubits as in:
0 ----+----- 0
|
1 ---CNOT--- 1
2 ---------- 2
then we would just extend the 2x2 CNOT gate to:
TODO lazy to properly learn right now. Apparently you have to use the Kronecker product by the identity matrix. Also, zX-calculus appears to provide a powerful alternative method in some/all cases.
Multiple addresses translate to a single physical address Updated 2025-01-10 +Created 1970-01-01
The same linear address can translate to different physical addresses for different processes, depending only on the value inside
cr3
.Both linear addresses
00002 000
from process 1 and 00004 000
from process 2 point to the same physical address 00003 000
. This is completely allowed by the hardware, and it is up to the operating system to handle such cases.This often in normal operation because of Copy-on-write (COW), which be explained elsewhere.
Such mappings are sometime called "aliases".
Happening multiple times a day on Ubuntu 22.04 for Chromium, even though I turn computer on and off daily, unbearable:
- askubuntu.com/questions/1412575/pending-update-of-snap-store
- askubuntu.com/questions/1412140/pending-update-of-firefox-snap-close-the-app-to-avoid-disruptions
- forum.snapcraft.io/t/how-to-disable-snapd-update-notifications-permanently/31117 Settings > Notifications > Snapd User Session Agent
- www.reddit.com/r/Ubuntu/comments/v1s919/disable_pending_update_of_snap_message_kiosk/
- forum.snapcraft.io/t/refresh-app-awareness-call-for-testing/29123
gfx_v11_0_priv_reg_irq: register access in command stream Updated 2025-01-10 +Created 1970-01-01
Had this happen on P14s on Ubuntu 23.10 while causally using Chromium. The screen went blank for a few seconds, but it apparently managed to reboot itself, and things started working again, except that and most windows were killed:It appears to be a bug in the AMDGPU open source driver.
[drm:gfx_v11_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5774109, emitted seq=5774111
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 14023 thread chrome:cs0 pid 14087
amdgpu 0000:64:00.0: amdgpu: GPU reset begin!
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
Dec 27 15:03:38 ciro-p14s kernel: amdgpu 0000:64:00.0: amdgpu: MODE2 reset
Dec 27 15:03:38 ciro-p14s kernel: amdgpu 0000:64:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 27 15:03:38 ciro-p14s kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000900
Related reports:
I think this was on Wayland. Possibly relatd but on X Window System, crashed the UI, showed message "oh no! Something has gone wrong."
2024-01-13_21-55-07@ciro@ciro-p14s$ cat /var/log/apport.log
ERROR: apport (pid 975172) 2024-01-13 21:41:02,087: host pid 3528 crashed in a separate mount namespace, ignoring
INFO: apport (pid 975227) 2024-01-13 21:41:02,398: called for pid 2728, signal 5, core limit 0, dump mode 1
INFO: apport (pid 975227) 2024-01-13 21:41:02,401: executable: /usr/bin/gnome-shell (command line "/usr/bin/gnome-shell")
INFO: apport (pid 975227) 2024-01-13 21:41:12,667: wrote report /var/crash/_usr_bin_gnome-shell.1000.crash
FFFFF 000
points to its own physical address FFFFF 000
. This kind of translation is called an "identity mapping", and can be very convenient for OS-level debugging.stackoverflow.com/questions/17046204/how-to-find-the-boundaries-of-groups-of-contiguous-sequential-numbers/17046749#17046749 just works, even in SQLite which supports all quoting types known to man including
[]
for compatibility with insane RDBMSs!Here's a slightly saner version:
rm -f tmp.sqlite
sqlite3 tmp.sqlite "create table mytable (id integer primary key autoincrement, number integer, status integer)"
sqlite3 tmp.sqlite <<EOF
insert into mytable(number, status) values
(100,0),
(101,0),
(102,0),
(103,0),
(104,1),
(105,1),
(106,0),
(107,0),
(1014,0),
(1015,0),
(1016,1),
(1017,0)
EOF
sqlite3 tmp.sqlite <<EOF
SELECT
MIN(id) AS "id",
MIN(number) AS "from",
MAX(number) AS "to"
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY number) - number AS grp, id, number
FROM mytable
WHERE status = 0
)
GROUP BY grp
ORDER BY MIN(number)
EOF
output:
1|100|103
7|106|107
9|1014|1015
12|1017|1017
To get only groups of length greater than 1:
sqlite3 tmp.sqlite <<EOF
SELECT "id", "from", "to", "to" - "from" + 1 as "len" FROM (
SELECT
MIN("id") AS "id",
MIN(number) AS "from",
MAX(number) AS "to"
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY "number") - "number" AS "grp", "id", "number"
FROM "mytable"
WHERE "status" = 0
)
GROUP BY "grp"
ORDER BY MIN("number")
) WHERE "len" > 1
EOF
Output:
1|100|103|4
7|106|107|2
9|1014|1015|2
Some reverse engineering was done at: twitter.com/hackerfantastic/status/1575505438111571969?lang=en.
Notably, the password is hardcoded and its hash is stored in the JavaScript itself. The result is then submitted back via a POST request to
/cgi-bin/goal.cgi
.TODO: how is the SHA calculated? Appears to be manual.
x86's multi-level paging scheme uses a 2 level K-ary tree with 2^10 bits on each level.
Addresses are now split as:
| directory (10 bits) | table (10 bits) | offset (12 bits) |
Then:
- the top 10 bits are used to walk the top level of the K-ary tree (
level0
)The top table is called a "directory of page tables".cr3
now points to the location on RAM of the page directory of the current process instead of page tables.Page directory entries are very similar to page table entries except that they point to the physical addresses of page tables instead of physical addresses of pages.Each directory entry also takes up 4 bytes, just like page entries, so that makes 4 KiB per process minimum.Page directory entries also contain a valid flag: if invalid, the OS does not allocate a page table for that entry, and saves memory.Each process has one and only one page directory associated to it (and pointed to bycr3
), so it will contain at least2^10 = 1K
page directory entries, much better than the minimum 1M entries required on a single-level scheme. - the next 10 bits are used to walk the second level of the K-ary tree (
level1
)Second level entries are also called page tables like the single level scheme.Page tables are only allocated only as needed by the OS.Each page table has only2^10 = 1K
page table entries instead of2^20
for the single paging scheme.Each process can now have up to2^10
page tables instead of2^20
for the single paging scheme. - the offset is again not used for translation, it only gives the offset within a page
One reason for using 10 bits on the first two levels (and not, say,
12 | 8 | 12
) is that each Page Table entry is 4 bytes long. Then the 2^10 entries of Page directories and Page Tables will fit nicely into 4Kb pages. This means that it faster and simpler to allocate and deallocate pages for that purpose.This is how the memory could look like in a single level paging scheme:
Links Data Physical address
+-----------------------+ 2^32 - 1
| |
. .
| |
+-----------------------+ page0 + 4k
| data of page 0 |
+---->+-----------------------+ page0
| | |
| . .
| | |
| +-----------------------+ pageN + 4k
| | data of page N |
| +->+-----------------------+ pageN
| | | |
| | . .
| | | |
| | +-----------------------+ CR3 + 2^20 * 4
| +--| entry[2^20-1] = pageN |
| +-----------------------+ CR3 + 2^20 - 1 * 4
| | |
| . many entires .
| | |
| +-----------------------+ CR3 + 2 * 4
| +--| entry[1] = page1 |
| | +-----------------------+ CR3 + 1 * 4
+-----| entry[0] = page0 |
| +-----------------------+ <--- CR3
| | |
| . .
| | |
| +-----------------------+ page1 + 4k
| | data of page 1 |
+->+-----------------------+ page1
| |
. .
| |
+-----------------------+ 0
Notice that:
- the CR3 register points to the first entry of the page table
- the page table is just a large array with 2^20 page table entries
- each entry is 4 bytes big, so the array takes up 4 MiB
- each page table contains the physical address a page
- each page is a 4 KiB aligned 4KiB chunk of memory that user processes may use
- we have 2^20 table entries. Since each page is 4KiB == 2^12, this covers the whole 4GiB (2^32) of 32-bit memory
Dire times require dire methods: cia-2010-covert-communication-websites/cdx-tor.sh.
First we must start the tor servers with the and then use it on a newline separated domain name list to check;This creates a directory
tor-army
command from: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983tor-army 100
./cdx-tor.sh infile.txt
infile.txt.cdx/
containing:infile.txt.cdx/out00
,out01
, etc.: the suspected CDX lines from domains from each tor instance based on the simple criteria that the CDX can handle directly. We split the input domains into 100 piles, and give one selected pile per tor instance.infile.txt.cdx/out
: the final combined CDX output ofout00
,out01
, ...infile.txt.cdx/out.post
: the final output containing only domain names that match further CLI criteria that cannot be easily encoded on the CDX query. This is the cleanest domain name list you should look into at the end basically.
Since archive is so abysmal in its data access, e.g. a Google BigQuery would solve our issues in seconds, we have to come up with creative ways of getting around their IP throttling.
The CIA doesn't play fair. They're actually the exact opposite of fair. So neither shall we.
Distilled into an answer at: stackoverflow.com/questions/14321214/how-to-run-multiple-tor-processes-at-once-with-different-exit-ips/76749983#76749983
This should allow a full sweep of the 4.5M records in 2013 DNS Census virtual host cleanup in a reasonable amount of time. After JAR/SWF/CGI filtering we obtained 5.8k domains, so a reduction factor of about 1 million with likely very few losses. Not bad.
5.8k is still a bit annoying to fully go over however, so we can also try to count CDX hits to the domains and remove anything with too many hits, since the CIA websites basically have very few archives:This gives us something like:sorted by increasing hit counts, so we can go down as far as patience allows for!
cd 2013-dns-census-a-novirt-domains.txt.cdx
./cdx-tor.sh -d out.post domain-list.txt
cd out.post.cdx
cut -d' ' -f1 out | uniq -c | sort -k1 -n | awk 'match($2, /([^,]+),([^)]+)/, a) {printf("%s.%s %d\n", a[2], a[1], $1)}' > out.count
12654montana.com 1
aeronet-news.com 1
atohms.com 1
av3net.com 1
beechstreetas400.com 1
New results from a full CDX scan of 2013-dns-census-a-novirt.csv:
- 219.90.61.123 journeystravelled.com
The exact format of table entries is fixed _by the hardware_.
Each page entry can be seen as a
struct
with many fields.The page table is then an array of
struct
.On this simplified example, the page table entries contain only two fields:so in this example the hardware designers could have chosen the size of the page table to b
bits function
----- -----------------------------------------
20 physical address of the start of the page
1 present flag
21
instead of 32
as we've used so far.All real page table entries have other fields, notably fields to set pages to read-only for Copy-on-write. This will be explained elsewhere.
It would be impractical to align things at 21 bits since memory is addressable by bytes and not bits. Therefore, even in only 21 bits are needed in this case, hardware designers would probably choose 32 to make access faster, and just reserve bits the remaining bits for later usage. The actual value on x86 is 32 bits.
Here is a screenshot from the Intel manual image "Formats of CR3 and Paging-Structure Entries with 32-Bit Paging" showing the structure of a page table in all its glory: Figure 1. "x86 page entry format".
The fields are explained in the manual just after.
Why are pages 4KiB anyways?
There is a trade-off between memory wasted in:
- page tables
- extra padding memory within pages
This can be seen with the extreme cases:
- if the page size were 1 byte:
- granularity would be great, and the OS would never have to allocate unneeded padding memory
- but the page table would have 2^32 entries, and take up the entire memory!
- if the page size were 4GiB:
- we would need to swap 4GiB to disk every time a new process becomes active
- the page size would be a single entry, so it would take almost no memory at all
x86 designers have found that 4KiB pages are a good middle ground.
Besides a missing page, a very common source of page faults is copy-on-write (COW).
Page tables have extra flags that allow the OS to mark a page a read-only.
Those page faults only happen when a process tries to write to the page, and not read from it.
When Linux forks a process:
- instead of copying all the pages, which is unnecessarily costly, it makes the page tables of the two process point to the same physical address.
- it marks those linear addresses as read-only
- whenever one of the processes tries to write to a page, the makes a copy of the physical memory, and updates the pages of the two process to point to the two different physical addresses
Information about ARM paging can be found at: cirosantilli.com/linux-kernel-module-cheat#arm-paging
After a translation between linear and physical address happens, it is stored on the TLB. For example, a 4 entry TLB starts in the following state:
valid linear physical
----- ------ --------
> 0 00000 00000
0 00000 00000
0 00000 00000
0 00000 00000
The
>
indicates the current entry to be replaced.And after a page linear address and after a second translation of
00003
is translated to a physical address 00005
, the TLB becomes: valid linear physical
----- ------ --------
1 00003 00005
> 0 00000 00000
0 00000 00000
0 00000 00000
00007
to 00009
it becomes: valid linear physical
----- ------ --------
1 00003 00005
1 00007 00009
> 0 00000 00000
0 00000 00000
Now if
00003
needs to be translated again, hardware first looks up the TLB and finds out its address with a single RAM access 00003 --> 00005
.Of course,
00000
is not on the TLB since no valid entry contains 00000
as a key. Unlisted articles are being shown, click here to show only listed articles.