These are the best articles ever authored by Ciro Santilli, most of them in the format of Stack Overflow answers.
Ciro posts update about new articles on his Twitter accounts.
A chronological list of all articles is also kept at: Section "Updates".
Some random generally less technical in-tree essays will be present at: Section "Essays by Ciro Santilli".
- Trended on Hacker News:
- CIA 2010 covert communication websites on 2023-06-11. 190 points, a mild success.
- x86 Bare Metal Examples on 2019-03-19. 513 points. The third time something related to that repo trends. Hacker news people really like that repo!
- again 2020-06-27 (archive). 200 points, repository traffic jumped from 25 daily unique visitors to 4.6k unique visitors on the day
- How to run a program without an operating system? on 2018-11-26 (archive). 394 points. Covers x86 and ARM
- ELF Hello World Tutorial on 2017-05-17 (archive). 334 points.
- x86 Paging Tutorial on 2017-03-02. Number 1 Google search result for "x86 Paging" in 2017-08. 142 points.
- x86 assembly
- What does "multicore" assembly language look like?
- What is the function of the push / pop instructions used on registers in x86 assembly? Going down to memory spills, register allocation and graph coloring.
- Linux kernel
- What do the flags in /proc/cpuinfo mean?
- How does kernel get an executable binary file running under linux?
- How to debug the Linux kernel with GDB and QEMU?
- Can the sys_execve() system call in the Linux kernel receive both absolute or relative paths?
- What is the difference between the kernel space and the user space?
- Is there any API for determining the physical address from virtual address in Linux?
- Why do people write the
#!/usr/bin/env
python shebang on the first line of a Python script? - How to solve "Kernel Panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)"?
- Single program Linux distro
- QEMU
- gcc and Binutils:
- How do linkers and address relocation works?
- What is incremental linking or partial linking?
- GOLD (
-fuse-ld=gold
) linker vs the traditional GNU ld and LLVM ldd - What is the -fPIE option for position-independent executables in GCC and ld? Concrete examples by running program through GDB twice, and an assembly hello world with absolute vs PC relative load.
- How many GCC optimization levels are there?
- Why does GCC create a shared object instead of an executable binary according to file?
- C/C++: almost all of those fall into "disassemble all the things" category. Ciro also does "standards dissection" and "a new version of the standard is out" answers, but those are boring:
- What does "static" mean in a C program?
- In C++ source, what is the effect of
extern "C"
? - Char array vs Char Pointer in C
- How to compile glibc from source and use it?
- When should
static_cast
,dynamic_cast
,const_cast
andreinterpret_cast
be used? - What exactly is
std::atomic
in C++?. This answer was originally more appropriately entitled "Let's disassemble some stuff", and got three downvotes, so Ciro changed it to a more professional title, and it started getting upvotes. People judge books by their covers. notmain.o 0000000000000000 0000000000000017 W MyTemplate<int>::f(int) main.o 0000000000000000 0000000000000017 W MyTemplate<int>::f(int)
- IEEE 754
- What is difference between quiet NaN and signaling NaN?
- In Java, what does NaN mean?
Without subnormals: +---+---+-------+---------------+-------------------------------+ exponent | ? | 0 | 1 | 2 | 3 | +---+---+-------+---------------+-------------------------------+ | | | | | | v v v v v v ----------------------------------------------------------------- floats * **** * * * * * * * * * * * * ----------------------------------------------------------------- ^ ^ ^ ^ ^ ^ | | | | | | 0 | 2^-126 2^-125 2^-124 2^-123 | 2^-127 With subnormals: +-------+-------+---------------+-------------------------------+ exponent | 0 | 1 | 2 | 3 | +-------+-------+---------------+-------------------------------+ | | | | | v v v v v ----------------------------------------------------------------- floats * * * * * * * * * * * * * * * * * ----------------------------------------------------------------- ^ ^ ^ ^ ^ ^ | | | | | | 0 | 2^-126 2^-125 2^-124 2^-123 | 2^-127
- Computer science
- Algorithms
- Is it necessary for NP problems to be decision problems?
- Polynomial time and exponential time. Answered focusing on the definition of "exponential time".
- What is the smallest Turing machine where it is unknown if it halts or not?. Answer focusing on "blank tape" initial condition only. Large parts of it are summarizing the Busy Beaver Challenge, but some additions were made.
- Algorithms
- Git
| 0 | 4 | 8 | C | |-------------|--------------|-------------|----------------| 0 | DIRC | Version | File count | ctime ...| 0 | ... | mtime | device | 2 | inode | mode | UID | GID | 2 | File size | Entry SHA-1 ...| 4 | ... | Flags | Index SHA-1 ...| 4 | ... |
tree {tree_sha} {parents} author {author_name} <{author_email}> {author_date_seconds} {author_date_timezone} committer {committer_name} <{committer_email}> {committer_date_seconds} {committer_date_timezone} {commit message}
- How do I clone a subdirectory only of a Git repository?
- Python
- Web technology
- OpenGL
- What are shaders in OpenGL?
- Why do we use 4x4 matrices to transform things in 3D?
- Image Processing with GLSL shaders? Compared the CPU and GPU for a simple blur algorithm.
- Node.js
- Ruby on Rails
- POSIX
- What is POSIX? Huge classified overview of the most important things that POSIX specifies.
- Systems programming
- What do the terms "CPU bound" and "I/O bound" mean?
+--------+ +------------+ +------+ | device |>---------------->| function 0 |>----->| BAR0 | | | | | +------+ | |>------------+ | | | | | | | +------+ ... ... | | |>----->| BAR1 | | | | | | +------+ | |>--------+ | | | +--------+ | | ... ... ... | | | | | | | | +------+ | | | |>----->| BAR5 | | | +------------+ +------+ | | | | | | +------------+ +------+ | +--->| function 1 |>----->| BAR0 | | | | +------+ | | | | | | +------+ | | |>----->| BAR1 | | | | +------+ | | | | ... ... ... | | | | | | +------+ | | |>----->| BAR5 | | +------------+ +------+ | | | ... | | | +------------+ +------+ +------->| function 7 |>----->| BAR0 | | | +------+ | | | | +------+ | |>----->| BAR1 | | | +------+ | | ... ... ... | | | | +------+ | |>----->| BAR5 | +------------+ +------+
- Electronics
- Computer security
- Media
- How to resize a picture using ffmpeg's sws_scale()?
- Is there any decent speech recognition software for Linux? ran a few examples manually on
vosk-api
and compared to ground truth.
- Eclipse
- Computer hardware
- Scientific visualization software
- Numerical analysis
- Computational physics
- Register transfer level languages like Verilog and VHDL
- Android
- Debugging
- Program optimization
- Data
- Mathematics
- Section "Formalization of mathematics": some early thoughts that could be expanded. Ciro almost had a stroke when he understood this stuff in his teens.
- Network programming
- Physics
- Biology
- Quantum computing
- Bitcoin
- GIMP
- Home DIY
- China
Docker is good.
As a lightweight virtualization however, it does break more often than full proper virtualization like QEMU after some updates.
The images also appear to randomly update slightly and break things, even though you've specified e.g.:
FROM ubuntu:20.04
Also, we need more Linux distributions buildable from source, especially with Reproducible builds.
Creator of QEMU and FFmpeg, both of which Ciro Santilli deeply respects. And a bunch other random stuff.
What is shocking about Fabrice this is that both are insanely important software that Ciro Santilli really likes, and both seem to be completely unrelated subjects!
Google made billions on top of this dude:
- FFmpeg is the backend of YouTube
- QEMU is the default emulator for Android Studio as of 2019, which Android developers use by default under the hood to develop Android Apps on their desktop without the need for a real device.
At last but not least, Fabrice also studied in the same school that Ciro Santilli studied in France, École Polytechnique.
It is a shame that he keeps such a low profile, there are no videos of him on the web, and he declines interviews.
Another surprising fact is that Fabrice has not worked for the "Big Tech Companies" as far as can be publicly seen, but rather mostly on smaller companies that he co-founded: www.quora.com/Computer-Programmers/Computer-Programmers-Where-is-Fabrice-Bellard-employed
And he's also into some completely random projcts unsurprisingly:
- www.computerhistory.org/tdih/january/6/ Computer Scientist Fabrice Bellard Announces Computing Pi to Record Number of Digits
Bibliography:
- smartbear.com/de/blog/2011/fabrice-bellard-portrait-of-a-super-productive-pro/ contains a list of his projects as of 2011
This is the most important technical tutorial project that Ciro Santilli has done in his life so far as of 2019.
The scope is insane and unprecedented, and goes beyond Linux kernel-land alone, which is where it started.
It ended up eating every system programming content Ciro had previously written! Including:so that that repo would better be called "System Programming Cheat". But "Linux Kernel Module Cheat" sounds more hardcore ;-)
Other major things that could be added there as well in the future are:
- github.com/cirosantilli/algorithm-cheat
- computer architecture tutorials with gem5
Due to this project, some have considered Ciro to be (archive):which made Ciro smile, although "Linux kernel documenter God" would have been more precise.
some kind of Linux kernel god.
[ 1.451857] input: AT Translated Set 2 keyboard as /devices/platform/i8042/s1│loading @0xffffffffc0000000: ../kernel_modules-1.0//timer.ko
[ 1.454310] ledtrig-cpu: registered to indicate activity on CPUs │(gdb) b lkmc_timer_callback
[ 1.455621] usbcore: registered new interface driver usbhid │Breakpoint 1 at 0xffffffffc0000000: file /home/ciro/bak/git/linux-kernel-module
[ 1.455811] usbhid: USB HID core driver │-cheat/out/x86_64/buildroot/build/kernel_modules-1.0/./timer.c, line 28.
[ 1.462044] NET: Registered protocol family 10 │(gdb) c
[ 1.467911] Segment Routing with IPv6 │Continuing.
[ 1.468407] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver │
[ 1.470859] NET: Registered protocol family 17 │Breakpoint 1, lkmc_timer_callback (data=0xffffffffc0002000 <mytimer>)
[ 1.472017] 9pnet: Installing 9P2000 support │ at /linux-kernel-module-cheat//out/x86_64/buildroot/build/
[ 1.475461] sched_clock: Marking stable (1473574872, 0)->(1554017593, -80442)│kernel_modules-1.0/./timer.c:28
[ 1.479419] ALSA device list: │28 {
[ 1.479567] No soundcards found. │(gdb) c
[ 1.619187] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100 │Continuing.
[ 1.622954] ata2.00: configured for MWDMA2 │
[ 1.644048] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ P5│Breakpoint 1, lkmc_timer_callback (data=0xffffffffc0002000 <mytimer>)
[ 1.741966] tsc: Refined TSC clocksource calibration: 2904.010 MHz │ at /linux-kernel-module-cheat//out/x86_64/buildroot/build/
[ 1.742796] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29dc0f4s│kernel_modules-1.0/./timer.c:28
[ 1.743648] clocksource: Switched to clocksource tsc │28 {
[ 2.072945] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8043│(gdb) bt
[ 2.078641] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibis│#0 lkmc_timer_callback (data=0xffffffffc0002000 <mytimer>)
[ 2.080350] EXT4-fs (vda): mounting ext2 file system using the ext4 subsystem│ at /linux-kernel-module-cheat//out/x86_64/buildroot/build/
[ 2.088978] EXT4-fs (vda): mounted filesystem without journal. Opts: (null) │kernel_modules-1.0/./timer.c:28
[ 2.089872] VFS: Mounted root (ext2 filesystem) readonly on device 254:0. │#1 0xffffffff810ab494 in call_timer_fn (timer=0xffffffffc0002000 <mytimer>,
[ 2.097168] devtmpfs: mounted │ fn=0xffffffffc0000000 <lkmc_timer_callback>) at kernel/time/timer.c:1326
[ 2.126472] Freeing unused kernel memory: 1264K │#2 0xffffffff810ab71f in expire_timers (head=<optimized out>,
[ 2.126706] Write protecting the kernel read-only data: 16384k │ base=<optimized out>) at kernel/time/timer.c:1363
[ 2.129388] Freeing unused kernel memory: 2024K │#3 __run_timers (base=<optimized out>) at kernel/time/timer.c:1666
[ 2.139370] Freeing unused kernel memory: 1284K │#4 run_timer_softirq (h=<optimized out>) at kernel/time/timer.c:1692
[ 2.246231] EXT4-fs (vda): warning: mounting unchecked fs, running e2fsck isd│#5 0xffffffff81a000cc in __do_softirq () at kernel/softirq.c:285
[ 2.259574] EXT4-fs (vda): re-mounted. Opts: block_validity,barrier,user_xatr│#6 0xffffffff810577cc in invoke_softirq () at kernel/softirq.c:365
hello S98 │#7 irq_exit () at kernel/softirq.c:405
│#8 0xffffffff818021ba in exiting_irq () at ./arch/x86/include/asm/apic.h:541
Apr 15 23:59:23 login[49]: root login on 'console' │#9 smp_apic_timer_interrupt (regs=<optimized out>)
hello /root/.profile │ at arch/x86/kernel/apic/apic.c:1052
# insmod /timer.ko │#10 0xffffffff8180190f in apic_timer_interrupt ()
[ 6.791945] timer: loading out-of-tree module taints kernel. │ at arch/x86/entry/entry_64.S:857
# [ 7.821621] 4294894248 │#11 0xffffffff82003df8 in init_thread_union ()
[ 8.851385] 4294894504 │#12 0x0000000000000000 in ?? ()
│(gdb)
Real hardware is for newbs. Real hardware is for newbs.
Tested on Ubuntu 23.10 we approximately follow instructions from: docs.zephyrproject.org/3.4.0/develop/getting_started/index.html stopping before the "Flash the sample" section, as we don't flash QEMU. We just run it.
sudo apt install --no-install-recommends git cmake ninja-build gperf \
ccache dfu-util device-tree-compiler wget \
python3-dev python3-pip python3-setuptools python3-tk python3-wheel xz-utils file \
make gcc gcc-multilib g++-multilib libsdl2-dev libmagic1
python3 -m venv ~/zephyrproject/.venv
source ~/zephyrproject/.venv/bin/activate
pip install west
west init ~/zephyrproject
cd ~/zephyrproject
west update
west zephyr-export
cd ~
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.16.1/zephyr-sdk-0.16.1_linux-x86_64.tar.xz
tar xvf zephyr-sdk-0.16.1_linux-x86_64.tar.xz
cd zephyr-sdk-0.16.1
./setup.sh
The installation procedure install all compiler toolchains for us, so we can then basically compile for any target. It also fetches the latest Git source code of Zephyr under:
~/zephyrproject/zephyr
The "most default" blinky hello world example which blinks an LED is a bit useless for us because QEMU doesn't have LEDs, so instead we are going to use one of the UART examples which will print characters we can see on QEMU stdout.
Let's start with the hello world example on an x86 target:and it outputs:The
cd ~/zephyrproject/zephyr
west build -b qemu_x86 samples/hello_world -t run
Hello World! qemu_x86
qemu_x64
on the output comes from the CONFIG_BOARD
macro github.com/zephyrproject-rtos/zephyr/blob/c15ff103001899ba0321b2c38013d1008584edc0/samples/hello_world/src/main.c#L11#include <zephyr/kernel.h>
int main(void)
{
printk("Hello World! %s\n", CONFIG_BOARD);
return 0;
}
The
qemu_x86
board is documented at: docs.zephyrproject.org/3.4.0/boards/x86/qemu_x86/doc/index.htmlYou can also first
cd
into the directory that you want to build in to avoid typing samples/hello_world
all the time:cd ~/zephyrproject/zephyr/samples/hello_world
zephyr west build -b qemu_x86 -t run
You can also build and run separately with:
west build -b qemu_x86
west build -t run
Another important option is:But note that it does not modify your
west build -t menuconfig
prj.conf
automatically for you.Let's try on another target:and same output, but on a completely different board! The
rm -rf build
zephyr west build -b qemu_cortex_a53 -t run
qemu_cortex_a53
board is documented at: docs.zephyrproject.org/3.4.0/boards/arm64/qemu_cortex_a53/doc/index.htmlThe list of all examples can be seen under:which for example contains:
ls ~/zephyrproject/zephyr/samples
zephyrproject/zephyr/samples/hello_world
So run another sample simply select it, e.g. to run
zephyrproject/zephyr/samples/synchronization
:west build -b qemu_cortex_a53 samples/synchronization -t run
OMG, both of those just fucking work on Ubuntu 20.04 with README instructions, it is unbelievable, those people don't have lives. And it builds the ROM byte by byte equal from source!
There are a few different versions:
- github.com/n64decomp/sm64 for emulator (i.e. or real hardware), tested at 9214dddabcce4723d9b6cda2ebccbac209f6447d
- github.com/sm64-port/sm64-port Ubuntu native, tested at 6b47859f757a40096fedd6237f2bc3573d0bc2a4Full screen with F10.
- github.com/sm64pc/sm64ex: fork of sm64-port, untested by Ciro Santilli, but more new amazing usability features, notably:
--skip-intro
: skips the annoying pipe intro and the need to wait for Lakitu to bring Peaches message!- in-game menu:
- cheats:
- hide HUD!
- no level selection yet, but a matter of time?
Also reported to work on ARM: www.reddit.com/r/linux/comments/ityg6w/pinephone_playing_super_mario_64_30fps/They also ported to browser with Emscripten: github.com/sm64pc/sm64ex/wiki/Compiling-for-the-web
Tested with the USA ROM at sha1sum 9bef1128717f958171a4afac3ed78ee2bb4e86ce (you need a ROM to extract assets, which the project automates), which is also documented in the project itself: github.com/sm64-port/sm64-port/blob/6b47859f757a40096fedd6237f2bc3573d0bc2a4/sm64.us.sha1. Disclaimer: Ciro Santilli owns a copy of Super Mario 64.
The only dependency missing from Ubuntu packages is the IRIX QEMU user mode which they need for their tooling. The project also has a QEMU fork for that, and provide a working deb.
From this project it was also noticed that certain ROM releases were not compiled with optimizations enabled, presumably because as a release title the compiler had optimization bugs! www.resetera.com/threads/so-apparently-the-ntsc-build-of-mario-64-didnt-use-any-compiler-optimizations.166277/ But now they do have a working compiler, and by turning that switch FPS increases in certain levels!!!
It is good to know that this game will "never die".
Some quick stupid patches:
- jump really high:
diff --git a/src/game/mario.c b/src/game/mario.c index 5b103fa..83c9f40 100644 --- a/src/game/mario.c +++ b/src/game/mario.c @@ -826,7 +826,7 @@ static u32 set_mario_action_airborne(struct MarioState *m, u32 action, u32 actio case ACT_JUMP: case ACT_HOLD_JUMP: m->marioObj->header.gfx.unk38.animID = -1; - set_mario_y_vel_based_on_fspeed(m, 42.0f, 0.25f); + set_mario_y_vel_based_on_fspeed(m, 200.0f, 0.25f); m->forwardVel *= 0.8f; break;
Interesting entry points:
src/game/game_init.c
TODO: enable the level select debug feature! tcrf.net/Super_Mario_64_(Nintendo_64)/Debug_Content#Classic_Debug_Display They actually shipped quite a few debug features into the retail game, and they have been reversed too. I tried this but it didn't work (or I don't know how to enable the level select menu):
diff --git a/src/game/main.c b/src/game/main.c
index 9e53e50..b7443a8 100644
--- a/src/game/main.c
+++ b/src/game/main.c
@@ -65,7 +65,7 @@ s8 sAudioEnabled = 1;
u32 sNumVblanks = 0;
s8 gResetTimer = 0;
s8 D_8032C648 = 0;
-s8 gDebugLevelSelect = 0;
+s8 gDebugLevelSelect = 1;
s8 D_8032C650 = 0;
s8 gShowProfiler = FALSE;
The
enhancements/
folder contains a few sample patches.Some tutorials of hacking it:
- www.youtube.com/watch?v=Jkb7Naczoww SM64 Decomp Tutorial 1: Setting Up and First Code Changes by Bitlytic (2021)
- www.youtube.com/watch?v=IuIpqX4neWg Rovert Decomp Tech Demo by Rovert (2019) Metal cap makes Mario huge.
- www.youtube.com/watch?v=5aG1Iyjo20w Is it Possible to Beat Super Mario 64 as Tiny Mario? (Mini Mario Challenge) coverts the obvious make Mario huge/tiny hack. Huge mario verion: www.youtube.com/watch?v=pR_gol6zlIo. There was a pre-decompilation ROM hack doing that trivial change already: Tiny Huge Mario 64. Sample tool-assisted speedrun: www.youtube.com/watch?v=C7BjzZ_Nkk0
The lower level you go into a computer, the harder it is to observe things Updated 2024-12-15 +Created 1970-01-01
This is a general principle of software/hardware design that Ciro feels holds wide applicability.
The most extreme case of this is of course the integrated circuit itself, in which it is essentially impossible (?) to observe the specific value of some indidual wire at some point.
Somewhat on the other extreme, we have high level programming languages running on top of an operating system: at this point, you can just GDB step debug your program, print the value of any variable/memory location, and fully understand anything that you want. Provided that you manage to easily reach that point of interest.
And for anything in between we have various intermediate levels of complication. The most notable perhaps being developing the operating system itself. At this level, you can't so easily step debug (although techniques do exist). For early boot or bootloaders for example, you might want to use JTAG for example on real hardware.
In parallel to this, there is also another very important pair of closely linked tradeoffs:
- the lower level at which something is implemented, the faster it runs
- emulation gives you observability back, at the cost of slower runtime
Emulation also has another potential downside: unless you are very careful at implementing things correctly, your model might not be representative of the real thing. Also, there may be important tradeoffs between how much the model looks like the real thing, and how fast it runs. For example, QEMU's use of binary translation allows it to run orders of magnitude faster than gem5. However, you are unable to make any predictions about system performance with QEMU, since you are not modelling key elements like the cache or CPU pipeline.
Instrumentation is another technique that has can be considered to achieve greater observability.
User mode emulation refers to the ability of certain emulators to emulate userland code running on top of a specific operating system, usually Linux.
For example, QEMU allows you to run a variety of userland ELF programs directly on it, without an underlying Linux kernel running.
User mode emulation is achieved by implementing System calls and special filesystems such as
/dev
manually on the emulator one by one.The general tradeoff is that simulation is less acurate as it may lack certain highly advanced kernel functionality you haven't implemented yet. But it is much easier to run executables with it, and you don't have to wait for boot to finish before running, you just run executables directly from the command line.