CPython JIT 2025-09-09
Added in CPython 3.13.
To enable it, tested on Ubuntu 25.04:
git clone https://github.com/python/cpython
cd cpython
git checkout v3.13.7
./configure --enable-experimental-jit
make -j
We can try to test it with python/inc_loop.py:
time ./python python/inc_loop.py 10000000
but the result is just as pathetic as without JIT currently, taking about 1 second for only 10m loops.
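The contents of python/inc_loop.py are not shown here. A minimal sketch of what such a benchmark could look like, assuming it takes the loop count as its first CLI argument as in the command above (the function name inc_loop is hypothetical):

```python
import sys


def inc_loop(n):
    """Increment a counter n times in a plain while loop and return it."""
    i = 0
    while i < n:
        i += 1
    return i


if __name__ == "__main__":
    # Assumption: the loop count is the first CLI argument, defaulting to 10 million.
    has_count = len(sys.argv) > 1 and sys.argv[1].isdigit()
    n = int(sys.argv[1]) if has_count else 10_000_000
    print(inc_loop(n))
```

A loop like this is dominated by interpreter overhead per bytecode, which is exactly what a JIT would be expected to reduce.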
This can be compared with the optimal assembly from c/inc_loop_asm.c:
time ./inc_loop_asm.out 1000000000
which does 1 billion loops in about half a second on P14s.
For comparison, PyPy actually speeds things up and does 1 billion loops in about a second, so only 2x worse than native.
TODO triple check that JIT is enabled. Many threads say the command is:
./python -c 'import sysconfig; sysconfig.get_config_var("JIT_DEPS")'
but that fails with:
ModuleNotFoundError: No module named '_sysconfigdata__linux_x86_64-linux-gnu'
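Note that even when sysconfig works, python -c does not echo expression values, so the command above would need a print() to show anything. Another check that is sometimes suggested, untested here against v3.13.7, is to look for the -D_Py_JIT define in the PY_CORE_CFLAGS config variable. A sketch under that assumption (the helper name jit_flag_present is hypothetical):

```python
import sysconfig


def jit_flag_present(cflags):
    """Return True if the _Py_JIT define appears in a compiler flags string."""
    return "_Py_JIT" in (cflags or "")


if __name__ == "__main__":
    # Assumption: JIT builds carry -D_Py_JIT in PY_CORE_CFLAGS.
    # get_config_var returns None if the variable is missing.
    print(jit_flag_present(sysconfig.get_config_var("PY_CORE_CFLAGS")))
```

This still relies on sysconfig being importable, so it would presumably hit the same ModuleNotFoundError on a build tree where the _sysconfigdata module is missing.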
For comparison with a properly implemented dynamic language JIT, running nodejs/inc_loop.js does 1 billion loops in 0.6s on v22.14.0, close to native.
tonybaloney.github.io/posts/python-gets-a-jit.html documents what the initial "JIT" implementation does. It is just an extremely naive concatenation of instructions that avoids a for + switch. No wonder it doesn't speed things up much at all.
c/inc_loop_asm.c Created 2025-06-17 Updated 2025-07-16
This is the only way that we've managed to reliably get a single inc instruction loop: by using inline assembly, e.g. on x86 we do:
loop:
  inc %[i];
  cmp %[max], %[i];
  jb loop;
For 1s on P14s Ubuntu 25.04 GCC 14.2 -O0 x86_64 we need about 5 billion:
time ./inc_loop_asm.out 5000000000
c/inc_loop.c Created 2025-06-17 Updated 2025-07-16
Ubuntu 25.04 GCC 14.2 -O0 x86_64 produces a horrendous:
11c8:       48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
11cd:       48 8b 45 f0             mov    -0x10(%rbp),%rax
11d1:       48 3b 45 e8             cmp    -0x18(%rbp),%rax
11d5:       72 f1                   jb     11c8 <main+0x7f>
To run for about 1s on P14s we need 2.5 billion loops:
time ./inc_loop.out 2500000000
and:
perf stat ./inc_loop.out 2500000000
gives:
          1,052.22 msec task-clock                       #    0.998 CPUs utilized             
                23      context-switches                 #   21.858 /sec                      
                12      cpu-migrations                   #   11.404 /sec                      
                60      page-faults                      #   57.022 /sec                      
    10,015,198,766      instructions                     #    2.08  insn per cycle            
                                                  #    0.00  stalled cycles per insn   
     4,803,504,602      cycles                           #    4.565 GHz                       
        20,705,659      stalled-cycles-frontend          #    0.43% frontend cycles idle      
     2,503,079,267      branches                         #    2.379 G/sec                     
           396,228      branch-misses                    #    0.02% of all branches
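As a sanity check on the counters above, here is a small sketch of the arithmetic, assuming the loop count of 2500000000 from the command line. It relates the totals to the 4-instruction loop body shown in the disassembly:

```python
# Values copied from the perf output above.
instructions = 10_015_198_766
cycles = 4_803_504_602
branches = 2_503_079_267
loops = 2_500_000_000  # CLI argument passed to inc_loop.out

insn_per_loop = instructions / loops    # about 4: addq, mov, cmp, jb
branches_per_loop = branches / loops    # about 1: the jb
cycles_per_loop = cycles / loops        # a bit under 2
ipc = instructions / cycles             # perf reports this as "insn per cycle"

print(f"instructions per loop: {insn_per_loop:.3f}")
print(f"branches per loop:     {branches_per_loop:.3f}")
print(f"cycles per loop:       {cycles_per_loop:.3f}")
print(f"insn per cycle:        {ipc:.2f}")
```

So essentially all of the counted instructions come from the loop itself, and each loop takes a bit under 2 cycles.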
With -O3 it manages to optimize the loop away entirely, producing:
    1078:       e8 d3 ff ff ff          call   1050 <strtoll@plt>
}
    107d:       5a                      pop    %rdx
    107e:       c3                      ret
so it is smart enough to just return the return value from strtoll directly, as is, in rax.
Any print() command ends up on the USB, and is shown on the computer via programs such as ampy get back.
However, you can also send data over actual UART.
We connect Pin 0 (TX), Pin 1 (RX) and Pin 2 (GND) to the DSD TECH, and the USB to the Ubuntu 25.04 host laptop.
Then on the host laptop I run:
screen /dev/ttyUSB0 9600
and a counter shows up there just fine!
llama-cli Created 2025-07-16 Updated 2025-09-09
A CLI front-end for llama.cpp.
A decent test command as of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8 tested on Ubuntu 25.04:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
make -j
cd bin
time ./llama-cli \
  --no-display-prompt \
  --single-turn \
  --temp 0 \
  -c 16384 \
  -cnv \
  -m ~/Downloads/Llama-3.1-Tulu-3-8B-Q8_0.gguf \
  -n 1000 \
  -ngl 100 \
  -p 'What is quantum field theory?' \
  -t 10 |
tee output.txt
and that was deterministic due to --temp 0.
Also, on P14s this command ran 2x faster on GPU via Vulkan, at 18 tokens/s for 1000 tokens, than on CPU. The CPU-only run is obtained by removing -ngl 100.
llama.cpp Created 2025-07-16 Updated 2025-07-16
This appears to be the backend library of Ollama.
They have a CLI front-end named llama-cli.
askubuntu.com/questions/1461564/install-llama-cpp-locally has some tutorials for Ubuntu. There was no nicely pre-packaged one for Ubuntu 25.04, but the build worked on 79e0b68c178656bb0632cb8602d2940b755077f8. In particular, it exposed Vulkan support before Ollama did: github.com/ollama/ollama/pull/5059 and it did seem to work, using up my AMD GPU.
picotool 2025-07-26
Tested on Ubuntu 25.04:
sudo apt install libusb-1.0-0-dev
git clone https://github.com/raspberrypi/pico-sdk
git clone https://github.com/raspberrypi/picotool
cd picotool
git checkout de8ae5ac334e1126993f72a5c67949712fd1e1a4
export PICO_SDK_PATH="$(pwd)/../pico-sdk"
mkdir build
cd build
cmake ..
cmake --build . -- -j"$(nproc)" VERBOSE=1
and the executable is there under build/picotool so copy it somewhere in your PATH like:
cp picotool ~/bin
and then trying to use a Zephyr example:
sudo ~/bin/picotool load -f build/zephyr/zephyr.uf2
fails with:
No accessible RP2040 devices in BOOTSEL mode were found
TODO: how to avoid that? youtu.be/tRXLxrtfU_s?t=207 gives a workaround if you are using the Pico SDK by adding to CMakeLists.txt:
pico_enable_stdio_usb(blink 1)
but how to do it in Zephyr? Video description says:
make sure that your program initializes the USB code via a call to "stdio_init_all()".
but again how to do that from Zephyr? It appears that this only works if the code currently running has support for the feature:
Video 1. Never unplug your Raspberry Pi Pico again by deltocode.
quickemu Created 2025-04-15 Updated 2025-07-16
This is a cool project that attempts to make it easy to emulate any of the three operating systems on QEMU.
Unfortunately, as of 2025 the project was falling a bit behind on support, and the latest versions of the two closed source systems were buggy, tested as of quickemu 4.9.7 on Ubuntu 25.04:
Then ignore the other steps from the tutorial, as these use the picozero package, which is broken with this error: github.com/raspberrypilearning/getting-started-with-the-pico/issues/57
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'
and use picozero-specific code. Rather, just use our examples from rpi-pico-w.
vscode freezes or crashes when opening a large folder Created 2025-05-26 Updated 2025-07-16
The issue appears to be that the file watcher goes out of control.
The reproduction is very simple:
mkdir mytest
cd mytest
seq 1000000 | xargs touch
code --disable-extensions .
and now the editor GUI hangs and Ubuntu shows a popup:
The window is not responding
htop reveals a bunch of processes or threads of type:
/snap/code/194/usr/share/code/code