Software

Big goals:

the pursuit of AGI
physics simulations, including scientific visualization software
formalization of mathematics

Just art:

useless mathy stuff
incredibly nifty little tools that are so satisfying to use that it is mind-blowing:
- ncdu
- GNU parallel
media related stuff
- FFmpeg one liners!

CMake

Examples under cmake:

cmake/hello: just print a message in CMake itself and exit. No compilation.
cmake/hello_c: C hello world
cmake/option: set() and option() basic examples
cmake/multi_executable
cmake/multi_file
cmake/multi_file_recursive
cmake/shared_lib_external

Compiler toolchain

Compiler + other closely related crap like linker.

Linker (computing)

Some linker related answers by Ciro Santilli:

Find UTF-8 strings with Binutils `strings` (Binutils)

Not possible it seems:

Automatic programming

We use the term "automatic programming" to mean "generating code from natural language".

The ultimate high level of which is of course to program with:

computer, make money

which is basically the goal of artificial general intelligence, especially according to The Employment Test definition of AGI.

The term has not always had that sense:

automatic programming has always been a euphemism for programming in a higher-level language than was then available to the programmer

sums it up.

But in the current AI boom, this is the sense that matters, so that's what we will go with.

Bibliography:

www.reddit.com/r/LocalLLaMA/comments/1d25arj/a_coding_llm_that_actually_tries_to_compile_andor/

AI code generation framework that tries to run code

OpenAI's GPT-4-turbo can generate and run Python code if it detects that the prompt would be better answered by Python, e.g. maths
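A minimal sketch of the "try to run it" step, assuming the generated code arrives as a Python string. The helper name `run_candidate` and the 5 second timeout are illustrative assumptions and do not come from any of the frameworks mentioned here. A real system would also need sandboxing.

```python
import subprocess
import sys


def run_candidate(code: str, timeout: float = 5.0) -> tuple[bool, str, str]:
    """Run generated Python code in a child interpreter.

    Returns (succeeded, stdout, stderr). The name and signature are
    hypothetical; this only illustrates the run-and-observe step.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout, proc.stderr


if __name__ == "__main__":
    # A snippet that works, then one that raises.
    ok, out, err = run_candidate("print(2 ** 10)")
    print(ok, out.strip())
    ok, out, err = run_candidate("1 / 0")
    print(ok, err.strip().splitlines()[-1])
```

The stderr of a failing run is what such a framework could feed back to the model for another attempt.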

AlphaEvolve (2025)

Blog post: deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Whitepaper: storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

Basically they require users to hand-code a metric and provide a program skeleton with some parts of the code marked to be replaced, and then the system focuses on modifying the code regions in question to optimize the metric.
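The workflow described above can be illustrated with a toy loop. This is only a sketch under loud assumptions: the marker comments, the `metric` and the `mutate` function are all made up for illustration, and `mutate` is a random perturbation standing in for the LLM. It is not how AlphaEvolve itself is implemented.

```python
import random
import re

# Program skeleton with one region marked as replaceable (marker names are
# an illustrative assumption).
SKELETON = '''
def guess():
    # EVOLVE-BLOCK-START
    return 0.0
    # EVOLVE-BLOCK-END
'''

BLOCK = re.compile(
    r"(# EVOLVE-BLOCK-START\n)(.*?)(\n\s*# EVOLVE-BLOCK-END)", re.S
)


def metric(program: str) -> float:
    """Hand-coded score, higher is better: how close guess()**2 is to 2."""
    scope = {}
    exec(program, scope)
    return -abs(scope["guess"]() ** 2 - 2.0)


def mutate(program: str, rng: random.Random) -> str:
    """Stand-in for the LLM: nudge the constant inside the marked block."""

    def replace(match: re.Match) -> str:
        old = float(match.group(2).split("return", 1)[1])
        new = old + rng.gauss(0, 0.1)
        return f"{match.group(1)}    return {new!r}{match.group(3)}"

    return BLOCK.sub(replace, program)


def evolve(program: str, steps: int = 300, seed: int = 0):
    """Keep a mutated program only when it improves the metric."""
    rng = random.Random(seed)
    best, best_score = program, metric(program)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = metric(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    best, score = evolve(SKELETON)
    print(best)
    print("score:", score)
```

Only the code between the markers ever changes, and the only feedback signal is the number returned by `metric`.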

All the novel results they announced were in constraint satisfaction problems or optimization problems. Their results are still awesome, but it's not very different from AlphaGo-style things.

AI code generation benchmark

Bibliography:

www.reddit.com/r/LocalLLaMA/comments/1e4unuz/any_up_to_date_benchmarking_sites_for_coding_llms/

Can AI code

Appears to be a very small number of newly created problems?

HumanEval (2021)

The tests are present in a gzip inside the Git repo: github.com/openai/human-eval/blob/master/data/HumanEval.jsonl.gz These researchers.

To get a quick overview of the problems with jq, after decompressing the file with gunzip:

jq -r '"==== \(.task_id) \(.entry_point)\n\(.prompt)"' <HumanEval.jsonl

The first two problems are:

==== HumanEval/0 has_close_elements
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

==== HumanEval/1 separate_paren_groups
from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """

so we understand that it takes as input an empty function with a docstring and you have to fill the function body.
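As an illustration of what "filling the function body" means, here is one possible completion for HumanEval/0. It is a sketch written for this note, not the benchmark's canonical solution, and the shortened docstring is ours.

```python
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """Return True if any two numbers are closer to each other than threshold."""
    ordered = sorted(numbers)
    # After sorting, the closest pair is always adjacent.
    return any(b - a < threshold for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # The two examples from the problem's docstring.
    print(has_close_elements([1.0, 2.0, 3.0], 0.5))
    print(has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3))
```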

The paper also shows that there can be other defined functions besides the one you have to implement.
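Since the dataset ships as a gzip, the same overview as the jq command can also be produced without decompressing first. A minimal sketch, assuming `HumanEval.jsonl.gz` was downloaded to the current directory; the function name `overview` is our own:

```python
import gzip
import json


def overview(path: str) -> str:
    """Format each problem like the jq command: a header line, then the prompt."""
    chunks = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            problem = json.loads(line)
            chunks.append(
                f"==== {problem['task_id']} {problem['entry_point']}\n"
                f"{problem['prompt']}"
            )
    return "\n".join(chunks)


if __name__ == "__main__":
    # Assumed local file name, see the repository link above.
    print(overview("HumanEval.jsonl.gz"))
```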

BigCodeBench (2024)

Their most interesting subset, the -hard one, appears to be present at: huggingface.co/datasets/bigcode/bigcodebench-hard in Parquet format. OMG why.

The tests make free use of the Python standard library and other major external libraries, e.g. huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=0 uses ftplib. Kind of cool.

They even test graph plotting? huggingface.co/datasets/bigcode/bigcodebench-hard/viewer/default/v0.1.0_hf?views%5B%5D=v010_hf&row=11 How does it evaluate?

SWE-bench (2024)

www.swebench.com/

By Princeton people.

This one aims to solve GitHub issues. It appears to contain 2,294 real-world GitHub issues and their corresponding pull requests.

The dataset appears to be at: huggingface.co/datasets/princeton-nlp/SWE-bench in Parquet format.

SWE-Lancer (2025)

Tasks from Upwork.

Lowering and raising

Lowering means translating to a lower level representation.

Raising means translating to a higher level representation.

Decompilation is basically a synonym, or subset, of raising.
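Python's own toolchain gives a small-scale analogy for both directions. This is only an illustration of the vocabulary, not a decompiler: `compile` lowers an AST to bytecode, while `ast.unparse` (Python 3.9+) raises an AST back to source text.

```python
import ast

source = "x = 1 + 2"

# Lowering: source text -> AST -> bytecode, each step closer to the machine.
tree = ast.parse(source)
code = compile(tree, filename="<example>", mode="exec")

# Raising: AST -> source text, recovering a higher level representation.
raised = ast.unparse(tree)

# The lowered form is still executable.
namespace = {}
exec(code, namespace)

if __name__ == "__main__":
    print(raised)
    print(namespace["x"])
```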

Lower (compilation)

Decompiler

Video game reverse engineering

List of compilers

GNU Compiler Collection (gcc)

gcc CLI option

gcc `-save-temps`

Saves preprocessor output and generated assembly to separate files.

LLVM

LLVM Intermediate Representation (LLVM IR)

Very hot stuff! It's like ISA-portable assembly, but with types! In particular, it also deals with calling conventions for us (since it is ISA-portable). TODO: isn't that exactly what C does? :-) LLVM IR vs C

Documentation: llvm.org/docs/LangRef.html

Quantum Intermediate Representation

LLVM IR vs C

LLVM IR hello world

Example: llvm/hello.ll adapted from: llvm.org/docs/LangRef.html#module-structure but without double newline.

To execute it as mentioned at github.com/dfellis/llvm-hello-world we can either use their crazy assembly interpreter, tested on Ubuntu 22.10:

sudo apt install llvm-runtime
lli hello.ll

This seems to use puts from the C standard library.

Or we can lower it to assembly of the local machine:

sudo apt install llvm
llc hello.ll

which produces:

hello.s

and then we can assemble, link and run with gcc:

gcc -o hello.out hello.s -no-pie
./hello.out

or with clang:

clang -o hello.out hello.s -no-pie
./hello.out

hello.s uses the GNU GAS format, which clang is highly compatible with, so both should work in general.

clang

LLVM front-end for C and related languages such as C++.

Reproducible builds

Reproducible builds allow anyone to verify that a binary large object contains what it claims to contain!
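In practice the verification boils down to rebuilding from source and comparing the result byte for byte with the published binary, usually via a cryptographic hash. A minimal sketch, where both file paths and the helper names are hypothetical:

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(official: str, rebuilt: str) -> bool:
    """True when the independently rebuilt binary matches the official one."""
    return sha256_of(official) == sha256_of(rebuilt)
```

If the build is reproducible, `is_reproduced("official.bin", "my-rebuild.bin")` returns True for anyone who rebuilds from the same source and toolchain.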

Source-to-source compiler

Computer-aided design (CAD)

Open source CAD software

FreeCAD

Graphics software

en.wikipedia.org/wiki/List_of_information_graphics_software

Mathematics illustration software (Software for drawing geometry diagrams)

Survey by Ciro Santilli: math.stackexchange.com/questions/1985/software-for-drawing-geometry-diagrams/3938216#3938216

Many plotting programs can be used to create mathematics illustrations. They just tend to have more data-oriented rather than explanatory-oriented output.

Inkscape

Graphics library

three.js

OpenGL

Ciro Santilli has some good related articles listed under: the best articles by Ciro Santilli.

WebGL

Freetype GL

github.com/rougier/freetype-gl

Good library to render text in OpenGL, see also: stackoverflow.com/questions/8847899/opengl-how-to-draw-text-using-only-opengl-methods/36065835#36065835

Khronos Group

The fact that they kept the standard open source makes them huge heroes, see also: closed standard.

Shame that many (most?) of their proposals just die out.

Khronos standard

opengl-tutorial.org

github.com/opengl-tutorials/ogl/

Good modern OpenGL tutorial in retained mode with shaders, see also: stackoverflow.com/questions/6733934/what-does-immediate-mode-mean-in-opengl/36166310#36166310

Vulkan

Direct3D

JavaScript graphics library

Paper.js

github.com/paperjs/paper.js

Pixi.js

github.com/pixijs/pixi.js

Two.js

github.com/jonobr1/two.js

Examples at: two-js/.

JavaScript library, works both on browser and headless with Node.js to SVG.

Feels good. Maybe not ultra featured, and could have more simple examples in docs, but still good.

Vs Paper.js github.com/jonobr1/two.js/issues/319

One of the main features of Two.js appears to be the fact that it can natively render to either SVG or canvas, rather than creating SVG through DOM hacks as done by other projects.

Computer program

One specific software project, typically with a single executable file format entry point.

Computer security

As mentioned at Section "Computer security researcher", Ciro Santilli really tends to like people from this area.

Also, the type of programming Ciro used to do, systems programming, is particularly useful to security researchers, e.g. Linux Kernel Module Cheat.

The reason he does not go into this is that Ciro would rather fight against the more eternal laws of physics than against some typo that some dude at Apple made last week and which will be patched in a month.

Exploit (computer security)

Arbitrary code execution

Phishing

Cross-site scripting (XSS)

Computer security conference

DEF CON (1993-)

Black Hat Briefings (1997-)

Computer security researcher

Ciro Santilli found out that he likes computer security researchers and vice versa.

It's a bit the same reason why he likes physicists: you can't bullshit with security.

You can't just talk nice and hope for people to believe you.

You can't not try to break things and just keep everyone happy in their false illusion of safety.

You can't do a half job.

If you do any of that, you will get your ass handed to you in a little gift bag.

All of this is closely linked to Ciro Santilli's self-perceived creative personality: being naughty and being creative are correlated.

Dan Kaminsky (1979-2021)

A superstar security researcher with some major exploits from the 2000s.

Dan Kaminsky approves Linux Kernel Module Cheat

twitter.com/dakami/status/1344853681749934080

Oh yeah, that felt good. A few months before he died.

Len Sassaman

Cool data embedded in the Bitcoin blockchain / Len Sassaman tribute

Data erasure

Denial-of-service attack (DoS, DoS attack)

Multi-factor authentication (2FA)

2FA app

Google 2FA app token can be updated without checking the old 2FA

Ermm, as of February 2021, I was able to update my 2FA app token with the password alone, it did not ask for the old 2FA.

So what's the fucking point of 2FA then? An attacker with my password would be able to log in by doing that!

Is it that Google trusts that particular action because I used the same phone/known IP or something like that?

Authy

OAuth

The fatal flaw of OAuth is that websites have to enable specific providers, they can't just automatically select the correct OAuth for a given email domain. This means that the vast majority of websites will only provide the most widely popular providers such as Google, and the like, which means people won't have decent privacy.

So you are just better off with password logins and a decent password manager.

A cross browser, cross platform, and server-encrypted password manager is a must after Snowden!!! E.g. Proton Pass. And governments should obviously provide one to their citizens, or else be spied upon by the USA: Governments should provide basic Internet infrastructure.

Plausible deniability

Privacy

WhatsApp profile information is public by default

Security through obscurity

stackoverflow.com/questions/533965/why-is-security-through-obscurity-a-bad-idea

Do as I say, not as I do: Ciro Santilli's Stack Overflow suspension for vote fraud script 2019, meta.stackoverflow.com/questions/381577/is-it-ok-to-have-links-on-how-to-create-sock-puppets-and-gain-rep-fraudulently-i/381635#381635.

Video 1. LockPickingLawyer SAINTCON keynote (2021)

SAINTCON is "Utah's Premiere Security Conference".

youtu.be/IH0GXWQDk0Q?t=900 mentions that Alfred Charles Hobbs commented in 1853:
Rogues are very keen in their profession, and know already much more than we can teach them

Brain-computer interface

It allows the client to prepare a single request that gets all the data it wants to fill up a given webpage, rather than doing several separate requests.

So it only gets exactly what it needs, and in a single request.

Very sweet. This is the future of the web.

no formatting;

set prompt ">>> "
log_user 0
send "What is quantum field theory?\r"
expect -re "(.+)$prompt"
puts -nonewline [join [lrange [lmap line [split $expect_out(1,string) \n] {regsub {\r$} $line ""}] 1 end] "\n"]

Then stdout will contain only the output of the command and nothing else.

Bibliography:

unix.stackexchange.com/questions/239161/get-the-output-from-expect-script-in-a-variable/792645#792645
stackoverflow.com/questions/45210358/expect-output-only-stdout-of-the-command-and-nothing-else/79517903#79517903
stackoverflow.com/questions/57975853/how-to-read-the-send-command-output-in-expect-script title is wrong, OP wants exit status apparently not stdout

GNU parallel

The author Ole Tange answers every question about it on Stack Exchange. What a legend!

This program makes you respect GNU make a bit more. Good old make with -j can not only parallelize, but also take into account a dependency graph.
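To make the dependency graph point concrete, here is a sketch of dependency-aware scheduling using Python's standard `graphlib` (Python 3.9+). The build graph is hypothetical, and real make starts each target as soon as its prerequisites finish rather than in strict waves.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each target maps to the targets it depends on.
DEPENDENCIES = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c"},
    "util.o": {"util.c"},
}


def parallel_waves(dependencies):
    """Group targets into waves; everything in one wave can build in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for wave in parallel_waves(DEPENDENCIES):
        print(wave)
```

Plain GNU parallel has no such graph: every input line is an independent job.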

Some examples under:

man parallel_examples

To get the input argument explicitly use the magic string {}, e.g.:

printf 'a\nb\nc\n' | parallel echo '{}'

sample output:

a
b
c

To get the job number use {#} as in:

printf 'a\nb\nc\n' | parallel echo '{} {#}'

sample output:

a 1
b 2
c 3

{%} contains the job slot (thread) that the job is running in, e.g. if we limit it to 2 threads with -j2:

printf 'a\nb\nc\nd\n' | parallel -j2 echo '{} {#} {%}'

sample output:

The percent must be a reference to "split the inputs modulo the number of workers", and modulo uses the % symbol in many programming languages such as C.
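The modulo intuition can be written down as a small sketch. It assumes an idealized round-robin assignment where every job takes the same time; the real GNU parallel gives a job to whichever slot frees up first, so actual slot numbers may differ. The function name is our own.

```python
def idealized_slots(inputs, workers):
    """Pair each input with (job number, slot), assuming round-robin slots.

    Job numbers and slots are 1-based, like {#} and {%}.
    """
    return [
        (arg, job, (job - 1) % workers + 1)
        for job, arg in enumerate(inputs, start=1)
    ]


if __name__ == "__main__":
    for arg, job, slot in idealized_slots(["a", "b", "c", "d"], workers=2):
        print(arg, job, slot)
```

Under this idealization, with two workers the jobs alternate between slot 1 and slot 2.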

To pass multiple CLI arguments per command you can use -X e.g.:

printf 'a\nb\nc\nd\n' | parallel -j2 -X echo '{} {#} {%}'

sample output:

a b 1 1
c d 2 2
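The grouping seen in that output can be mimicked with a short sketch that splits the arguments into contiguous, near-equal groups, one per job. This is an assumption about the observable behaviour on small inputs only: the real -X also respects the maximum command line length, which is ignored here, and the function name is hypothetical.

```python
def split_evenly(args, jobs):
    """Split args into at most `jobs` contiguous groups of near-equal size."""
    size, extra = divmod(len(args), jobs)
    groups, start = [], 0
    for i in range(jobs):
        # The first `extra` groups take one more element each.
        end = start + size + (1 if i < extra else 0)
        if start < end:
            groups.append(args[start:end])
        start = end
    return groups


if __name__ == "__main__":
    for number, group in enumerate(split_evenly(["a", "b", "c", "d"], 2), start=1):
        print(" ".join(group), number)
```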

htop