Spin like mad between:
- standards
- high level generators. We use the assembler
as
and linkerld
. - hexdumps
- file decompilers. We use
readelf
. It makes it faster to read the ELF file by turning it into human readable output. But you must have seen one byte-by-byte example first, and think howreadelf
output maps to the standard. - low-level generators: stand-alone libraries that let you control every field of the ELF files you generated. github.com/BR903/ELFkickers, github.com/sqall01/ZwoELF and many more on GitHub.
- consumer: the
exec
system call of the Linux kernel can parse ELF files to starts processes: github.com/torvalds/linux/blob/v4.11/fs/binfmt_elf.c, stackoverflow.com/questions/8352535/how-does-kernel-get-an-executable-binary-file-running-under-linux/31394861#31394861
It is non-trivial to determine what is the smallest legal ELF file, or the smaller one that will do something trivial in Linux.
Some impressive attempts:
hello_world.asm
section .data
hello_world db "Hello world!", 10
hello_world_len equ $ - hello_world
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, hello_world
mov rdx, hello_world_len
syscall
mov rax, 60
mov rdi, 0
syscall
Compiled with:
nasm -w+all -f elf64 -o 'hello_world.o' 'hello_world.asm'
ld -o 'hello_world.out' 'hello_world.o'
Running:gives:
hd hello_world.o
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00 |....@.....@.....|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
000000a0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 07 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 10 02 00 00 00 00 00 00 |................|
000000e0 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |'...............|
000000f0 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 0d 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 40 02 00 00 00 00 00 00 |........@.......|
00000120 32 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |2...............|
00000130 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 17 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 80 02 00 00 00 00 00 00 |................|
00000160 a8 00 00 00 00 00 00 00 05 00 00 00 06 00 00 00 |................|
00000170 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00000180 1f 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 30 03 00 00 00 00 00 00 |........0.......|
000001a0 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |4...............|
000001b0 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 27 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 |'...............|
000001d0 00 00 00 00 00 00 00 00 70 03 00 00 00 00 00 00 |........p.......|
000001e0 18 00 00 00 00 00 00 00 04 00 00 00 02 00 00 00 |................|
000001f0 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 00 00 |Hello world!....|
00000210 b8 01 00 00 00 bf 01 00 00 00 48 be 00 00 00 00 |..........H.....|
00000220 00 00 00 00 ba 0d 00 00 00 0f 05 b8 3c 00 00 00 |............<...|
00000230 bf 00 00 00 00 0f 05 00 00 00 00 00 00 00 00 00 |................|
00000240 00 2e 64 61 74 61 00 2e 74 65 78 74 00 2e 73 68 |..data..text..sh|
00000250 73 74 72 74 61 62 00 2e 73 79 6d 74 61 62 00 2e |strtab..symtab..|
00000260 73 74 72 74 61 62 00 2e 72 65 6c 61 2e 74 65 78 |strtab..rela.tex|
00000270 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|
00000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff |................|
000002a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002b0 00 00 00 00 03 00 01 00 00 00 00 00 00 00 00 00 |................|
000002c0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
000002d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002e0 11 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000002f0 00 00 00 00 00 00 00 00 1d 00 00 00 00 00 f1 ff |................|
00000300 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000310 2d 00 00 00 10 00 02 00 00 00 00 00 00 00 00 00 |-...............|
00000320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000330 00 68 65 6c 6c 6f 5f 77 6f 72 6c 64 2e 61 73 6d |.hello_world.asm|
00000340 00 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 68 65 6c |.hello_world.hel|
00000350 6c 6f 5f 77 6f 72 6c 64 5f 6c 65 6e 00 5f 73 74 |lo_world_len._st|
00000360 61 72 74 00 00 00 00 00 00 00 00 00 00 00 00 00 |art.............|
00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000390
Sections with
sh_type == SHT_STRTAB
are called string tables.They hold a null separated array of strings.
Such sections are used by other sections when string names are to be used. The using section says:
- which string table they are using
- what is the index on the target string table where the string starts
So for example, we could have a string table containing:
Data: \0 a b c \0 d e f \0
Index: 0 1 2 3 4 5 6 7 8
And if another section wants to use the string
d e f
, they have to point to index 5
of this section (letter d
).Notable string table sections:
.shstrtab
.strtab
By the rich founder of Mt. Gox and Ripple, Jed McCaleb.
Obelisk is the Artificial General Intelligence laboratory at Astera. We are focused on the following problems: How does an agent continuously adapt to a changing environment and incorporate new information? In a complicated stochastic environment with sparse rewards, how does an agent associate rewards with the correct set of actions that led to those rewards? How does higher level planning arise?
Section type:
sh_type == SHT_STRTAB
.Common name: "section header string table".
This section gets pointed to by the
e_shstrnd
field of the ELF header itself.String indexes of this section are are pointed to by the
sh_name
field of section headers, which denote strings.This section does not have outputs:
SHF_ALLOC
marked, so it will not appear on the executing program.readelf -x .shstrtab hello_world.o
Hex dump of section '.shstrtab':
0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
0x00000030 7400 t.
There are two such entries, one pointing to
.data
and the other to .text
(section indexes 1
and 2
).Num: Value Size Type Bind Vis Ndx Name
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
TODO what is their purpose?
hello_world_len
points to the special st_shndx == SHN_ABS == 0xF1FF
.0xF1FF
is chosen so as to not conflict with other sections.st_value == 0xD == 13
which is the value we have stored there on the assembly: the length of the string Hello World!
.This is small optimization that our assembler does for us and which has ELF support.
Holds strings for the symbol table.
This section has
sh_type == SHT_STRTAB
.It is pointed to by outputs:
sh_link == 5
of the .symtab
section.readelf -x .strtab hello_world.o
Hex dump of section '.strtab':
0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
0x00000030 61727400 art.
This implies that it is an ELF level limitation that global variables cannot contain NUL characters.
Contains the path to the dynamic loader, i.e.
/lib64/ld-linux-x86-64.so.2
in Ubuntu 18.10. Explained at: stackoverflow.com/questions/8040631/checking-if-a-binary-compiled-with-static/55664341#55664341file
5.36 however does use it to display file type as explained at: stackoverflow.com/questions/34519521/why-does-gcc-create-a-shared-object-instead-of-an-executable-binary-according-to/55704865#55704865Only appears in the executable.
Contains information of how the executable should be put into the process virtual memory.
The executable is generated from object files by the linker. The main jobs that the linker does are:
- determine which sections of the object files will go into which segments of the executable.
- do relocation according to the
.rela.text
section. This depends on how the multiple sections are put into memory.
readelf -l hello_world.out
gives:Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000d7 0x00000000000000d7 R E 200000
LOAD 0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
0x000000000000000d 0x000000000000000d RW 200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
On the ELF header, and:
e_phoff
, e_phnum
and e_phentsize
told us that there are 2 program headers, which start at 0x40
and are 0x38
bytes long each, so they are:00000040 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 |..@.......@.....|
00000060 d7 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 00 |................|
00000070 00 00 20 00 00 00 00 00 |.. ..... |
00000070 01 00 00 00 06 00 00 00 | ........|
00000080 d8 00 00 00 00 00 00 00 d8 00 60 00 00 00 00 00 |..........`.....|
00000090 d8 00 60 00 00 00 00 00 0d 00 00 00 00 00 00 00 |..`.............|
000000a0 0d 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 |.......... .....|
Structure represented www.sco.com/developers/gabi/2003-12-17/ch5.pheader.html:
typedef struct {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
Elf64_Xword p_filesz;
Elf64_Xword p_memsz;
Elf64_Xword p_align;
} Elf64_Phdr;
Breakdown of the first one:
- 40 0:
p_type
=01 00 00 00
=PT_LOAD
: this is a regular segment that will get loaded in memory. - 40 4:
p_flags
=05 00 00 00
= execute and read permissions. No write: we cannot modify the text segment. A classic way to do this in C is with string literals: stackoverflow.com/a/30662565/895245 This allows kernels to do certain optimizations, like sharing the segment amongst processes. This member gives the offset from the beginning of the file at which the first byte of the segment resides.
But it looks like offsets from the beginning of segments, not file?- 50 0:
p_vaddr
=00 00 40 00 00 00 00 00
: initial virtual memory address to load this segment to - 50 8:
p_paddr
=00 00 40 00 00 00 00 00
: unspecified effect. Intended for systems in which physical addressing matters. TODO example? - 60 0:
p_filesz
=d7 00 00 00 00 00 00 00
: size that the segment occupies in memory. If smaller thanp_memsz
, the OS fills it with zeroes to fit when loading the program. This is how BSS data is implemented to save space on executable files. i368 ABI says onPT_LOAD
:The bytes from the file are mapped to the beginning of the memory segment. If the segment’s memory size (p_memsz) is larger than the file size (p_filesz), the ‘‘extra’’ bytes are defined to hold the value 0 and to follow the segment’s initialized area. The file size may not be larger than the memory size.
The second segment (
.data
) is analogous. TODO: why use offset 0x0000d8
and address 0x00000000006000d8
? Why not just use 0
and 0x00000000006000d8
?Then the:section of the
Section to Segment mapping:
readelf
tells us that:TODO where does this information come from? stackoverflow.com/questions/23018496/section-to-segment-mapping-in-elf-files
Whenever Ciro Santilli learns about molecular biology, he can't help but to feel that it feels like programming, and notably systems programming and computer hardware design.
In some sense, the comparison is obvious: DNA is clearly a programmable medium like any assembly language, but still, systems programming did give Ciro some further feelings.
- The most important analogy perhaps is observability, or more precisely the lack of it. For the computer, this is described at: The lower level you go into a computer, the harder it is to observe things.And then, when Ciro started learning a bit about biology techniques, he started to feel the exact same thing.For example when he played with E. Coli Whole Cell Model by Covert Lab, the main thing Ciro felt was: it is going to be hard to verify any of this data, because it is hard/impossible to know the concentration of each element in a cell as a function of time.More generally of course, this is exactly why making any biology discovery is so hard: we can't easily see what's going on inside the cell, and have to resort to indirect ways of doing so..This exact idea was highlighted by I should have loved biology by James Somers:
For a computer scientist, a biologist's methods can seem insane; the trouble comes from the fact that cells are too small, too numerous, too complex to analyze the way a programmer would, say in a step-by-step debugger.
And then just like in software, some of the methods biologists use to overcome the lack of visibility have direct software analogues:- add instrumentation to cells, e.g. GFP tagging comes to mind
- emulation, e.g. E. Coli Whole Cell Model by Covert Lab
- The boot process is another one. E.g. in x86 the way that you start in 16-bit mode, largely compatible into the 70's, then move to 32-bit and finally 64, does feel a lot the way a earlier stages of embryo development looks more and more like more ancient animals.
Ciro likes to think that maybe that is why a hardcore systems programmer like Bert Hubert got into molecular biology.
Some other people who mention similar things:
- I should have loved biology by James Somers highlights the computer abstraction layer analogy between the two:
Docker is good.
As a lightweight virtualization however, it does break more often than full proper virtualization like QEMU after some updates.
Pinned article: Introduction to the OurBigBook Project
Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
Intro to OurBigBook
. Source. We have two killer features:
- topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculusArticles of different users are sorted by upvote within each article page. This feature is a bit like:
- a Wikipedia where each user can have their own version of each article
- a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.Figure 1. Screenshot of the "Derivative" topic page. View it live at: ourbigbook.com/go/topic/derivativeVideo 2. OurBigBook Web topics demo. Source. - local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
- to OurBigBook.com to get awesome multi-user features like topics and likes
- as HTML files to a static website, which you can host yourself for free on many external providers like GitHub Pages, and remain in full control
Figure 2. You can publish local OurBigBook lightweight markup files to either OurBigBook.com or as a static website.Figure 3. Visual Studio Code extension installation.Figure 5. . You can also edit articles on the Web editor without installing anything locally. Video 3. Edit locally and publish demo. Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension. - Infinitely deep tables of contents:
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact