Spin like mad between:
The ELF standard specifies multiple file formats:
  • Object files (.o).
    Intermediate step to generating executables and other formats:
    Source code
    
        |
        | Compilation
        |
        v
    
    Object file
    
        |
        | Linking
        |
        v
    
    Executable
    Object files exist to make compilation faster: with make, we only have to recompile the modified source files based on timestamps.
    We have to do the linking step every time, but it is much less expensive.
  • Executable files (no standard Linux extension).
    This is what the Linux kernel can actually run.
  • Shared object files (.so).
    Libraries meant to be loaded when the executable starts running.
  • Core dumps.
    Such files may be generated by the Linux kernel when the program does naughty things, e.g. segfault.
    They exist to help debugging the program.
In this tutorial, we consider only object and executable files.
  • Operating systems read and run ELF files.
    Kernels cannot link to a library nor use the C stlib, so they are more likely to implement it themselves.
    This is the case of the Linux kernel 4.2 which implements it in th file fs/binfmt_elf.c.
It is non-trivial to determine what is the smallest legal ELF file, or the smaller one that will do something trivial in Linux.
In this example we will consider a saner hello world example that will better capture real life cases.
Let's break down a minimal runnable Linux x86-64 example:
hello_world.asm
section .data
    hello_world db "Hello world!", 10
    hello_world_len  equ $ - hello_world
section .text
    global _start
    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, hello_world_len
        syscall
        mov rax, 60
        mov rdi, 0
        syscall
Compiled with:
nasm -w+all -f elf64 -o 'hello_world.o' 'hello_world.asm'
ld -o 'hello_world.out' 'hello_world.o'
TODO: use a minimal linker script with -T to be more precise and minimal.
Versions:
We don't use a C program as that would complicate the analysis, that will be level 2 :-)
Running:
hd hello_world.o
gives:
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 07 00 03 00  |....@.....@.....|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  01 00 00 00 01 00 00 00  03 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 02 00 00 00 00 00 00  |................|
000000a0  0d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  04 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  07 00 00 00 01 00 00 00  06 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  10 02 00 00 00 00 00 00  |................|
000000e0  27 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |'...............|
000000f0  10 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  0d 00 00 00 03 00 00 00  00 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  40 02 00 00 00 00 00 00  |........@.......|
00000120  32 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |2...............|
00000130  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  17 00 00 00 02 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00  80 02 00 00 00 00 00 00  |................|
00000160  a8 00 00 00 00 00 00 00  05 00 00 00 06 00 00 00  |................|
00000170  04 00 00 00 00 00 00 00  18 00 00 00 00 00 00 00  |................|
00000180  1f 00 00 00 03 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  30 03 00 00 00 00 00 00  |........0.......|
000001a0  34 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |4...............|
000001b0  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  27 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00  |'...............|
000001d0  00 00 00 00 00 00 00 00  70 03 00 00 00 00 00 00  |........p.......|
000001e0  18 00 00 00 00 00 00 00  04 00 00 00 02 00 00 00  |................|
000001f0  04 00 00 00 00 00 00 00  18 00 00 00 00 00 00 00  |................|
00000200  48 65 6c 6c 6f 20 77 6f  72 6c 64 21 0a 00 00 00  |Hello world!....|
00000210  b8 01 00 00 00 bf 01 00  00 00 48 be 00 00 00 00  |..........H.....|
00000220  00 00 00 00 ba 0d 00 00  00 0f 05 b8 3c 00 00 00  |............<...|
00000230  bf 00 00 00 00 0f 05 00  00 00 00 00 00 00 00 00  |................|
00000240  00 2e 64 61 74 61 00 2e  74 65 78 74 00 2e 73 68  |..data..text..sh|
00000250  73 74 72 74 61 62 00 2e  73 79 6d 74 61 62 00 2e  |strtab..symtab..|
00000260  73 74 72 74 61 62 00 2e  72 65 6c 61 2e 74 65 78  |strtab..rela.tex|
00000270  74 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |t...............|
00000280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000290  00 00 00 00 00 00 00 00  01 00 00 00 04 00 f1 ff  |................|
000002a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000002b0  00 00 00 00 03 00 01 00  00 00 00 00 00 00 00 00  |................|
000002c0  00 00 00 00 00 00 00 00  00 00 00 00 03 00 02 00  |................|
000002d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000002e0  11 00 00 00 00 00 01 00  00 00 00 00 00 00 00 00  |................|
000002f0  00 00 00 00 00 00 00 00  1d 00 00 00 00 00 f1 ff  |................|
00000300  0d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000310  2d 00 00 00 10 00 02 00  00 00 00 00 00 00 00 00  |-...............|
00000320  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000330  00 68 65 6c 6c 6f 5f 77  6f 72 6c 64 2e 61 73 6d  |.hello_world.asm|
00000340  00 68 65 6c 6c 6f 5f 77  6f 72 6c 64 00 68 65 6c  |.hello_world.hel|
00000350  6c 6f 5f 77 6f 72 6c 64  5f 6c 65 6e 00 5f 73 74  |lo_world_len._st|
00000360  61 72 74 00 00 00 00 00  00 00 00 00 00 00 00 00  |art.............|
00000370  0c 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00000380  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000390
We will get into more detail later, but it is good to have it in mind now:
  • section: exists before linking, in object files.
    One ore more sections will be put inside a single segment by the linker.
    Major information sections contain for the linker: is this section:
    • raw data to be loaded into memory, e.g. .data, .text, etc.
    • or metadata about other sections, that will be used by the linker, but disappear at runtime e.g. .symtab, .srttab, .rela.text
  • segment: exists after linking, in the executable file.
    Contains information about how each segment should be loaded into memory by the OS, notably location and permissions.
Array of Elf64_Shdr structs.
Each entry contains metadata about a given section.
e_shoff of the ELF header gives the starting position, 0x40 here.
e_shentsize and e_shnum from the ELF header say that we have 7 entries, each 0x40 bytes long.
So the table takes bytes from 0x40 to 0x40 + 7 + 0x40 - 1 = 0x1FF.
Some section names are reserved for certain section types: www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. .text requires a SHT_PROGBITS type and SHF_ALLOC + SHF_EXECINSTR
Running:
readelf -S hello_world.o
outputs:
There are 7 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .data             PROGBITS         0000000000000000  00000200
       000000000000000d  0000000000000000  WA       0     0     4
  [ 2] .text             PROGBITS         0000000000000000  00000210
       0000000000000027  0000000000000000  AX       0     0     16
  [ 3] .shstrtab         STRTAB           0000000000000000  00000240
       0000000000000032  0000000000000000           0     0     1
  [ 4] .symtab           SYMTAB           0000000000000000  00000280
       00000000000000a8  0000000000000018           5     6     4
  [ 5] .strtab           STRTAB           0000000000000000  00000330
       0000000000000034  0000000000000000           0     0     1
  [ 6] .rela.text        RELA             0000000000000000  00000370
       0000000000000018  0000000000000018           4     2     4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
The struct represented by each entry is:
typedef struct {
    Elf64_Word  sh_name;
    Elf64_Word  sh_type;
    Elf64_Xword sh_flags;
    Elf64_Addr  sh_addr;
    Elf64_Off   sh_offset;
    Elf64_Xword sh_size;
    Elf64_Word  sh_link;
    Elf64_Word  sh_info;
    Elf64_Xword sh_addralign;
    Elf64_Xword sh_entsize;
} Elf64_Shdr;
Contained in bytes 0x40 to 0x7F.
If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).
There are also other magic sections detailed in Figure 4-7: Special Section Indexes.
Sections with sh_type == SHT_STRTAB are called string tables.
They hold a null separated array of strings.
Such sections are used by other sections when string names are to be used. The using section says:
  • which string table they are using
  • what is the index on the target string table where the string starts
So for example, we could have a string table containing:
Data: \0 a b c \0 d e f \0
Index: 0 1 2 3  4 5 6 7  8
The first byte must be a 0. TODO rationale?
And if another section wants to use the string d e f, they have to point to index 5 of this section (letter d).
Notable string table sections:
  • .shstrtab
  • .strtab
Section type: sh_type == SHT_STRTAB.
Common name: "section header string table".
The section name .shstrtab is reserved. The standard says:
This section holds section names.
This section gets pointed to by the e_shstrnd field of the ELF header itself.
String indexes of this section are are pointed to by the sh_name field of section headers, which denote strings.
This section does not have SHF_ALLOC marked, so it will not appear on the executing program.
readelf -x .shstrtab hello_world.o
outputs:
Hex dump of section '.shstrtab':
  0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
  0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
  0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
  0x00000030 7400                                t.
If we look at the names of other sections, we see that they all contain numbers, e.g. the .text section is number 7.
Then each string ends when the first NUL character is found, e.g. character 12 is \0 just after .text\0.
Entry 1 has ELF64_R_TYPE == STT_FILE. ELF64_R_TYPE is continued inside of st_info.
  • 10 8: st_name = 01000000 = character 1 in the .strtab, which until the following \0 makes hello_world.asm
    This piece of information file may be used by the linker to decide on which segment sections go: e.g. in ld linker script we write:
    segment_name :
    {
        file(section)
    }
    to pick a section from a given file.
    Most of the time however, we will just dump all sections with a given name together with:
    segment_name :
    {
        *(section)
    }
  • 10 12: st_info = 04
    Bits 0-3 = ELF64_R_TYPE = Type = 4 = STT_FILE: the main purpose of this entry is to use st_name to indicate the name of the file which generated this object file.
    Bits 4-7 = ELF64_ST_BIND = Binding = 0 = STB_LOCAL. Required value for STT_FILE.
  • 10 13: st_shndx = Symbol Table Section header Index = f1ff = SHN_ABS. Required for STT_FILE.
  • 20 0: st_value = 8x 00: required for value for STT_FILE
  • 20 8: st_size = 8x 00: no allocated size
Now from the readelf, we interpret the others quickly.
There are two such entries, one pointing to .data and the other to .text (section indexes 1 and 2).
Num:    Value          Size Type    Bind   Vis      Ndx Name
  2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
  3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
TODO what is their purpose?
Then come the most important symbols:
Num:    Value          Size Type    Bind   Vis      Ndx Name
  4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
  5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
  6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start
hello_world string is in the .data section (index 1). It's value is 0: it points to the first byte of that section.
_start is marked with GLOBAL visibility since we wrote:
global _start
in NASM. This is necessary since it must be seen as the entry point. Unlike in C, by default NASM labels are local.
hello_world_len points to the special st_shndx == SHN_ABS == 0xF1FF.
0xF1FF is chosen so as to not conflict with other sections.
st_value == 0xD == 13 which is the value we have stored there on the assembly: the length of the string Hello World!.
This means that relocation will not affect this value: it is a constant.
This is small optimization that our assembler does for us and which has ELF support.
If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have extra relocation work on it later.
By default, NASM places a .symtab on the executable as well.
This is only used for debugging. Without the symbols, we are completely blind, and must reverse engineer everything.
You can strip it with objcopy, and the executable will still run. Such executables are called "stripped executables".
Holds strings for the symbol table.
This section has sh_type == SHT_STRTAB.
It is pointed to by sh_link == 5 of the .symtab section.
readelf -x .strtab hello_world.o
outputs:
Hex dump of section '.strtab':
  0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
  0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
  0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
  0x00000030 61727400                            art.
This implies that it is an ELF level limitation that global variables cannot contain NUL characters.
Besides sh_type == SHT_RELA, there also exists SHT_REL, which would have section name .text.rel (not present in this object file).
Those represent the same struct, but without the addend, e.g.:
typedef struct {
    Elf64_Addr  r_offset;
    Elf64_Xword r_info;
} Elf64_Rela;
The ELF standard says that in many cases the both can be used, and it is just a matter of convenience.

Pinned article: Introduction to the OurBigBook Project

Welcome to the OurBigBook Project! Our goal is to create the perfect publishing platform for STEM subjects, and get university-level students to write the best free STEM tutorials ever.
Everyone is welcome to create an account and play with the site: ourbigbook.com/go/register. We belive that students themselves can write amazing tutorials, but teachers are welcome too. You can write about anything you want, it doesn't have to be STEM or even educational. Silly test content is very welcome and you won't be penalized in any way. Just keep it legal!
We have two killer features:
  1. topics: topics group articles by different users with the same title, e.g. here is the topic for the "Fundamental Theorem of Calculus" ourbigbook.com/go/topic/fundamental-theorem-of-calculus
    Articles of different users are sorted by upvote within each article page. This feature is a bit like:
    • a Wikipedia where each user can have their own version of each article
    • a Q&A website like Stack Overflow, where multiple people can give their views on a given topic, and the best ones are sorted by upvote. Except you don't need to wait for someone to ask first, and any topic goes, no matter how narrow or broad
    This feature makes it possible for readers to find better explanations of any topic created by other writers. And it allows writers to create an explanation in a place that readers might actually find it.
    Figure 1.
    Screenshot of the "Derivative" topic page
    . View it live at: ourbigbook.com/go/topic/derivative
  2. local editing: you can store all your personal knowledge base content locally in a plaintext markup format that can be edited locally and published either:
    This way you can be sure that even if OurBigBook.com were to go down one day (which we have no plans to do as it is quite cheap to host!), your content will still be perfectly readable as a static site.
    Figure 2.
    You can publish local OurBigBook lightweight markup files to either https://OurBigBook.com or as a static website
    .
    Figure 3.
    Visual Studio Code extension installation
    .
    Figure 4.
    Visual Studio Code extension tree navigation
    .
    Figure 5.
    Web editor
    . You can also edit articles on the Web editor without installing anything locally.
    Video 3.
    Edit locally and publish demo
    . Source. This shows editing OurBigBook Markup and publishing it using the Visual Studio Code extension.
    Video 4.
    OurBigBook Visual Studio Code extension editing and navigation demo
    . Source.
  3. https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png
  4. Infinitely deep tables of contents:
    Figure 6.
    Dynamic article tree with infinitely deep table of contents
    .
    Descendant pages can also show up as toplevel e.g.: ourbigbook.com/cirosantilli/chordate-subclade
All our software is open source and hosted at: github.com/ourbigbook/ourbigbook
Further documentation can be found at: docs.ourbigbook.com
Feel free to reach our to us for any help or suggestions: docs.ourbigbook.com/#contact