The Linux kernel makes extensive usage of the paging features of x86 to allow fast process switches with small data fragmentation.
There are also however some features that the Linux kernel might not use, either because they are only for backwards compatibility, or because the Linux devs didn't feel it was worth it yet.
Convert virtual addresses to physical from user space with
/proc/<pid>/pagemap
and from kernel space with virt_to_phys
:Dump all page tables from userspace with
/proc/<pid>/maps
and /proc/<pid>/pagemap
:Read and write physical addresses from userspace with
/dev/mem
:The Linux Kernel reserves two zones of virtual memory:
- one for kernel memory
- one for programs
The exact split is configured by
CONFIG_VMSPLIT_...
. By default:- on 32-bit:
- the bottom 3/4 is program space:
00000000
toBFFFFFFF
- the top 1/4 is kernel memory:
C0000000
toFFFFFFFF
, like this:------------------ FFFFFFFF Kernel ------------------ C0000000 ------------------ BFFFFFFF Process ------------------ 00000000
- the bottom 3/4 is program space:
- on 64-bit: currently only 48-bits are actually used, split into two equally sized disjoint spaces. The Linux kernel just assigns:
- the bottom part to processes
00000000 00000000
to008FFFFF FFFFFFFF
- the top part to the kernel:
FFFF8000 00000000
toFFFFFFFF FFFFFFFF
, like this:------------------ FFFFFFFF Kernel ------------------ C0000000 (not addressable) ------------------ BFFFFFFF Process ------------------ 00000000
- the bottom part to processes
Kernel memory is also paged.
In previous versions, the paging was continuous, but with HIGHMEM this changed.
There is no clear physical memory split: stackoverflow.com/questions/30471742/physical-memory-userspace-kernel-split-on-linux-x86-64
For each process, the virtual address space looks like this:
------------------ 2^32 - 1
Stack (grows down)
v v v v v v v v v
------------------
(unmapped)
------------------ Maximum stack size.
(unmapped)
-------------------
mmap
-------------------
(unmapped)
-------------------
^^^^^^^^^^^^^^^^^^^
brk (grows up)
-------------------
BSS
-------------------
Data
-------------------
Text
-------------------
------------------- 0
The kernel maintains a list of pages that belong to each process, and synchronizes that with the paging.
If the program accesses memory that does not belong to it, the kernel handles a page-fault, and decides what to do:
- if it is above the maximum stack size, allocate those pages to the process
- otherwise, send a SIGSEGV to the process, which usually kills it
When an ELF file is loaded by the kernel to start a program with the
exec
system call, the kernel automatically registers text, data, BSS and stack for the program.The
brk
and mmap
areas can be modified by request of the program through the brk
and mmap
system calls. But the kernel can also deny the program those areas if there is not enough memory.brk
and mmap
can be used to implement malloc
, or the so called "heap".mmap
is also used to load dynamically loaded libraries into the program's memory so that it can access and run it.Stack allocation: stackoverflow.com/questions/17671423/stack-allocation-for-process
Calculating exact addresses Things are complicated by:
- Address Space Layout Randomization.
- the fact that environment variables, CLI arguments, and some ELF header data take up initial stack space: unix.stackexchange.com/questions/145557/how-does-stack-allocation-work-in-linux/239323#239323
Why the text does not start at 0: stackoverflow.com/questions/14795164/why-do-linux-program-text-sections-start-at-0x0804800-and-stack-tops-start-at-0
Besides a missing page, a very common source of page faults is copy-on-write (COW).
Page tables have extra flags that allow the OS to mark a page a read-only.
Those page faults only happen when a process tries to write to the page, and not read from it.
When Linux forks a process:
- instead of copying all the pages, which is unnecessarily costly, it makes the page tables of the two process point to the same physical address.
- it marks those linear addresses as read-only
- whenever one of the processes tries to write to a page, the makes a copy of the physical memory, and updates the pages of the two process to point to the two different physical addresses
In
v4.2
, look under arch/x86/
:include/asm/pgtable*
include/asm/page*
mm/pgtable*
mm/page*
There seems to be no structs defined to represent the pages, only macros:
include/asm/page_types.h
is specially interesting. Excerpt:#define _PAGE_BIT_PRESENT 0 /* is present */
#define _PAGE_BIT_RW 1 /* writeable */
#define _PAGE_BIT_USER 2 /* userspace addressable */
#define _PAGE_BIT_PWT 3 /* page write through */
arch/x86/include/uapi/asm/processor-flags.h
defines CR0
, and in particular the PG
bit position:#define X86_CR0_PG_BIT 31 /* Paging */
Articles by others on the same topic
There are currently no matching articles.