The Translation Lookahead Buffer (TLB) is a cache for paging addresses.
Since it is a cache, it shares many of the design issues of the CPU cache, such as associativity level.
This section shall describe a simplified fully associative TLB with 4 single address entries. Note that like other caches, real TLBs are not usually fully associative.
After a translation between linear and physical address happens, it is stored on the TLB. For example, a 4 entry TLB starts in the following state:
valid linear physical
----- ------ --------
> 0 00000 00000
0 00000 00000
0 00000 00000
0 00000 00000
The
>
indicates the current entry to be replaced.And after a page linear address
and after a second translation of
00003
is translated to a physical address 00005
, the TLB becomes:
valid linear physical
----- ------ --------
1 00003 00005
> 0 00000 00000
0 00000 00000
0 00000 00000
00007
to 00009
it becomes:
valid linear physical
----- ------ --------
1 00003 00005
1 00007 00009
> 0 00000 00000
0 00000 00000
Now if
00003
needs to be translated again, hardware first looks up the TLB and finds out its address with a single RAM access 00003 --> 00005
.Of course,
00000
is not on the TLB since no valid entry contains 00000
as a key.When TLB is filled up, older addresses are overwritten. Just like CPU cache, the replacement policy is a potentially complex operation, but a simple and reasonable heuristic is to remove the least recently used entry (LRU).
With LRU, starting from state:
adding
valid linear physical
----- ------ --------
> 1 00003 00005
1 00007 00009
1 00009 00001
1 0000B 00003
0000D -> 0000A
would give:
valid linear physical
----- ------ --------
1 0000D 0000A
> 1 00007 00009
1 00009 00001
1 0000B 00003
Using the TLB makes translation faster, because the initial translation takes one access _per TLB level_, which means 2 on a simple 32 bit scheme, but 3 or 4 on 64 bit architectures.
The TLB is usually implemented as an expensive type of RAM called content-addressable memory (CAM). CAM implements an associative map on hardware, that is, a structure that given a key (linear address), retrieves a value.
Mappings could also be implemented on RAM addresses, but CAM mappings may required much less entries than a RAM mapping.
For example, a map in which:could be stored in a TLB with 4 entries:
- both keys and values have 20 bits (the case of a simple paging schemes)
- at most 4 values need to be stored at each time
linear physical
------ --------
00000 00001
00001 00010
00010 00011
FFFFF 00000
However, to implement this with RAM, _it would be necessary to have 2^20 addresses_:
which would be even more expensive than using a TLB.
linear physical
------ --------
00000 00001
00001 00010
00010 00011
... (from 00011 to FFFFE)
FFFFF 00000
When the process changes,
cr3
change to point to the page table of the new current process.This creates a problem: the TLB is now filled with a bunch of cached entries for the old process.
A simple and naive solution would be to completely invalidate the TLB whenever the
cr3
changes.However, this is would not be very efficient, because it often happens that we switch back to process 1 before process 2 has completely used up the entire TLB cache entries.
The solution for this is to use so called "Address Space Identifiers" (ASID) as mentioned in sources such as:
Basically, the OS assigns a different ASID for each process, and then TLB entries are automatically also tagged with that ASID. This way when the process makes an access, the TLB can determine if a hit is actually for the current process, or if it is an old address coincidence with another process.
The x86 also offers the
invlpg
instruction which explicitly invalidates a single TLB entry. Other architectures offer even more instructions to invalidated TLB entries, such as invalidating all entries on a given range.Articles by others on the same topic
There are currently no matching articles.