Using the TLB makes translation faster, because the initial translation takes one access _per TLB level_, which means 2 on a simple 32 bit scheme, but 3 or 4 on 64 bit architectures.
The TLB is usually implemented as an expensive type of RAM called content-addressable memory (CAM). CAM implements an associative map on hardware, that is, a structure that given a key (linear address), retrieves a value.
Mappings could also be implemented on RAM addresses, but CAM mappings may required much less entries than a RAM mapping.
For example, a map in which:could be stored in a TLB with 4 entries:
- both keys and values have 20 bits (the case of a simple paging schemes)
- at most 4 values need to be stored at each time
linear physical
------ --------
00000 00001
00001 00010
00010 00011
FFFFF 00000
However, to implement this with RAM, _it would be necessary to have 2^20 addresses_:
which would be even more expensive than using a TLB.
linear physical
------ --------
00000 00001
00001 00010
00010 00011
... (from 00011 to FFFFE)
FFFFF 00000