Using the TLB makes translation faster, because the initial translation takes one access _per TLB level_, which means 2 on a simple 32 bit scheme, but 3 or 4 on 64 bit architectures.
The TLB is usually implemented as an expensive type of RAM called content-addressable memory (CAM). CAM implements an associative map on hardware, that is, a structure that given a key (linear address), retrieves a value.
Mappings could also be implemented on RAM addresses, but CAM mappings may required much less entries than a RAM mapping.
For example, a map in which:
  • both keys and values have 20 bits (the case of a simple paging schemes)
  • at most 4 values need to be stored at each time
could be stored in a TLB with 4 entries:
linear  physical
------  --------
00000   00001
00001   00010
00010   00011
FFFFF   00000
However, to implement this with RAM, _it would be necessary to have 2^20 addresses_:
linear  physical
------  --------
00000   00001
00001   00010
00010   00011
... (from 00011 to FFFFE)
FFFFF   00000
which would be even more expensive than using a TLB.