The problem with a single-level paging scheme is that it would take up too much RAM: 4G / 4K = 1M entries _per_ process.
If each entry is 4 bytes long, that would make 4M _per process_, which is too much even for a desktop computer: ps -A | wc -l says that I am running 244 processes right now, so that would take around 1GB of my RAM!
For this reason, x86 developers decided to use a multi-level scheme that reduces RAM usage.
The downside of this system is that is has a slightly higher access time, as we need to access RAM more times for each translation.