Source: /cirosantilli/amazon-ec2-gpu

= Amazon EC2 GPU

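As of December 2023, the cheapest instance with an <Nvidia GPU> is <g4dn.xlarge>, so let's try that out. In that instance, <lspci> contains:
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
As for the meaning of "dn": according to the AWS instance type naming conventions, "d" means that the instance has local NVMe instance store volumes, and "n" means network and EBS optimized, so it does not stand for <Nvidia> as one might guess.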

Be careful not to confuse it with <g4ad.xlarge>, which has an <AMD GPU> instead. There, "a" means AMD processors, which refers to its AMD EPYC CPU rather than to the GPU, and "d" again means local NVMe instance store volumes.
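To make the naming scheme concrete, here is a minimal bash sketch that splits an instance type name into family letters, generation number, optional attribute letters, and the size after the dot, following the AWS instance type naming conventions. Only a few attribute letters are decoded (a, d, n, plus the g and i processor letters), and decode_instance_type is a hypothetical helper name, not an AWS tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an EC2 instance type such as g4dn.xlarge into
# <family><generation><attributes>.<size> and explain some attribute letters.
decode_instance_type() {
  local name=$1
  local prefix=${name%%.*}  # part before the dot, e.g. g4dn
  local size=${name#*.}     # part after the dot, e.g. xlarge
  local family generation attrs i c
  if [[ $prefix =~ ^([a-z]+)([0-9]+)([a-z]*)$ ]]; then
    family=${BASH_REMATCH[1]}
    generation=${BASH_REMATCH[2]}
    attrs=${BASH_REMATCH[3]}
  else
    echo "unrecognized instance type: $name" >&2
    return 1
  fi
  echo "family: $family"
  echo "generation: $generation"
  # Explain each attribute letter; only a small subset is known to this sketch.
  for (( i = 0; i < ${#attrs}; i++ )); do
    c=${attrs:i:1}
    case $c in
      a) echo "attribute a: AMD processors" ;;
      g) echo "attribute g: AWS Graviton processors" ;;
      i) echo "attribute i: Intel processors" ;;
      d) echo "attribute d: local NVMe instance store volumes" ;;
      n) echo "attribute n: network and EBS optimized" ;;
      *) echo "attribute $c: not decoded by this sketch" ;;
    esac
  done
  echo "size: $size"
}
```

For example, decode_instance_type g4dn.xlarge reports the d and n attributes, while decode_instance_type g4ad.xlarge reports a and d.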

Some documentation on which GPU is in each instance can be seen at: ([archive]), with a list of the GPUs they had at that random point in time. Can the GPU ever change for a given instance name? Likely not. Also, as of December 2023 that list is already outdated, e.g. P5 is not shown, though it is mentioned at:

When selecting the instance to launch, the GPU apparently does not show anywhere on the instance information page. It is so bad!

Also note that this instance has 4 vCPUs, so on a new account you must first make a customer support request to Amazon to increase your vCPU limit from the default of 0 to 4, otherwise launching the instance fails with:
\Q[You have requested more vCPU capacity than your current vCPU limit of 0 allows for the instance bucket that the specified instance type belongs to. Please visit to request an adjustment to this limit.]
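If you prefer the command line to the web console, the limit increase can also be requested through the Service Quotas API of the AWS CLI. The sketch below assumes the AWS CLI is installed and configured for your region, and that g4dn instances count against the quota named "Running On-Demand G and VT instances", which is my understanding of the "instance bucket" the error message refers to; request_gpu_vcpu_quota is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: request a higher vCPU limit for On-Demand G instances
# (g4dn included) through the Service Quotas API of the AWS CLI.
request_gpu_vcpu_quota() {
  local desired=${1:-4}  # number of vCPUs wanted: 4 for one g4dn.xlarge
  local code
  # Look the quota code up by name instead of hard-coding it. The quota name
  # is an assumption: check it in the Service Quotas console for EC2.
  code=$(aws service-quotas list-service-quotas --service-code ec2 \
    --query "Quotas[?contains(QuotaName, 'On-Demand G and VT')].QuotaCode | [0]" \
    --output text) || return 1
  # With --output text, a missing match is printed as "None".
  if [ -z "$code" ] || [ "$code" = "None" ]; then
    echo "could not find the On-Demand G and VT quota" >&2
    return 1
  fi
  # Submit the increase request; AWS may still review it before granting it.
  aws service-quotas request-service-quota-increase \
    --service-code ec2 --quota-code "$code" --desired-value "$desired"
}
```

Usage: request_gpu_vcpu_quota 4. Looking the code up by name avoids copying an opaque quota code by hand.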

When starting up the instance, also select:
* image: <Ubuntu 22.04>
* storage size: 30 GB (maximum free tier allowance)
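The same launch can be sketched with the AWS CLI. This assumes the CLI is configured, and that you pass in the ID of an Ubuntu 22.04 AMI for your region (e.g. copied from the console), the name of an existing key pair, and a security group that allows inbound SSH. It also assumes that the root device of the AMI is /dev/sda1, which is the usual name on Canonical's Ubuntu AMIs, but check the AMI details. launch_gpu_instance is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: launch one g4dn.xlarge with a 30 GB root volume.
# Arguments (all supplied by you): Ubuntu 22.04 AMI ID for your region,
# name of an existing key pair, and a security group allowing inbound SSH.
launch_gpu_instance() {
  if [ $# -ne 3 ]; then
    echo "usage: launch_gpu_instance AMI_ID KEY_NAME SECURITY_GROUP_ID" >&2
    return 1
  fi
  local ami_id=$1 key_name=$2 sg_id=$3
  # /dev/sda1 is assumed to be the root device name of the Ubuntu AMI.
  aws ec2 run-instances \
    --instance-type g4dn.xlarge \
    --image-id "$ami_id" \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30}' \
    --count 1
}
```

The security group is an explicit argument because without it the VPC's default group is used, which typically only allows inbound traffic from itself, so SSH from your machine would likely be blocked.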
Once you have finally managed to <SSH> into the instance, we first have to install the drivers and reboot:
sudo apt update
sudo apt install nvidia-driver-510 nvidia-utils-510 nvidia-cuda-toolkit
sudo reboot
and now running:
nvidia-smi
shows something like:
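+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   25C    P8    12W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+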
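When scripting checks on a fresh instance, it can be handy to pull the versions out of that header line. A minimal sketch, assuming the header format shown above. Note that the "CUDA Version" field is the highest CUDA version the driver supports, not necessarily the version of the installed toolkit, which nvcc --version reports. For more robust scripting, nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv prints machine-readable fields instead. parse_smi_versions is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print the driver
# version and the maximum CUDA version supported by that driver.
parse_smi_versions() {
  # Only the header line contains both fields; every other line is dropped.
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}
```

Usage: nvidia-smi | parse_smi_versions, which for the output above gives driver=525.147.05 cuda=12.0.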

The driver installation for the raw <Ubuntu 22.04> image is also documented at:
* official docs

From there on, basically everything should just work as normal, e.g. we were able to run a <CUDA hello world> just fine by following:
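For reference, a CUDA hello world can be sketched as below. This is a minimal example of my own rather than the one referenced above, assuming nvcc from the nvidia-cuda-toolkit package installed earlier is on the PATH; -arch=sm_75 targets compute capability 7.5, which is that of the Tesla T4, and cuda_hello is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write, compile and run a minimal CUDA program.
# Assumes nvcc (e.g. from the nvidia-cuda-toolkit package) is on the PATH.
cuda_hello() {
  local dir
  dir=$(mktemp -d) || return 1
  cat > "$dir/hello.cu" <<'EOF'
#include <cstdio>

// Kernel: every GPU thread prints its own index within the block.
__global__ void hello() {
  printf("hello from GPU thread %u\n", threadIdx.x);
}

int main() {
  hello<<<1, 4>>>();        // launch 1 block of 4 threads
  cudaDeviceSynchronize();  // wait for the kernel so its output gets flushed
  return 0;
}
EOF
  # sm_75 is the compute capability of the Tesla T4.
  nvcc -arch=sm_75 -o "$dir/hello" "$dir/hello.cu" && "$dir/hello"
}
```

Running cuda_hello on the instance should print one line per GPU thread, 4 in total.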

One issue with this setup, besides the time it takes to set up, is that you might also have to pay some network charges, as it downloads a bunch of stuff into the instance. We should try out some of the pre-built images, but it is also good to know this pristine setup just in case.

Some stuff we then managed to run:
curl | sh
/bin/time ollama run llama2 'What is quantum field theory?'
which gave:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
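so way faster than on my local desktop <CPU>, hurray. Note that the user and system times and the 0%CPU only account for the ollama client process: the model itself runs in the separate ollama server, which the install script sets up as a background service, so only the elapsed time is meaningful here.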
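When comparing such runs across machines, the figure of interest is the elapsed wall-clock time, which GNU time prints in its default format as [hours:]minutes:seconds followed by the word "elapsed". A small helper that converts it to seconds can be sketched as follows, assuming that default output format; elapsed_seconds is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read GNU time's default output on stdin and print the
# elapsed wall-clock time in seconds. The field looks like "0:16.91elapsed",
# i.e. [hours:]minutes:seconds followed by the word "elapsed".
elapsed_seconds() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /elapsed$/) {
        t = $i
        sub(/elapsed$/, "", t)
        n = split(t, parts, ":")
        secs = 0
        # Accumulate as ((hours * 60) + minutes) * 60 + seconds
        for (j = 1; j <= n; j++) secs = secs * 60 + parts[j]
        printf "%.2f\n", secs
      }
    }
  }'
}
```

For the ollama run above this gives 16.91, so two runs can be compared by dividing their results.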

After setting up argos-translate as described at: we were able to run:
head -n1000 pap.txt | ARGOS_DEVICE_TYPE=cuda time argos-translate --from-lang en --to-lang fr > pap-fr.txt
which gave:
77.95user 2.87system 0:39.93elapsed 202%CPU (0avgtext+0avgdata 4345988maxresident)k
0inputs+88outputs (0major+910748minor)pagefaults 0swaps
so only marginally better than on <ciro santilli s hardware/P14s>. It would be fun to see how much faster we could make things on a more powerful GPU.
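The 202%CPU figure is (user + system) / elapsed, i.e. (77.95 + 2.87) / 39.93 ≈ 2.02, meaning that on average about two CPU cores were busy during the run. That alone does not tell how much of the translation actually ran on the GPU; watching nvidia-smi during the run would be a more direct check. The sketch below recomputes the percentage from a GNU time output line, truncating to an integer, which matches both outputs shown above; cpu_percent is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute GNU time's %CPU, i.e. (user + system) / elapsed,
# from its default output read on stdin, truncated to an integer percentage.
cpu_percent() {
  awk '{
    user = ""; sys = ""; el = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /user$/)    { user = $i; sub(/user$/, "", user) }
      if ($i ~ /system$/)  { sys = $i;  sub(/system$/, "", sys) }
      if ($i ~ /elapsed$/) { el = $i;   sub(/elapsed$/, "", el) }
    }
    if (el == "") next  # not the line with the timings
    n = split(el, parts, ":")
    secs = 0
    for (j = 1; j <= n; j++) secs = secs * 60 + parts[j]
    if (secs == 0) next  # avoid dividing by zero
    printf "%d%%\n", int((user + sys) * 100 / secs)
  }'
}
```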