As of December 2023, the cheapest instance with an Nvidia GPU is g4nd.xlarge, so let's try that out. In that instance, lspci contains:TODO meaning of "nd"? "n" presumably means Nvidia, but what is the "d"?
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Be careful not to confuse it with g4ad.xlarge, which has an AMD GPU instead. TODO meaning of "ad"? "a" presumably means AMD, but what is the "d"?
Some documentation on which GPU is in each instance can seen at: docs.aws.amazon.com/dlami/latest/devguide/gpu.html (archive) with a list of which GPUs they have at that random point in time. Can the GPU ever change for a given instance name? Likely not. Also as of December 2023 the list is already outdated, e.g. P5 is now shown, though it is mentioned at: aws.amazon.com/ec2/instance-types/p5/
When selecting the instance to launch, the GPU does not show anywhere apparently on the instance information page, it is so bad!
Also note that this instance has 4 vCPUs, so on a new account you must first make a customer support request to Amazon to increase your limit from the default of 0 to 4, see also: stackoverflow.com/questions/68347900/you-have-requested-more-vcpu-capacity-than-your-current-vcpu-limit-of-0, otherwise instance launch will fail with:
You have requested more vCPU capacity than your current vCPU limit of 0 allows for the instance bucket that the specified instance type belongs to. Please visit aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.
When starting up the instance, also select:Once you finally managed to SSH into the instance, first we have to install drivers and reboot:and now running:shows something like:
- image: Ubuntu 22.04
- storage size: 30 GB (maximum free tier allowance)
sudo apt update
sudo apt install nvidia-driver-510 nvidia-utils-510 nvidia-cuda-toolkit
sudo reboot
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 25C P8 12W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If we start from the raw Ubuntu 22.04, first we have to install drivers:
- docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html official docs
- stackoverflow.com/questions/63689325/how-to-activate-the-use-of-a-gpu-on-aws-ec2-instance
- askubuntu.com/questions/1109662/how-do-i-install-cuda-on-an-ec2-ubuntu-18-04-instance
- askubuntu.com/questions/1397934/how-to-install-nvidia-cuda-driver-on-aws-ec2-instance
From basically everything should just work as normal. E.g. we were able to run a CUDA hello world just fine along:
nvcc inc.cu
./a.out
One issue with this setup, besides the time it takes to setup, is that you might also have to pay some network charges as it downloads a bunch of stuff into the instance. We should try out some of the pre-built images. But it is also good to know this pristine setup just in case.
Some stuff we then managed to run:which gave:so way faster than on my local desktop CPU, hurray.
curl https://ollama.ai/install.sh | sh
/bin/time ollama run llama2 'What is quantum field theory?'
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
After setup from: askubuntu.com/a/1309774/52975 we were able to run:which gave:so only marginally better than on P14s. It would be fun to see how much faster we could make things on a more powerful GPU.
head -n1000 pap.txt | ARGOS_DEVICE_TYPE=cuda time argos-translate --from-lang en --to-lang fr > pap-fr.txt
77.95user 2.87system 0:39.93elapsed 202%CPU (0avgtext+0avgdata 4345988maxresident)k
0inputs+88outputs (0major+910748minor)pagefaults 0swaps
- battery life:
- 2023-04: on-browser streaming + light browsing on Ubuntu 22.10: about 2h45. Too low! Gotta try buying a new battery.
- 2022-01-04 updated firmward after noticing that ubuntu 21.10 does not wake up from suspend seemed to happen every time when not connected to external power.
dmidecode
diff excerpt:used the "Ubuntu Software" GUI as mentioned at: support.lenovo.com/gb/en/solutions/ht510810-how-to-do-software-updates-linux. Kudos for making this accessible to newbs.BIOS Information Vendor: LENOVO - Version: N1UET40W (1.14 ) - Release Date: 09/28/2017 + Version: N1UET71W (1.45 ) + Release Date: 07/18/2018
After doing that, another update became available to: 0.1.56, clicked it and was much faster than the previous one, and didn't auto reboot. After manual reboot,dmidecode
diffed again:plus a bunch of other lines.BIOS Information Vendor: LENOVO - Version: N1UET71W (1.45 ) - Release Date: 07/18/2018 + Version: N1UET82W (1.56 ) + Release Date: 08/12/2021
- 2021-06-05 upgraded to Ubuntu 21.04 with a clean install from an ISO. SelectedAfter this, the GUI felt fast, who would have thought that erasing a bunch of stuff would make the system faster!
- "Minimal installation"
- "Erase disk and install Ubuntu". Notably, this erased the Microsoft Windows that came with the computer and was never used not even once
- "Erase disk ans use ZFS"
- Encrypt the new Ubuntu installation for security
lsblk
contains:andzd0 230:0 0 500M 0 disk └─keystore-rpool 253:0 0 484M 0 crypt /run/keystore/rpool nvme0n1 259:0 0 476.9G 0 disk ├─nvme0n1p1 259:1 0 512M 0 part /boot/efi ├─nvme0n1p2 259:2 0 2G 0 part │ └─cryptoswap 253:1 0 2G 0 crypt ├─nvme0n1p3 259:3 0 2G 0 part └─nvme0n1p4 259:4 0 472.4G 0 part
lsblk -f
:zd0 crypto_LUKS 2 └─keystore-rpool ext4 1.0 keystore-rpool nvme0n1 ├─nvme0n1p1 vfat FAT32 ├─nvme0n1p2 crypto_LUKS 2 │ └─cryptoswap ├─nvme0n1p3 zfs_member 5000 bpool └─nvme0n1p4 zfs_member 5000 rpoo
Then:contains:grep '[rb]pool' /proc/mounts
which gives an idea of how the above map to mountpoints.rpool/ROOT/ubuntu_uvs1fq / zfs rw,relatime,xattr,posixacl 0 0 rpool/USERDATA/ciro_czngbg /home/ciro zfs rw,relatime,xattr,posixacl 0 0 rpool/USERDATA/root_czngbg /root zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/srv /srv zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/usr/local /usr/local zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/games /var/games zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/log /var/log zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/lib /var/lib zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/mail /var/mail zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/snap /var/snap zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/www /var/www zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/spool /var/spool zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/lib/AccountsService /var/lib/AccountsService zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/lib/NetworkManager /var/lib/NetworkManager zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/lib/apt /var/lib/apt zfs rw,relatime,xattr,posixacl 0 0 rpool/ROOT/ubuntu_uvs1fq/var/lib/dpkg /var/lib/dpkg zfs rw,relatime,xattr,posixacl 0 0 bpool/BOOT/ubuntu_uvs1fq /boot zfs rw,nodev,relatime,xattr,posixacl 0 0
Had two GUI freezes since installation, a fixed images shows no matter what I do, possibly graphics only, but impossible to tell (next time I'll try SSH access). No Nvidia drivers installed yet.
2020-06-06: dropped some lemon juice on the bottom left of touchpad. Bottom left button not working anymore... I'm an idiot. There are many other alternatives, but very aggravating, I'll replace it for sure. Can't find the exact replacement part or any videos showing its replacement online easliy, dang. For the T430: www.youtube.com/watch?v=F3lzV9uXRjU Asked at: forums.lenovo.com/t5/ThinkPad-P-and-W-Series-Mobile-Workstations/P51-left-bottom-button-below-trackpad-mouse-left-click-stopped-working-possible-to-replace/m-p/5019903 Also I could not access it because you need to remove the HDD first: www.youtube.com/watch?v=5Klawxc7T_Y and I can't pull it out even with considerable force, unlike in the video... And OMG, those button caps are impossible to re-install once removed!!! Then when I put the whole thing back together, the upper buttons were not working anymore. FUUUUUUUUCK. When first opening I pulled on it without properly removing the cap and it came off, but it didn't look broken in any way and I put it back in. Keyboard works thank God, so right black connector is fine, left white one oppears to be the one for upper keys and trackpoint, both of which stopped working. The hardware manual confirms that they are both part of the same device, so basically a mouse :-) TODO can it be bought separately from te keyboard? Doesn't look like it, photo of keyboard part includes those buttons. The manual also confirms that the bottom buttons are one device with the trackpad "trackpad with buttons", thus forming the second entire mouse.
2019-04-17: popup asking about "ThinkPad P51 Management Engine Update" from from 182.29.3287 to 184.60.3561, said yes.
Ubuntu 17.10 setup after buying it:
- partition setup: askubuntu.com/questions/343268/how-to-use-manual-partitioning-during-installation/976430#976430
- BIOS:
- for NVIDIA driver:
- for KVM, required by Android Emulator: enable virtualization extensions
- TODO fix the brightness keys:
Battery life shown by Ubuntu battery app after installation:
- before NVIDIA driver setup: 8h
- after: 6.5h
GPU accelerated, simulates the Craig's minimized M. genitalium, JCVI-syn3A at a particle basis of some kind.
Lab head is the cutest-looking lady ever: chemistry.illinois.edu/zan, Zaida (Zan) Luthey-Schulten.
- 2022 paper: www.cell.com/cell/fulltext/S0092-8674(21)01488-4 Fundamental behaviors emerge from simulations of a living minimal cell by Thornburg et al. (2022) published on Cell
- faculty.scs.illinois.edu/schulten/lm/ actual source code. No Version control and non-code drop release, openess and best practices haven't reached such far obscure reaches of academia yet. One day.
- blogs.nvidia.com/blog/2022/01/20/living-cell-simulation/ Nvidia announcement. That's how they do business, it is quite interesting how they highlight this kind of research.
- catalog.ngc.nvidia.com/orgs/hpc/containers/lattice-microbes has a container
This company is a bit like Sun Microsystems, you can hear a note of awe in the voice of those who knew it at its peak. This was a bit before Ciro Santilli's awakening.
Both of them and Sun kind of died in the same way, unable to move from the workstation to the personal computer fast enough, and just got killed by the scale of competitors who did, notably Nvidia for graphics cards.
Some/all Nintendo 64 games were developed on it, e.g. it is well known that this was the case for Super Mario 64.
Also they were a big UNIX vendor, which is another kudos to the company.
A network interface controller that does more than just the base OSI model protocols, notably in a programmable way.
- www.nextplatform.com/2022/05/11/intel-unrolls-dpu-roadmap-with-a-two-year-cadence/
- www.trentonsystems.com/blog/what-is-a-smartnic
- blogs.nvidia.com/blog/2021/10/29/what-is-a-smartnic/ "Some are using FPGAs which promise flexibility"
- www.servethehome.com/intel-ipu-exotic-answer-to-industry-dpu/ "Intel IPU is an Exotic Answer to the Industry DPU"
- 2022 www.datacenterdynamics.com/en/news/amd-to-buy-smartnic-firm-pensando-for-19-billion/ "AMD to buy SmartNIC firm Pensando for $1.9 billion"
- www.theregister.com/2022/06/14/alibaba_dpu_cloud/ mentions that Alibaba Cloud created their own.
I've finally had enough of Nvidia breaking my Ubuntu 21.10 suspend, so I investigated some more and found a workaround on the NVIDIA forums: stackoverflow.com/questions/58233482/next-js-setting-up-eslint-for-nextjs/70519682#70519682.
Thanks enormously to heroic user humblebee, and once again, Nvidia, fuck you.
Please refer to Video "Linus Torvalds saying "Nvidia Fuck You" (2012)".
Does not happen every time, only some times. Can't figure out why. Usually happens when has suspended for a longer time.
bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-470/+bug/1946303 sounds like a likely report, Nvidia driver version 470, but can't find those error messages anywhere. The last line of:once was:which is when sleep starts.
journalctl -o short-precise -k -b -1
PM: suspend entry (deep)
This suggests that it is not a video bug then, seems that it is not waking up at all? Gotta try to SSH into it. OK. I did SSH into it, and that was fine, so it is just the video that won't start.
PM: suspend exit
bugs.launchpad.net/ubuntu/+source/linux/+bug/1949977 is another possible bug, based on kernel version. I'm running 5.13, which is one of the failing versions on the report. Can't find any interesting dmesg though.
In another crash:had the following interesting lines:and there was a corresponding
journalctl -o short-precise -k -b -1
nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[24307.640014] NVRM: GPU at PCI:0000:01:00: GPU-18af74bb-7c72-ff70-e447-87d48378ea20
[24307.640018] NVRM: Xid (PCI:0000:01:00): 79, pid=8828, GPU has fallen off the bus.
[24307.640021] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[24328.054022] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[repeats several more times]
[24328.056767] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[24328.056951] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[24328.056955] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
[24328.056959] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
[24328.056962] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:3:0:0x0000000f
[24328.056983] nvidia-modeset: ERROR: GPU:0: DP-4: Failed to disable DisplayPort audio stream-0
[24328.056992] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000947d:0:0:0x0000000f
/var/crash/_usr_sbin_gdm3.0.crash
.Related "GPU has fallen off the bus": askubuntu.com/questions/868321/gpu-has-fallen-off-the-bus-nvidia
Happened on P14s on Ubuntu 23.10, which started with fresh Ubuntu 23.10 install.
However it did not happen on Lenovo ThinkPad P51 (2017) also on Ubuntu 23.10 which had been upgraded several times from God knows what starting point... At first one had X11 (forced by Nvidia drivers) and the other Wayland, but moving to p14s X11 changed nothing.
Both were running GNOME Display Manager.
Same happens with Super + L, but also CLI commands: askubuntu.com/questions/7776/how-do-i-lock-the-desktop-screen-via-command-line
Bibliography:
- askubuntu.com/questions/1242110/after-upgrading-to-ubuntu-20-04-lockscreen-not-working canon
- askubuntu.com/questions/1246622/ubuntu-20-04-unable-to-lock-screen
- askubuntu.com/questions/1245071/cant-lock-screen-with-shortcut-on-ubuntu-20-04-gnome
- askubuntu.com/questions/1248756/super-l-not-working-on-ubuntu-20-04