Had this happen on the P14s on Ubuntu 23.10 while casually using Chromium. The screen went blank for a few seconds, but the GPU apparently managed to reset itself, and things started working again, except that most windows were killed:
[drm:gfx_v11_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5774109, emitted seq=5774111
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 14023 thread chrome:cs0 pid 14087
amdgpu 0000:64:00.0: amdgpu: GPU reset begin!
[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
[previous two lines repeat 8 more times]
[drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
Dec 27 15:03:38 ciro-p14s kernel: amdgpu 0000:64:00.0: amdgpu: MODE2 reset
Dec 27 15:03:38 ciro-p14s kernel: amdgpu 0000:64:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 27 15:03:38 ciro-p14s kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000900
It appears to be a bug in the AMDGPU open source driver.
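To check how often this happens and which process the driver blames, the kernel log can be filtered for the `amdgpu_job_timedout` "Process information" lines shown above. The following is only a minimal sketch: the function names `amdgpu_timeout_procs` and `amdgpu_reset_count` are made up for illustration, and the patterns assume the exact message format seen in this log, which may differ across kernel versions.

```shell
# Sketch: summarize amdgpu ring timeouts from kernel log text on stdin.
# Function names are hypothetical; patterns assume the log format shown above.

# Print "<process> <pid>" for every process amdgpu blames for a ring timeout.
amdgpu_timeout_procs() {
    sed -n 's/.*amdgpu_job_timedout.*Process information: process \([^ ]*\) pid \([0-9]*\) thread.*/\1 \2/p'
}

# Print how many GPU resets were started.
amdgpu_reset_count() {
    grep -c 'amdgpu: GPU reset begin!'
}
```

Possible usage for the current boot: `journalctl -k -b | amdgpu_timeout_procs`.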
I think this was on Wayland. Possibly related, but on the X Window System: the UI crashed and showed the message "Oh no! Something has gone wrong."
2024-01-13_21-55-07@ciro@ciro-p14s$ cat /var/log/apport.log
ERROR: apport (pid 975172) 2024-01-13 21:41:02,087: host pid 3528 crashed in a separate mount namespace, ignoring
INFO: apport (pid 975227) 2024-01-13 21:41:02,398: called for pid 2728, signal 5, core limit 0, dump mode 1
INFO: apport (pid 975227) 2024-01-13 21:41:02,401: executable: /usr/bin/gnome-shell (command line "/usr/bin/gnome-shell")
INFO: apport (pid 975227) 2024-01-13 21:41:12,667: wrote report /var/crash/_usr_bin_gnome-shell.1000.crash
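To see at a glance which programs apport has recorded as crashing, the `executable:` lines can be pulled out of the log. This is a rough sketch that assumes the line format shown above; the function name `apport_crashed_executables` is hypothetical, and paths containing spaces are not handled.

```shell
# Sketch: list the unique executables that apport recorded as crashed.
# Reads apport.log text on stdin. The function name is hypothetical.
apport_crashed_executables() {
    sed -n 's/^.*: executable: \([^ ]*\) (command line .*$/\1/p' | sort -u
}
```

Possible usage: `apport_crashed_executables < /var/log/apport.log`.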
Happened on the P14s on Ubuntu 23.10, which started from a fresh Ubuntu 23.10 install.
However, it did not happen on the Lenovo ThinkPad P51 (2017), also on Ubuntu 23.10, which had been upgraded several times from God knows what starting point... At first one had X11 (forced by the Nvidia drivers) and the other Wayland, but moving the P14s to X11 changed nothing.
Both were running GNOME Display Manager.
The same happens with Super + L, and also with the CLI commands from: askubuntu.com/questions/7776/how-do-i-lock-the-desktop-screen-via-command-line
Switching to the other installed kernel, 5.9, made boot work.
The solution on kernel 6.2 was:
sudo apt install nvidia-driver-515
as per comments under: bugs.launchpad.net/ubuntu/+source/linux/+bug/2012559. This also made the Nvidia driver work: Find GPU information in Ubuntu.
Previously I had:
nvidia-driver-510
and it blew up before reaching disk decryption.
I also tried:
nvidia-driver-525
but that broke in a different way:
Finished apport-autoreport.service - Process error reports when automatic reporting is enabled.
nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000947d:0:0:407
(1 of 2) Job systemd-backlight@backlight: nvidia_e.service/start running (32s no limit)
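When switching between driver series like this, it helps to confirm which `nvidia-driver-*` package is actually installed rather than merely left behind as config files. Below is a small sketch under some assumptions: the function name `installed_nvidia_series` is invented here, and it expects input in the form printed by `dpkg-query -W -f='${Package} ${Status}\n'`, where the fourth field is `installed` for installed packages.

```shell
# Sketch: print the series number of each installed nvidia-driver-NNN package.
# Reads "<package> <want> <error> <status>" lines on stdin.
# The function name is hypothetical.
installed_nvidia_series() {
    awk '$1 ~ /^nvidia-driver-[0-9]+$/ && $4 == "installed" {
        sub(/^nvidia-driver-/, "", $1)
        print $1
    }'
}
```

Possible usage: `dpkg-query -W -f='${Package} ${Status}\n' 'nvidia-driver-*' | installed_nvidia_series`.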
GDM crashes sometimes when switching windows right after opening a new window: bugs.launchpad.net/ubuntu/+source/gdm/+bug/1956299
Does not happen every time, only sometimes. Can't figure out why. Usually happens when the machine has been suspended for a longer time.
bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-470/+bug/1946303 sounds like a likely match (Nvidia driver version 470), but I can't find those error messages anywhere. The last line of:
journalctl -o short-precise -k -b -1
once was:
PM: suspend entry (deep)
which is when sleep starts.
This suggests that it is not a video bug, and that the machine is not waking up at all? Gotta try to SSH into it. OK, I did SSH into it, and that was fine, so it is just the video that won't start.
PM: suspend exit
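The "suspend entry" versus "suspend exit" check can be automated when looking through several previous boots. This is a rough sketch, assuming the kernel messages contain the literal strings `PM: suspend entry` and `PM: suspend exit` as seen above; the function name `suspend_state` and its three output words are invented for illustration.

```shell
# Sketch: report whether the last suspend recorded in a kernel log was
# followed by a resume. Reads kernel log text on stdin.
# Prints one of: resumed, no-resume, no-suspend (hypothetical labels).
suspend_state() {
    awk '
        /PM: suspend entry/ { state = "no-resume" }
        /PM: suspend exit/  { if (state == "no-resume") state = "resumed" }
        END { print (state == "" ? "no-suspend" : state) }
    '
}
```

Possible usage for the previous boot: `journalctl -o short-precise -k -b -1 | suspend_state`. A `no-resume` result only means the kernel logged no exit message after the last suspend entry; it does not by itself say whether the video or the whole machine failed to wake up.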
bugs.launchpad.net/ubuntu/+source/linux/+bug/1949977 is another possible bug, based on kernel version. I'm running 5.13, which is one of the failing versions on the report. Can't find any interesting dmesg though.
In another crash:
journalctl -o short-precise -k -b -1
had the following interesting lines:
nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[24307.640014] NVRM: GPU at PCI:0000:01:00: GPU-18af74bb-7c72-ff70-e447-87d48378ea20
[24307.640018] NVRM: Xid (PCI:0000:01:00): 79, pid=8828, GPU has fallen off the bus.
[24307.640021] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[24328.054022] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[repeats several more times]
[24328.056767] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[24328.056951] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[24328.056955] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
[24328.056959] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
[24328.056962] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:3:0:0x0000000f
[24328.056983] nvidia-modeset: ERROR: GPU:0: DP-4: Failed to disable DisplayPort audio stream-0
[24328.056992] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000947d:0:0:0x0000000f
and there was a corresponding /var/crash/_usr_sbin_gdm3.0.crash.
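To find such events quickly in later crashes, the `NVRM: Xid` lines can be extracted from the kernel log. The sketch below assumes the message layout shown above (`Xid (PCI:...): <code>, pid=<pid>, <text>`), which may vary between driver versions; the function name `nvidia_xid_events` is hypothetical.

```shell
# Sketch: extract NVIDIA Xid events from kernel log text on stdin.
# Prints "<PCI address> xid=<code> pid=<pid> <message>" per event.
# The function name is hypothetical.
nvidia_xid_events() {
    sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9]*\), pid=\([^,]*\), \(.*\)$/\1 xid=\2 pid=\3 \4/p'
}
```

Possible usage for the previous boot: `journalctl -o short-precise -k -b -1 | nvidia_xid_events`.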