Hi folks,
I’m experiencing inconsistent suspend behavior on my Lenovo Legion 5 running Manjaro (kernel 6.12.34-1). The laptop has an AMD Ryzen 7 5800H CPU and an Nvidia RTX 3070 GPU (hybrid graphics disabled, so only the Nvidia GPU is active).
Most of the time, suspending from GNOME works perfectly. About 10% of the time, however, the screen goes black (no video output) while the backlight stays on. The system does not suspend and becomes unresponsive, so I have to force a restart.
I looked at the journalctl entries around the time of the issue and found some errors:
Jul 03 21:12:16 joao-82ju systemd-coredump[52430]: [🡕] Process 2574 (gnome-shell) of user 1000 dumped core.
Stack trace of thread 2574:
#0 0x00007efd5834633e st_theme_node_lookup_shadow (libst-16.so + 0x4e33e)
#1 0x00007efd5834683b st_theme_node_get_box_shadow (libst-16.so + 0x4e83b)
#2 0x00007efd5834a7c6 st_theme_node_get_paint_box (libst-16.so + 0x527c6)
...
Jul 03 21:12:38 joao-82ju kernel: Freezing user space processes
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (pKernelBus->pReadToFlush != NULL || pKernelBus->virtualBar2[GPU_GFID_PF].pCpuMapping != NULL) @ kern_>
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:881
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: progress == indexHi_tmp - indexLo_tmp + 1 @ mmu_walk.c:1092
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ mmu_walk.c:488
Jul 03 21:12:38 joao-82ju kernel: NVRM: mmuWalkSparsify: Failed to sparsify VA Range 0xaa0000 to 0xb1ffff. Status = 0x00000040
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_sparse.c:74
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:881
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: progress == indexHi_tmp - indexLo_tmp + 1 @ mmu_walk.c:1092
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ mmu_walk.c:488
Jul 03 21:12:38 joao-82ju kernel: NVRM: mmuWalkUnmap: Failed to unmap VA Range 0xaa0000 to 0xb1ffff. Status = 0x00000040
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_unmap.c:65
Jul 03 21:12:38 joao-82ju kernel: NVRM: mmuWalkSparsify: Unmap failed with status = 0x00000040
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == unmapStatus @ mmu_walk_sparse.c:85
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from mmuWalkSparsify(userC>
Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: (pKernelBus->pReadToFlush != NULL || pKernelBus->virtualBar2[GPU_GFID_PF].pCpuMapping != NULL) @ kern_>
Jul 03 21:12:38 joao-82ju kernel: Freezing user space processes failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
...
Jul 03 21:12:59 joao-82ju systemd-sleep[52519]: Failed to put system to sleep. System resumed again: Device or resource busy
...
Jul 03 21:14:31 joao-82ju gnome-session-binary[52809]: Unrecoverable failure in required component org.gnome.Shell.desktop
...
Jul 03 21:16:56 joao-82ju gdm-launch-environment][52896]: pam_systemd(gdm-launch-environment:session): Failed to create session: Connection timed out
Jul 03 21:17:00 joao-82ju kernel: INFO: task nv_queue:448 blocked for more than 122 seconds.
Jul 03 21:17:00 joao-82ju kernel: Tainted: G OE 6.12.34-1-MANJARO #1
Jul 03 21:17:00 joao-82ju kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Jul 03 21:39:54 joao-82ju gdm-launch-environment][54102]: pam_systemd(gdm-launch-environment:session): Failed to create session: Connection timed out
...
Jul 03 21:39:57 joao-82ju kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
Jul 03 21:40:00 joao-82ju kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
Jul 03 21:40:03 joao-82ju kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
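To make the sequence in the excerpt above easier to follow, here is a rough Python sketch that tallies journal lines by source and records when each source first appears. It assumes the short journalctl line format shown above ("Mon DD HH:MM:SS host source[pid]: message"). The names `LINE_RE`, `summarize`, and `SAMPLE` are my own and purely illustrative, and the `kernel/NVRM` label is just a convenience for separating the Nvidia driver messages from other kernel lines.

```python
import re

# Short journalctl format: "Jul 03 21:12:38 host source[pid]: message".
# This layout is an assumption based on the excerpt in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<source>[^\[:]+)[^:]*: "
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Map each log source to (first timestamp seen, number of lines)."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # stack-trace frames, "..." elisions, blank lines
        source = m["source"].rstrip("]")
        # Separate Nvidia driver messages from the other kernel lines.
        if source == "kernel" and m["msg"].startswith("NVRM:"):
            source = "kernel/NVRM"
        first_ts, count = summary.get(source, (m["ts"], 0))
        summary[source] = (first_ts, count + 1)
    return summary


# A few complete lines copied from the excerpt above.
SAMPLE = [
    "Stack trace of thread 2574:",
    "Jul 03 21:12:38 joao-82ju kernel: Freezing user space processes",
    "Jul 03 21:12:38 joao-82ju kernel: NVRM: nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:881",
    "Jul 03 21:12:38 joao-82ju kernel: NVRM: mmuWalkSparsify: Failed to sparsify VA Range 0xaa0000 to 0xb1ffff. Status = 0x00000040",
    "Jul 03 21:12:59 joao-82ju systemd-sleep[52519]: Failed to put system to sleep. System resumed again: Device or resource busy",
    "Jul 03 21:16:56 joao-82ju gdm-launch-environment][52896]: pam_systemd(gdm-launch-environment:session): Failed to create session: Connection timed out",
]

for source, (first_ts, count) in summarize(SAMPLE).items():
    print(f"{first_ts}  {source}: {count} line(s)")
```

Running the same function over the full journal text for that boot would show which component logs an error first, which is the ordering I am trying to pin down.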
Notes:
- GNOME Shell version: 48.2 (on Wayland)
- Nvidia driver version: 575.64 (proprietary driver)
Based on the logs, I think the GPU driver hit an error during the suspend attempt and the system could not recover from it.
My questions are: how can I confirm what is causing the issue, and how can I make suspend more resilient? Failing that, is there anything I can do to recover from the error manually?