Kernel 5.16.1 resume from suspend fails; 5.15.x works fine

Resuming from suspend works fine on my machine using kernel 5.15.x (now 5.15.15.1).
But neither 5.16.0 nor 5.16.1 will do so.
5.16.x: The machine suspends normally but fails on resume, stopping at a blinking cursor in the top left of the monitor. The keyboard does not respond (e.g. ctrl+alt+x for a terminal; ctrl+alt+delete does nothing), and ssh is unavailable. The machine is an AMD 5800x desktop. The journalctl output for that boot ends with “…entered the suspend sleep state.” Searching for errors finds “nvidia-gpu … i2c timeout error e0000000, RTW error halmac err dump efuse in suspend, kscreen.xcb.helper: Event Error: 147, [drm:drm_new_set_master] ERROR [nvidia-drm] [GPU ID 0x00002d00] Failed to grab modeset ownership.”

5.15.x: A successful suspend and resume on 5.15.15 shows the same nvidia-gpu i2c and modeset ownership errors, so I guess we can exclude those. The problem may be the wifi (RTW), which uses the 88x2bu module. I’ll rmmod it and test.

Meanwhile, the 5.16 kernel suspends and resumes without problem on an AMD 4500 laptop.

Update: removing the rtl (88x2bu) kernel module did not solve the problem.

Relax, stay on 5.15 and wait until 5.16 is fixed, it’s still a pretty new kernel.

Thanks. Any insights into the problem?
The issue remains with 5.16.2.

I don’t think waiting will solve this problem. I am facing the same issue, albeit on an Arch installation. I have tried a lot of things, all documented in the thread titled “System will not resume from suspend with Kernel 5.16” in the Kernel & Hardware section of the Arch forums. This forum doesn’t allow me to post a link.

I have an AMD FX-8350 with an Nvidia GTX 650 Ti. We may have something in common. Care to provide more detail about your hardware?

Thanks for the information. I saw your posts too in my searching for a fix.

The hardware:
AMD 5800x, Nvidia TU106 RTX 2060.
So the AMD CPU & Nvidia GPU combination is common.

The laptop on which suspend/resume works has AMD integrated graphics.

PS. Several posts on the LKML are discussing potential suspend/resume issues in ACPI with various hardware types.
PPS. 5.16.3 has >1000 changes, several of which mention suspend and resume.

Thanks for replying. I am inclined to agree with you that the combination of AMD and Nvidia could be a contributor, though I did see a post in the Arch forums where the GPU was AMD. That could be related to another issue with the 5.16 kernel, though. I think the problem we are facing has to do with the power management changes mentioned in the release notes.

What’s kept me engaged in my search is the fact that I have an install of Endeavour OS on the same machine that continues to work well with 5.16.2. As you may know, this is an Arch-based distro that connects directly to the Arch repositories. There is no delay in passing through updates.

I am using Gnome on my vanilla Arch install and XFCE on Endeavour. The only other notable difference that I can find is that Endeavour OS has disabled the handling of suspend and resume by systemd by reconfiguring logind.conf. It seems to be handled by XFCE power manager and I am not sure what goes on inside that but resume works just fine.

Will post back if I find more.

Thanks for the information. I just tried 5.16.3 and it’s still broken. I’m using KDE and X11 drivers.

5.16.4 same problem.

Weirder and weirder. Same machine, 2nd Manjaro installation on another drive, 5.16.4 suspends and resumes fine. My work system won’t resume on 5.16.x but will with 5.15.x. Something on my work system is out of spec.

Could be firmware related.

What’s the best way to report such issues to the Kernel developers?

Probably as mentioned here:

https://www.kernel.org/doc/html/v4.14/admin-guide/reporting-bugs.html

Could you say more about that? Never mind. I updated the NVMe firmware on the Samsung 980 Pro 1TB drive, and the same resume-from-suspend problem persists with 5.16.5.

The working 5.16 kernel is on a Samsung 970 evo plus 1TB. Kernel 5.16 is breaking some user space functionality. Beyond the resume from suspend on my work machine, I had to fiddle with a touchpad driver call to get my laptop to recognize it after a 5.16 update (.4 → .5) that had no issues with any 5.15 updates including .19.

I got the same problem on an HP Envy x360 (AMD 4500U with integrated graphics). Suspend and resume work fine on 5.15.x but fail on 5.16.

Thanks. Some day someone will figure this out.
Here’s a clue: 5.16 suspend enters the CPU sleep state without scanning other hardware.

Comparing the suspend messages from journalctl for 5.15 and 5.16, the 5.16 suspend entries end prematurely.
With 5.15, a successful suspend and resume ends with
“░░ The system has now entered the suspend sleep state.
Feb 08 00:21:54 xxxxxx kernel: PM: suspend entry (deep)
Feb 08 00:21:54 xxxxxx: Filesystems sync: 0.017 seconds”
followed by 17 lines describing wpa and usb devices all within 1 second, e.g.
Feb 08 00:21:54 xxxxx wpa_supplicant[2200]: wlp42s0f1u6: CTRL-EVENT-DSCP-POLICY clear_all

then, many hours later:
Feb 08 11:13:23 mlsm-51e kernel: Freezing user space processes … (elapsed 0.001 seconds) done.
and successful resume.

With 5.16 the last lines are:
The system has now entered the suspend sleep state.
Feb 07 14:17:46 xxxxxx kernel: PM: suspend entry (deep)
Feb 07 14:17:46 xxxxxx kernel: Filesystems sync: 0.023 seconds
No other devices are listed.

So maybe the problem starts with suspend, which seems to halt the CPU w/o saving the other needed states to RAM.
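
For anyone who wants to repeat this comparison, here is a minimal sketch in Python. It assumes each boot's journal has been saved to a text file first (for example with `journalctl -b -1 > some-file.log`; the file name is only a placeholder), and the function name `lines_after_suspend_entry` is my own invention. It only reports what was logged after the last "PM: suspend entry" line; it does not diagnose anything.

```python
import sys

# Marker taken from the kernel lines quoted in this thread.
SUSPEND_MARKER = "PM: suspend entry"


def lines_after_suspend_entry(journal_lines):
    """Return the lines logged after the last 'PM: suspend entry' line.

    An empty list means either that no suspend entry was found or that
    nothing at all was logged after it.
    """
    last_index = None
    for index, line in enumerate(journal_lines):
        if SUSPEND_MARKER in line:
            last_index = index
    if last_index is None:
        return []
    return [line.rstrip("\n") for line in journal_lines[last_index + 1:]]


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare.py boot-515.log boot-516.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            tail = lines_after_suspend_entry(handle.readlines())
        print(f"{path}: {len(tail)} line(s) after the last suspend entry")
        for line in tail:
            print("    " + line)
```

Running it on a saved 5.15 journal and a saved 5.16 journal side by side should make the difference in what follows the suspend entry easy to see.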

I installed 5.16.7-1 in addition to 5.15.21-1 to see if things have changed. Afraid not. However, this time I had the following additional lines in journal logs:

Feb 24 12:14:47 rsk4 root[1391]: SleepButton pressed
Feb 24 12:14:47 rsk4 systemd-logind[551]: Suspend key pressed short.
Feb 24 12:14:47 rsk4 systemd[1]: Reached target Sleep.
Feb 24 12:14:47 rsk4 systemd[1]: Starting System Suspend...
Feb 24 12:14:47 rsk4 systemd-sleep[1394]: Entering sleep state 'suspend'...
Feb 24 12:14:47 rsk4 kernel: PM: suspend entry (deep)
Feb 24 12:14:47 rsk4 kernel: Filesystems sync: 0.019 seconds
Feb 24 12:14:48 rsk4 root[1401]: ACPI group/action undefined: jack/lineout / LINEOUT
Feb 24 12:14:48 rsk4 root[1403]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Feb 24 12:14:48 rsk4 acpid[533]: client 592[0:0] has disconnected
Feb 24 12:14:48 rsk4 root[1406]: ACPI group/action undefined: jack/lineout / LINEOUT

I searched for the video out error and found a couple of threads, one on the Nvidia forums and one on the Arch forums, but nothing recent enough to be related to the current issue (unless we have a regression).

Nothing else to report except that an install of Endeavour OS on the same machine continues to work flawlessly with 5.16.10-arch. And no, I have no clue why this is.

Thanks. I’m stymied. The same machine booting from another partition (but the same EFI root) works fine, as do all my other machines. Something I installed on the work machine is messing with suspend, and it isn’t my user account either. I noticed that the /etc/systemd/system files differ: the working partition has no sleep or suspend entries, while the “broken” one has a bunch of sleep, suspend, and hibernate .wants and .requires. I tried getting rid of them, but then the nvidia card wouldn’t suspend and neither would the system. So a new clue, but I’m giving up and sticking with 5.15 …
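
In case it helps someone make the same comparison, below is a rough Python sketch that lists the sleep, suspend, and hibernate entries present under one systemd directory but not under another. It assumes the other installation's root is mounted somewhere readable; the function names and the example mount point are hypothetical, and matching on the three keywords is just my guess at what is relevant.

```python
from pathlib import Path

# Keywords chosen to match the entries described above (an assumption).
KEYWORDS = ("sleep", "suspend", "hibernate")


def power_related_entries(systemd_dir):
    """Return relative paths under systemd_dir that mention a power keyword."""
    root = Path(systemd_dir)
    found = set()
    for path in root.rglob("*"):
        relative = path.relative_to(root).as_posix()
        if any(word in relative.lower() for word in KEYWORDS):
            found.add(relative)
    return found


def only_in_first(first_dir, second_dir):
    """Return sorted power-related entries that exist only in first_dir."""
    return sorted(
        power_related_entries(first_dir) - power_related_entries(second_dir)
    )
```

Something like `only_in_first("/etc/systemd/system", "/mnt/other/etc/systemd/system")` would then print the entries unique to the running system, where `/mnt/other` stands for wherever the second installation happens to be mounted.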

Endeavour OS has the suspend and hibernate entries in logind commented out. They have also put the same entries in logind.conf. I tried to replicate this setup in my Arch install with no effect. I suspect that suspend and resume are not being handled by systemd in the case where it works. I have no clue what is actually handling them, as pm-utils is absent on both.
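
To compare two logind.conf files without eyeballing the comments, a small Python sketch like the one below can list only the settings that are actually active (not commented out). The function name `active_settings` is hypothetical, the parsing is deliberately simplistic, and it ignores any drop-in files under logind.conf.d, so treat it as a starting point only.

```python
def active_settings(conf_text):
    """Return the uncommented key=value pairs from logind.conf-style text.

    Lines starting with '#' or ';' are comments and section headers such
    as [Login] are skipped, so commented-out entries do not appear.
    """
    active = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            active[key.strip()] = value.strip()
    return active
```

Calling it on the text of each file (for example `active_settings(open("/etc/systemd/logind.conf").read())`) and comparing the two dictionaries shows which keys differ between the installs. Which keys Endeavour OS actually changes is not stated in this thread.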

Despite swearing off this obsession I found another clue. When suspend/resume work, the journal log includes the lines,
“ACPI: PM: Saving platform NVS memory
ACPI: PM: Restoring platform NVS memory”

This is true whenever 5.15 or 5.16 works. When suspend/resume fails on 5.16 on the work system, which it always does, the ACPI NVS region is registered but not used to save the platform state. I tried a bunch of kernel command-line parameter changes and added nvidia to the mkinitcpio module list, but no go. Good thing 5.15 is excellent.
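
A quick way to check a saved journal for those two lines is sketched below in Python. The marker strings are copied from the log lines quoted above; the function name `nvs_status` is made up, and the sketch assumes the journal has already been read into a list of lines.

```python
# Marker strings copied from the journal lines quoted in this post.
SAVE_MARKER = "ACPI: PM: Saving platform NVS memory"
RESTORE_MARKER = "ACPI: PM: Restoring platform NVS memory"


def nvs_status(journal_lines):
    """Report whether the NVS save and restore messages appear in the log."""
    saved = False
    restored = False
    for line in journal_lines:
        if SAVE_MARKER in line:
            saved = True
        if RESTORE_MARKER in line:
            restored = True
    return {"saved": saved, "restored": restored}
```

If `saved` comes back False for a 5.16 boot that hung, that matches what I describe above. If only `restored` is False, the machine got further into suspend but never came back.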

On my Arch install, linux-lts results in a hang on resume. That is a 5.15.x kernel. I suspect that this will soon happen in Manjaro as well. The fact that there is nothing in the logs is stopping me from creating a bug report (bugzilla or Arch).

I don’t think so as there exists no linux-lts in Manjaro.