Regression in kernel 4.19.6 prevents my laptop from suspending

Suspend to RAM was working fine on my Clevo N130WU laptop (Kaby Lake R i5-8250U; the laptop is also sold under other names, such as KDE Slimbook II, PCSpecialist 13.3" Lafite III or Obsidian N130WU) with the 4.18 kernel series and up to 4.19.5, but broke with the 4.19.6 update.

The laptop does not go into suspend anymore: the screen goes black for a short while, then the lock screen comes back up. The only unusual thing I could find in dmesg is USB related:

dpm_run_callback(): usb_dev_suspend+0x0/0x10 returns -16
PM: Device usb1 failed to suspend async: error -16
PM: Some devices failed to suspend, or early wake event detected
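
For reference, the -16 in these messages is a negated errno value, and on Linux errno 16 is EBUSY ("Device or resource busy"). A minimal sketch for decoding such lines from saved dmesg output follows; the regular expression is only an assumption based on the three lines quoted above, not a complete parser for PM messages.

```python
import errno
import re

# Matches lines such as "PM: Device usb1 failed to suspend async: error -16".
# The pattern is derived from the dmesg excerpt above and may miss other variants.
PM_FAIL = re.compile(
    r"PM: Device (?P<dev>\S+) failed to suspend(?: async)?: error (?P<code>-?\d+)"
)

def decode_pm_failures(log_text):
    """Return (device, errno_name) pairs for failed suspends found in dmesg text."""
    results = []
    for match in PM_FAIL.finditer(log_text):
        code = abs(int(match.group("code")))
        results.append((match.group("dev"), errno.errorcode.get(code, "UNKNOWN")))
    return results

sample = "PM: Device usb1 failed to suspend async: error -16"
print(decode_pm_failures(sample))  # [('usb1', 'EBUSY')]
```

In other words, the USB core reports that the device called usb1 was busy and refused to suspend.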

Using lsusb or usb-devices does not reveal which device is identified as usb1 in dmesg (at least to me), but I assume it is the LTE modem (Huawei ME906s-158) which shows up as usb 1-2. Disabling XHC in /proc/acpi/wakeup (which was the fix for immediate wakeup from suspend on 4.13 and 4.14 kernels) did not change anything.
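
On the question of what usb1 is: in sysfs, a name of the form usbN refers to the root hub of bus N, while devices plugged into that bus are named N-P (bus, then port), so usb 1-2 would be a device on port 2 of the hub called usb1. A small sketch for listing what sits on one bus is shown below. It assumes the usual /sys/bus/usb/devices layout with idVendor, idProduct and product attribute files; the function name is made up for illustration.

```python
from pathlib import Path

def _read_attr(device_dir, attr):
    """Read one sysfs attribute file, or return '?' if it does not exist."""
    path = device_dir / attr
    return path.read_text().strip() if path.is_file() else "?"

def list_bus_devices(bus, sysfs_root="/sys/bus/usb/devices"):
    """Return {sysfs_name: 'vendor:product product-string'} for one USB bus."""
    root = Path(sysfs_root)
    devices = {}
    if not root.is_dir():
        return devices
    for entry in sorted(root.iterdir()):
        name = entry.name
        # Skip interface entries such as "1-2:1.0"; keep "usb1" and "1-2".
        if ":" in name:
            continue
        if name != f"usb{bus}" and not name.startswith(f"{bus}-"):
            continue
        vendor = _read_attr(entry, "idVendor")
        product_id = _read_attr(entry, "idProduct")
        devices[name] = f"{vendor}:{product_id} {_read_attr(entry, 'product')}"
    return devices

if __name__ == "__main__":
    for name, description in list_bus_devices(1).items():
        print(name, description)
```

If the output shows usb1 as a root hub with ID 1d6b:0002, the failing device in dmesg is the hub itself rather than the modem behind it.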

Something strange when running lsusb -v is the following second line shown for each device:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Couldn’t open device, some information will be missing
Device Descriptor:
…

What I could find in the kernel changelog is:

commit cc8b329fef53c74a4abf98b0755b3832d572d6ce
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Thu Nov 15 11:38:41 2018 +0200

usb: xhci: Prevent bus suspend if a port connect change or polling state is detected

commit 2f31a67f01a8beb22cae754c53522cb61a005750 upstream.

USB3 roothub might autosuspend before a plugged USB3 device is detected,
causing USB3 device enumeration failure.

USB3 devices don't show up as connected and enabled until USB3 link trainig
completes. On a fast booting platform with a slow USB3 link training the
link might reach the connected enabled state just as the bus is suspending.

If this device is discovered first time by the xhci_bus_suspend() routine
it will be put to U3 suspended state like the other ports which failed to
suspend earlier.

The hub thread will notice the connect change and resume the bus,
moving the port back to U0

This U0 -> U3 -> U0 transition right after being connected seems to be
too much for some devices, causing them to first go to SS.Inactive state,
and finally end up stuck in a polling state with reset asserted

Fix this by failing the bus suspend if a port has a connect change or is
in a polling state in xhci_bus_suspend().

Don't do any port changes until all ports are checked, buffer all port
changes and only write them in the end if suspend can proceed

Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
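
To make the commit message easier to follow, here is a simplified model of the behaviour it describes: check every port first, fail the whole bus suspend with -EBUSY if any port shows a connect change or is in the polling state, and only write the buffered port changes if all ports passed. This is a sketch, not the driver code. The bit positions are how I understand the PORTSC register layout from the xHCI specification and should be checked against drivers/usb/host/xhci.h, and the model ignores details such as which ports are enabled. Whether this commit is actually responsible for the -16 above is not confirmed in this thread.

```python
import errno

# Assumed PORTSC layout (verify against the xHCI spec / xhci.h before relying on it).
PORT_PLS_MASK = 0xF << 5   # port link state field
XDEV_U3 = 0x3 << 5         # link state U3 (suspended)
XDEV_POLLING = 0x7 << 5    # link state Polling (link training in progress)
PORT_CSC = 1 << 17         # connect status change

def bus_suspend(portsc_values):
    """Model of the described logic: return (status, resulting port values)."""
    pending = []
    # Pass 1: inspect every port without changing anything.
    for portsc in portsc_values:
        if portsc & PORT_CSC or (portsc & PORT_PLS_MASK) == XDEV_POLLING:
            # Bail out; no port has been modified at this point.
            return -errno.EBUSY, list(portsc_values)
        pending.append((portsc & ~PORT_PLS_MASK) | XDEV_U3)
    # Pass 2: all ports were fine, so commit the buffered changes.
    return 0, pending

print(bus_suspend([0, PORT_CSC])[0])  # -16: one port has a pending connect change
print(bus_suspend([0, 0])[0])         # 0: suspend can proceed
```

Under this model, a single port that keeps reporting a connect change or stays in polling would make every suspend attempt fail with -16, which matches the symptom in dmesg.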

If that’s an integrated modem, have you tried disabling it in the BIOS to see whether that changes anything?

Another idea would be to enable some USB quirks with the usbcore.quirks= boot parameter.
For that you need to know the vendor and device ID of the device from lsusb or dmesg.
The parameter would then look like this:
usbcore.quirks=vendorID:productID:quirkflag

A list of quirk flags can be found in the kernel documentation here:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt

USB_QUIRK_RESET_RESUME and USB_QUIRK_DISCONNECT_SUSPEND look like possible candidates.

By the way, quirks for several known problematic USB devices are already built into the kernel.
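
As a rough illustration of the parameter format, the sketch below builds a usbcore.quirks= string from a vendor ID, a product ID and a list of quirk names. The letter codes (b for USB_QUIRK_RESET_RESUME, m for USB_QUIRK_DISCONNECT_SUSPEND) are taken from the usbcore.quirks entry in kernel-parameters.txt as I remember it, so please verify them against the documentation for your kernel. The helper name is hypothetical, and 12d1:15c3 is the Huawei modem ID reported further down in this thread.

```python
import re

# Only the two flags discussed here; letters should be checked against
# the usbcore.quirks section of kernel-parameters.txt.
QUIRK_LETTERS = {
    "USB_QUIRK_RESET_RESUME": "b",
    "USB_QUIRK_DISCONNECT_SUSPEND": "m",
}

HEX_ID = re.compile(r"^[0-9a-fA-F]{4}$")

def quirks_param(vendor_id, product_id, flags):
    """Build 'usbcore.quirks=VID:PID:letters' for the kernel command line."""
    for value in (vendor_id, product_id):
        if not HEX_ID.match(value):
            raise ValueError(f"expected a 4-digit hex ID, got {value!r}")
    letters = "".join(QUIRK_LETTERS[flag] for flag in flags)
    return f"usbcore.quirks={vendor_id}:{product_id}:{letters}"

print(quirks_param("12d1", "15c3", ["USB_QUIRK_DISCONNECT_SUSPEND"]))
# usbcore.quirks=12d1:15c3:m
```

Several flags for the same device are written as consecutive letters, for example `bm` for both quirks at once.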

It is a Mini PCIe modem module that cannot be deactivated in the BIOS.

USB_QUIRK_RESET_RESUME on its own does nothing.

USB_QUIRK_DISCONNECT_SUSPEND on its own makes the laptop suspend, but it wakes up again right after (same behavior as previously experienced on 4.13 and 4.14 kernels).

Is it possible to find out if built-in quirks have changed between 4.19.5 and 4.19.6?

I’m not sure which other quirks I should try, or in what combination.

We have two identical laptops, by the way: one has the original Huawei ME906s-158 (12d1:15c3) and the other the HP lt4132 (03f0:a31d), which is an identical clone with different firmware. Neither is configured properly by usb_modeswitch out of the box, but both can be made to work with some additional effort. Because of their different vendor and product IDs, the two modems behave slightly differently and are obviously affected differently by the built-in kernel quirks: the lt4132 is up again right after resume, whereas the ME906s-158 seems to crash after resume, which previously had me restart ModemManager from a systemd unit. That issue may go away now that I know about USB_QUIRK_DISCONNECT_SUSPEND, though the suspend/wake problem has to be solved first.

So to recap:

  • Kernel 4.19.5 lets my laptop suspend just fine
  • Kernel 4.19.6 does not let my laptop suspend out of the box
  • Kernel 4.19.6 with USB_QUIRK_DISCONNECT_SUSPEND lets my laptop suspend, but it wakes up right after (which is the same behavior as with 4.13 and 4.14 kernels out of the box)

I have hope that a combination of quirks may be the solution, but which?

P.S. The workaround for the immediate wakeup after suspend was to disable XHC in /proc/acpi/wakeup, but I really want to avoid that because it somehow also disables wakeup by the power button. That gets very inconvenient when the laptop is connected to an external monitor, keyboard and mouse with the lid closed, as the only way to wake it up is then to open and close the lid.
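
For anyone who wants to check the current wakeup state before and after toggling XHC, here is a small sketch that parses the contents of /proc/acpi/wakeup. It assumes the usual column layout (device, S-state, status, optional sysfs node); the sample text below is made up for illustration and is not output from this laptop.

```python
def parse_acpi_wakeup(text):
    """Map device name -> True if wakeup is enabled, from /proc/acpi/wakeup text."""
    states = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        device, _s_state, status = fields[:3]
        # The status column looks like "*enabled" or "*disabled".
        states[device] = status.lstrip("*") == "enabled"
    return states

# Hypothetical sample in the usual format.
SAMPLE = """Device\tS-state\t  Status   Sysfs node
PWRB\t  S3\t*enabled   platform:PNP0C0C:00
XHC\t  S3\t*enabled   pci:0000:00:14.0
"""

print(parse_acpi_wakeup(SAMPLE))  # {'PWRB': True, 'XHC': True}
```

Note that writing a device name to /proc/acpi/wakeup toggles its state rather than setting it, so reading the file first avoids accidentally re-enabling a device that was already disabled.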

Quirks code is in drivers/usb/core/quirks.c, and there is no difference at all between 4.19.5 and 4.19.6.

There are however differences between 4.19.4 and 4.19.6, but none seems to be related to your setup (Corsair K70 keyboard, Terminus Technology Hub and Raydium Touchscreen).

EDIT: what module is used by the problematic device?

Does 4.14 have the same issue?

Sometimes older LTS kernels are the way to go, unless there is some particular hardware support or added functionality you need.

I have just taken a look and found the following for the Huawei ME906s-158 (12d1:15c3) in the quirk list:

{ USB_DEVICE(0x12d1, 0x15c3), .driver_info =
			USB_QUIRK_DISCONNECT_SUSPEND }

This is awkward: if the kernel already applies that quirk automatically because it is in the built-in list, then adding it as a boot parameter disables it, right? Otherwise the boot parameter should not make any difference, since the built-in list sets the quirk anyway.

The HP lt4132 (03f0:a31d) is not in that quirks list, which would explain why the two modules have behaved differently so far. It also leads me to assume that NOT applying USB_QUIRK_DISCONNECT_SUSPEND gives the better result: it makes suspend possible, even though the laptop wakes up again right afterwards. This could also be why the lt4132 keeps working after suspend and the ME906s-158 does not.

If it’s not the quirks list that caused the change in behavior, something else around USB functionality must have changed.

cdc_ether or cdc_mbim, plus option (for the ttyUSB serial ports). I have tried with both (cdc_ether and cdc_mbim) but that didn’t change anything.

Yes:

Each letter will change the built-in quirk; setting it if it is
clear and clearing it if it is set.

The catch with using an older kernel in that case is as follows:

Linux 4.19.6-1 [LTS] -> that’s the one that does not let my laptop go into suspend

Linux 4.19.5-1 -> that’s the one with which suspend is working properly (but not offered by Manjaro Kernel Manager, so I had to download and install it manually)

Linux 4.18.20-1 [EOL] -> suspend working, but EOL.

Linux 4.14.85-1 [LTS] -> suspend also not working properly, but with a different problem: the laptop goes into suspend and then wakes up again shortly after (can be fixed by echo XHC > /proc/acpi/wakeup, but then I lose the ability to wake it by mouse/keyboard and power button).

This has become more of a riddle to solve now. If 4.19.5 worked and 4.19.6 does not, although the built-in quirks list has not changed, what else has changed?

Then my problem is not really caused or solved by the quirks, even though disabling USB_QUIRK_DISCONNECT_SUSPEND changes the behavior: I have just checked the kernel source of 4.18.20 (which had suspend working properly on my laptop), and its quirks list has the same entry for 12d1:15c3 as kernel 4.19.6.

Damned!

Is there a way to switch off a device like that modem without having that ability in BIOS (besides physically removing it from the laptop)?

This would be just for testing, I don’t want to do without that modem as the laptop has an externally accessible SIM card slot for that kind of module and it’s really nice not having to tether when on the move.

You could try blacklisting the modules but that’s probably not enough.

Although I don’t know what else you can do, I must applaud your perseverance :+1:

Then let’s hope that @philm has some more ideas, because I certainly don’t know what else to try.

Maybe you could try with acpi_call (this link is just an example). I’m not sure if it will work though…

Unlikely; I would not even know how to get the correct parameters (the examples only show how to turn off discrete graphics cards). Switching off the device was just an idea to see whether suspend works without it, but without an easy way to test that, and since I don’t want to physically remove the module or do without the LTE module anyway, it would be more work than gain.

just an idea, but maybe try stopping/disabling related systemd services and see if it changes the behavior, and if it does, then create a systemd service to automatically disable them on suspend and enable them on wake, similar to what you did with network manager?
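
One way such a test could look is sketched below. Note that it uses a systemd sleep hook rather than a dedicated service unit, which is a different mechanism from the one suggested above: systemd runs executables placed in /usr/lib/systemd/system-sleep/ with two arguments, the phase ("pre" or "post") and the sleep action (for example "suspend"). The service name ModemManager.service is an assumption based on the earlier mention of ModemManager, and nothing in this thread confirms that stopping it changes the suspend behaviour.

```python
#!/usr/bin/env python3
import subprocess
import sys

SERVICE = "ModemManager.service"  # assumption: the service to pause around suspend
SLEEP_ACTIONS = {"suspend", "hibernate", "hybrid-sleep", "suspend-then-hibernate"}

def command_for(phase):
    """Return the systemctl command for a sleep phase, or None for unknown phases."""
    if phase == "pre":
        return ["systemctl", "stop", SERVICE]
    if phase == "post":
        return ["systemctl", "start", SERVICE]
    return None

# systemd calls the hook as: <script> pre|post <sleep action>
if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[2] in SLEEP_ACTIONS:
    command = command_for(sys.argv[1])
    if command:
        subprocess.run(command, check=False)
```

If suspend still fails with the service stopped, that would be one more data point that the problem sits in the kernel rather than in userspace.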

I brought this method up with the OP on the updates thread where this issue was first discussed. I don’t know why this method has not been investigated by the OP, as there are many examples on the forum that this method works in many cases.

I have many other posts on this topic if you search the forum.

https://forum.manjaro.org/t/kernel-4-19-0-3-not-network-after-suspending-gnome-edition/63544/2

https://forum.manjaro.org/t/wifi-adapter-tp-link-tl-wn823n-must-be-reconnected-for-it-to-work/52968/19

https://forum.manjaro.org/t/surface-pro-1796-wifi-not-resuming-after-suspend/48133/47

https://forum.manjaro.org/t/thinkpad-x230t-wont-suspend-under-kernel-419rc4-4-18-4-17-4-14-4-9/59798/21

Here are some external links with excellent systemd reference material:

The ArchWiki - systemd

Red Hat - systemd-targets

Red Hat - systemd unit files

Systemd manpage

Because this issue is kernel related and does not have anything to do with anything else (like systemd or udev). I don’t want to find a half-arsed workaround but a solution for the issue that is future proof and keeps the machine going also with future kernel upgrades.

There is just no doubt, it works with 4.19.5 but not with 4.19.6 with nothing else on the system changed, so it’s a kernel issue, and not any of the many other suspend/resume problems others are experiencing. So why would I waste my time going through numerous totally unrelated suspend/resume issues?

I know you mean well, but this just isn’t solvable following your recommendation.

Systemd runs your entire system these days. If you feel using a systemd service to solve a problem is a “half-arsed workaround” then basically that’s what you’re saying every modern distribution that uses systemd is. I don’t see why anyone would waste their time and effort helping you, when you simply dismiss out of hand a viable working solution because it’s not good enough for you.

Have fun finding a solution that isn’t a “half-arsed workaround”. Let us know when you do.

i see your point, while you may be able to work around it with systemd, you want to find the root cause which in the long run would be beneficial to you and others by not needing the workaround in the first place. good luck, hope it works out for ya

I’m sorry, but I really just see no logic in trying to fiddle around with systemd for trying to fix a kernel issue.

I have already proven that it is a kernel issue: I have done a fresh installation, applied the updates, and swapped between kernel 4.19.5 and 4.19.6 with everything else staying the same, including systemd. So this is 100% a kernel matter, not systemd or anything else.

You are trying to dismiss my problem as some other totally unrelated suspend/resume issue, which just isn’t the case here.

If it works with kernel 4.19.5 and doesn’t with 4.19.6 while nothing else has changed, the only way is to track down the responsible regression in the kernel and/or its modules. Unfortunately I am not a developer and my knowledge is too limited to come up with the solution myself.
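
Tracking down the responsible change does not necessarily require developer knowledge: the usual tool is git bisect on the stable kernel tree, marking v4.19.5 as good and v4.19.6 as bad, then building and testing the kernel it checks out at each step. The idea behind it is a plain binary search over the commits between the two releases, sketched below. The commit list and the is_good callback are placeholders; in practice "is good" means "I built and booted this kernel and suspend worked".

```python
def first_bad_commit(commits, is_good):
    """Binary search for the first commit at which is_good() returns False.

    Assumes commits are in chronological order, that everything before the
    regression is good, and that the last commit is known to be bad.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        middle = (low + high) // 2
        if is_good(commits[middle]):
            low = middle + 1   # regression was introduced later
        else:
            high = middle      # this commit or an earlier one is the culprit
    return commits[low]

# Toy example: 100 placeholder commits, regression introduced at index 37.
commits = list(range(100))
print(first_bad_commit(commits, lambda commit: commit < 37))  # 37
```

Each step halves the remaining range, so even a point release with many commits only needs a handful of build-and-test cycles.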