Reboot and shutdown issues with recent kernels

For a few weeks now, it has been impossible to complete a reboot or shutdown on an AMD Ryzen 7 5800H computer with a Radeon Vega 8 iGPU using the official Manjaro kernels (I currently have Manjaro kernels 6.6.10-1 and 6.7.0rc8-1 installed, and the same problem happens with both). With both kernels the shutdown never completes: the screen stays black for a while, then shows a shutdown log and stays there indefinitely.

However, with the Liquorix 6.6.10 kernel and the XanMod x64v3 6.6.10 kernel this problem does not occur (the computer powers off almost instantly), so it does not seem to be a problem in the kernel base (otherwise it would happen with them too). I have also verified that it does not happen with the Manjaro 6.1.71 LTS kernel. Everything seems to indicate that the latest Manjaro kernels are being built with some patch that does not work correctly on this type of computer.

You can find more technical information about this error as well as the logs in a message I posted on the forum last week:

https://forum.manjaro.org/t/impossible-to-restart-or-shut-down-the-machine-never-completed/154094/15

Maybe compile our kernel without the patches to see if it works for you and point out the patch which breaks your system.

Sorry for the question, but is there a link that explains how to do that? I have always installed the kernel binaries that came with the distribution; I have never compiled a kernel myself.

My knowledge is only enough to launch the automatic compilation from pamac or pacman when installing a package, but I don’t know how to choose which patches to apply in order to determine which one is causing the problem… So I would be very grateful if you could point me to a link on how to do this.

If we check the patches we will see these differences between linux61 and linux66:

-source=("https://git.kernel.org/torvalds/t/linux-${_basekernel}.tar.gz"
+source=(https://git.kernel.org/torvalds/t/linux-${_basekernel}.tar.gz
         "https://www.kernel.org/pub/linux/kernel/v6.x/patch-${pkgver}.xz"
-        'config'
+        config
+        # Upstream Patches
         # ARCH Patches
-        '0101-ZEN_Add_sysctl_and_CONFIG_to_disallow_unprivileged_CLONE_NEWUSER.patch'
-        '0102-Revert-drmi915-improve_the_catch-all_evict_to_handle_lock_contention.patch'
-        '0103-drmi915-improve_the_catch-all_evict_to_handle_lock_contention.patch'
+        0101-ZEN_Add_sysctl_and_CONFIG_to_disallow_unprivileged_CLONE_NEWUSER.patch
+        0102-drivers-firmware-skip-simpledrm-if-nvidia-drm.modese.patch
         # MANJARO Patches
-        '0999-patch_realtek.patch'
-        # Bootsplash
-        '0301-revert-fbcon-remove-now-unusued-softback_lines-cursor-argument.patch'
-        '0302-revert-fbcon-remove-no-op-fbcon_set_origin.patch'
-        '0303-revert-fbcon-remove-soft-scrollback-code.patch'
-        '0401-bootsplash.patch'
-        '0402-bootsplash.patch'
-        '0403-bootsplash.patch'
-        '0404-bootsplash.patch'
-        '0405-bootsplash.patch'
-        '0406-bootsplash.patch'
-        '0407-bootsplash.patch'
-        '0408-bootsplash.patch'
-        '0409-bootsplash.patch'
-        '0410-bootsplash.patch'
-        '0411-bootsplash.patch'
-        '0412-bootsplash.patch'
-        '0413-bootsplash.gitpatch'
-        # ACS_override patch
-        '0999-acs.gitpatch'
+        # Realtek patch
+        0999-patch_realtek.patch
+        # ROG ALLY Patches
+        v14.7-0001-HID-asus-fix-more-n-key-report-descriptors-if-.patch
+        v14.7-0002-HID-asus-make-asus_kbd_init-generic-remove-rog.patch
+        v14.7-0003-HID-asus-add-ROG-Ally-N-Key-ID-and-keycodes.patch
+        v14.7-0004-HID-asus-add-ROG-Ally-xpad-settings.patch
+        0006-platform-x86-asus-wmi-disable-USB0-hub-on-ROG-Ally-b.patch
+        0007-mt7921e_Perform_FLR_to_recovery_the_device.patch
+        # AMD GPU reset patches
+        0301-drm-Add_GPU_reset_sysfs_event.patch
+        0302-drm-amdgpu-add_work_function_for_GPU_reset_event.patch
+        0303-drm-amdgpu-schedule_GPU_reset_event_work_function.patch
+        # No overrides ROG ally <= 323 BIOS
+        0001-ALSA-hda-cs35l41-Support-ASUS-2023-laptops-with-miss.patch
+        0001-ALSA-hda-cs35l41-Improve-support-for-ASUS-ROG-Ally.patch
+        # Additional ALLY patches
+        ROG-ALLY-NCT6775-PLATFORM.patch
+        0001-iio-imu_Add_driver_for_BMI323_IMU.patch
+        0002-iio-imu-bmi323-Make-the-local-structures-static.patch
+        0003-iio-imu_Add_ROG_ALLY_bmi323-support.patch
+        0004-iio-imu-Load_ROG_ALLY_mount_matrix.patch
+        0005-iio-imu-ASUS-ROG-ALLY-force-INT1-IRQ.patch
+        # Steamdeck HID patches
+        0001-HID.patch
 )

My bet would be these:

+        # AMD GPU reset patches
+        0301-drm-Add_GPU_reset_sysfs_event.patch
+        0302-drm-amdgpu-add_work_function_for_GPU_reset_event.patch
+        0303-drm-amdgpu-schedule_GPU_reset_event_work_function.patch

The other patches are specific to the ASUS ROG ALLY anyway.

You can download the kernel sources and comment those patches out in the PKGBUILD script. Then run:

updpkgsums
makepkg -si

and report back whether that kernel shuts down for you. If my guess was wrong, try removing the other additional patches and let me know after recompiling your kernel.
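For anyone who has not edited a PKGBUILD before, here is a minimal sketch of the "comment those patches out" step. The helper name comment_out_patches is made up for illustration, and the example pattern assumes the three AMD GPU reset patches are listed one per line in the source array, as in the diff above. Adjust the pattern for other patches.

```shell
# Hypothetical helper (not part of makepkg): comment out every line of a
# PKGBUILD that matches an extended regular expression.
# Usage: comment_out_patches <path-to-PKGBUILD> <pattern>
comment_out_patches() {
    pkgbuild=$1
    pattern=$2
    # Keep a copy of the original so the change is easy to undo.
    cp "$pkgbuild" "$pkgbuild.orig" || return 1
    # Add '#' in front of the first non-blank character of every matching
    # line that is not already a comment; all other lines pass through.
    awk -v pat="$pattern" '
        $0 ~ pat && $0 !~ /^[ \t]*#/ { sub(/[^ \t]/, "#&") }
        { print }
    ' "$pkgbuild.orig" > "$pkgbuild"
}
```

Run it from the directory that contains the PKGBUILD, for example comment_out_patches PKGBUILD '030[1-3]-drm', check the result with diff PKGBUILD.orig PKGBUILD, and then continue with updpkgsums and makepkg -si.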

Also, does linux67 shut down as expected, @Alden20? That one doesn’t ship those GPU reset patches.

No, it doesn’t work with the Manjaro 6.7.0rc8-1 kernel; the same thing happens as with the Manjaro 6.6.10-1 kernel. I’ll have to wait a few weeks for Liquorix and XanMod to update their kernels to the final 6.7 release that came out today to see what happens with them but, as I said, with their 6.6.10 versions the shutdown does work.

Well, the results are:

  • I’ve tried compiling the 6.6.10 kernel without any patches (I commented out all the patch lines), but it keeps failing: the system doesn’t shut down. :x:

  • Used “set amdgpu.mcbp=0” in the boot process, but it doesn’t help; it still fails. :x:

  • I have also installed the Linux Zen 6.6.10 kernel (binaries) and this one also fails like the Manjaro ones. (Does this kernel have anything to do with Arch? I ask because it is curious that it fails too.) :x:

  • I just installed the Manjaro kernel 6.7.0-0 that has just been released and the same thing keeps happening as with 6.6.10, the system does not shut down or restart. :x:

  • I also just installed the XanMod Edge x64v3 6.7.0 (binaries) which has also just been released, and it works perfectly with the new 6.7 kernel. :white_check_mark:

  • With the Liquorix 6.6.10 and XanMod x64v3 6.6.10 kernels (binaries) it works perfectly. :white_check_mark:
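One thing worth ruling out for the amdgpu.mcbp=0 test is that the parameter never reached the kernel. A small sketch, assuming the parameter is expected to show up verbatim in /proc/cmdline (the function name has_kernel_param is only illustrative):

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole
# word on the kernel command line of the running system.
# Usage: has_kernel_param <parameter> [cmdline-file]
has_kernel_param() {
    param=$1
    cmdline=${2:-/proc/cmdline}   # second argument only exists for testing
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "$cmdline" | grep -Fxq -- "$param"
}
```

After booting with the parameter, has_kernel_param 'amdgpu.mcbp=0' && echo present should print "present"; if it prints nothing, the parameter was not passed to the kernel and the test result says nothing about mcbp.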

I don’t know what it could be then, but it’s clear that there’s something in the Arch or Manjaro kernels that’s not working properly.

So, any ideas?

Anyway, I don’t know where the error could be, but last week I opened a bug related to the kernel:

I think there isn’t much more we can do on our side. In the meantime I’ll keep working with the Liquorix and XanMod kernels; with a bit of luck this problem will be fixed in kernel 6.8, with the big changes coming to the AMDGPU section.

Can you try Linux-tkg to see if it fixes this issue? → Select version 6.6.10 and “native-amd”, then build it.

I just compiled it with the options you told me (native compilation for AMD), but the same error continues to occur with that kernel…

I just installed the latest version of the Manjaro kernel available from the 6.5 branch (6.5.13-7) and the same thing happens with this version as well: it hangs when trying to reboot or shut down.

I think the vanilla kernels have some bug with your specific mainboard.

In my experience, some kernels didn’t know about a new sensor on my mainboard, and they forced my computer to shut down immediately after waking up from sleep.

This morning I formatted the hard drive and took the opportunity to run a test before reinstalling Manjaro. I installed Fedora, Kubuntu and KDE Neon. Fedora used the 6.6 kernel, Kubuntu 6.5.14, and I think KDE Neon used that one too, or 6.5.13.

In all three, the problem I mentioned with restarting the computer did not appear. I mention it in case it gives someone another idea about the cause of the problem…

My computer is having the same boot & shutdown problems as well, occurring after the recent Manjaro LTS linux66 update from 6.6.8-2 to 6.6.10-1. Linux61 LTS 6.1.71-1 doesn’t encounter this problem.

My mini-PC is an AMD Ryzen 9 6900HX with Radeon 680M iGPU. When the system boots I see a large swath of white text appear for a few seconds, which it either moves past or sometimes hangs on, and the shutdown screen hangs with some slow-moving white text.

I also run Arcolinux and the same boot/shutdown problems occur after the recent Arch linux-lts update from 6.1.71-1 to 6.6.11-1. The current 6.6.10arch1-1 kernel also gives the same problem. To keep both systems operable I am holding them at LTS 6.1.71-1.

EDIT: These are images showing the shutdown screens for both Manjaro and Arcolinux after kernel updates.

I just installed version 6.8rc1 that is available in Manjaro Settings (I imagine it will be a build from the linux-next branch since version 6.8rc1 has not been published yet), and the same problem continues to occur when restarting the computer.

But it is very strange that it does not occur in the XanMod or Liquorix kernel, nor in the Fedora, KDE Neon or Kubuntu kernels. If it were a base error in the kernel, it should happen in all of them.

And if it were something related to Plymouth, the other Linux distros that I have tried may not use it or have it active, but I understand that it is active when I use the XanMod or Liquorix kernel in Manjaro and with these kernels it does not happen… I don’t know, it’s very strange.

Based on the previous message from chairman67, on the tests I did compiling the kernels from the AUR repository, and on the tests I did in the other distros, could it be that only the kernels of the Arch Linux distro (and therefore its derivatives) present this problem?

Alden20, yes, I believe this issue comes from the Arch Linux kernels, given that I first encountered it when the Arch LTS kernel updated to 6.6.x in Arco Linux.

Does doing a fresh install of Manjaro resolve this issue for you, with Manjaro’s linux66 kernel?

I’m thinking about moving away from Arch based distros because of this. Switching Manjaro KDE to Fedora KDE… and Arco Linux Cinnamon to Mint. It’s a bit unfortunate but I need stability in my case.

Why don’t you diff the working kernel’s $(zcat /proc/config.gz) against a non-working one’s and see if something stands out?

Chairman67, but the solution is easy: you can do what I have done and install the XanMod kernel from the AUR repository or Liquorix from its own repository, since this problem does not occur in either kernel.

For Liquorix it is super simple, you just have to run this line in the terminal:

curl -s 'https://liquorix.net/install-liquorix.sh' | sudo bash

It installs itself; then you just have to select it in the Grub menu when you boot the system if it is not the one that loads by default (in that case, you can make it the default with an application like grub-editor from the AUR repository, which lets you change Grub boot parameters through a graphical interface).

But a question about that file. Does this change depending on the kernel I have selected to boot?

That is, I understand that I will have to boot one kernel and capture the content of that file, then boot the other, capture it again, and then compare the two.
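Yes: /proc/config.gz is generated by the running kernel (and is only present if that kernel was built with CONFIG_IKCONFIG_PROC), so its content depends on the kernel you booted. A rough sketch of the capture-and-compare procedure follows; the function names and the output file naming are just placeholders, and gzip -dc is used as an equivalent of zcat.

```shell
# Hypothetical helper: save the running kernel's build configuration to
# <outdir>/config-<kernel release> and print the path that was written.
# Usage: save_kernel_config [outdir] [config.gz path]
save_kernel_config() {
    outdir=${1:-.}
    src=${2:-/proc/config.gz}   # second argument only exists for testing
    out="$outdir/config-$(uname -r)"
    gzip -dc "$src" > "$out" && printf '%s\n' "$out"
}

# Hypothetical helper: print only the lines that differ between two saved
# configs ('<' lines come from the first file, '>' from the second).
diff_kernel_configs() {
    diff "$1" "$2" | grep '^[<>]'
}
```

Boot the working kernel and run save_kernel_config ~/kconfigs (after creating that directory), reboot into the failing kernel and run it again, then call diff_kernel_configs on the two saved files. The output will probably be long, since differently built kernels differ in many options; anything related to power management, ACPI or amdgpu is likely the most interesting place to start.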