Random shutdown hang on Manjaro Architect

Hi community!
I’m writing this post in the hope that it helps someone: it took me months to solve this issue, because it occurred randomly, only at shutdown, and more than once it seemed solved when it wasn’t.

I installed Manjaro Architect as a very basic installation, since I use it only to run a macOS virtual machine with QEMU.

The issue was that Manjaro hung on shutdown (shutdown -h now).
It happened only randomly; the reboot command always worked.
Nothing unusual was reported in the logs!
Sometimes, on shutdown -h now, the system hung at:
reboot: Power down

or at a black screen with an underscore cursor (with quiet in the GRUB config).

The only way out was a long press on the physical power button.

This is my current system (dual boot with Windows 10):

Summary

System:
  Kernel: 5.4.80-2-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0
  Console: tty 1 Distro: Manjaro Linux
Machine:
  Type: Server Mobo: ASUSTeK model: Z9PE-D8 WS v: 1.0x serial:
  UEFI: American Megatrends v: 5802 date: 06/10/2015
CPU:
  Info: 2x 8-Core model: 06/2d bits: 64 type: MCP SMP arch: Sandy Bridge
  rev: 5 L2 cache: 40.0 MiB
  flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 99763
  Speed: 1204 MHz min/max: 1200/3800 MHz Core speeds (MHz): 1: 1207 2: 1205
  3: 1204 4: 1204 5: 1204 6: 1204 7: 1204 8: 1204 9: 1204 10: 1204 11: 1204
  12: 1260 13: 1214 14: 1204 15: 1204 16: 1205
Graphics:
  Device-1: NVIDIA GF108GL [Quadro 600] driver: nouveau v: kernel
  bus ID: 03:00.0
  Device-2: NVIDIA GK110B [GeForce GTX TITAN Black] driver: vfio-pci v: 0.2
  bus ID: 83:00.0
  Display: server: No display server data found. Headless machine?
  tty: 80x24
  Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
  Device-1: Intel C600/X79 series High Definition Audio vendor: ASUSTeK
  driver: vfio-pci v: 0.2 bus ID: 00:1b.0
  Device-2: NVIDIA GF108 High Definition Audio driver: snd_hda_intel
  v: kernel bus ID: 03:00.1
  Device-3: NVIDIA GK110 High Definition Audio driver: vfio-pci v: 0.2
  bus ID: 83:00.1
  Sound Server: ALSA v: k5.4.80-2-MANJARO
Network:
  Device-1: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
  v: 3.2.6-k port: 8000 bus ID: 06:00.0
  IF: enp6s0 state: down mac:
  Device-2: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
  v: 3.2.6-k port: 7000 bus ID: 07:00.0
  IF: enp7s0 state: up speed: 100 Mbps duplex: full mac:
  IF-ID-1: br0 state: up speed: N/A duplex: N/A mac:
  IF-ID-2: br1 state: up speed: N/A duplex: N/A mac:
  IF-ID-3: vnet0 state: unknown speed: 10 Mbps duplex: full mac:
  IF-ID-4: vnet1 state: unknown speed: 10 Mbps duplex: full mac:
Drives:
  Local Storage: total: 8.19 TiB used: 3.36 TiB (41.0%)
  ID-1: /dev/sda vendor: SanDisk model: SDSSDP256G size: 238.47 GiB
  ID-2: /dev/sdb vendor: Crucial model: CT500MX500SSD1 size: 465.76 GiB
  ID-3: /dev/sdc vendor: Western Digital model: WD20EZRX-00DC0B0
  size: 1.82 TiB
  ID-4: /dev/sdd vendor: Western Digital model: WD60EZRX-00MVLB1
  size: 5.46 TiB
  ID-5: /dev/sde vendor: Hitachi model: HTS542525K9SA00 size: 232.89 GiB
Partition:
  ID-1: / size: 227.74 GiB used: 3.50 GiB (1.5%) fs: ext4 dev: /dev/sde2
Swap:
  Alert: No Swap data was found.
Sensors:
  System Temperatures: cpu: 29.0 C mobo: N/A gpu: nouveau temp: 48.0 C
  Fan Speeds (RPM): N/A gpu: nouveau fan: 2820
Info:
  Processes: 292 Uptime: 7m Memory: 62.92 GiB used: 32.76 GiB (52.1%)
  Init: systemd Compilers: gcc: N/A Packages: 422 Shell: Bash v: 5.0.18
  inxi: 3.1.08

These were all my attempts to solve the issue:

  1. installed another GPU for the host with the free drivers, with the 2nd GPU bound to vfio (at first I had only the GTX Titan Black, bound to nouveau for the host, then to vfio when passed through, and back to nouveau on VM shutdown)
  2. installed another GPU for the host with the nvidia drivers (390xx), with the 2nd GPU bound to vfio (nouveau blacklisted)
  3. blacklisted ipmi (not recognized)
  4. blacklisted the webcam (error related to gspca_vc032x)
  5. blacklisted wifi (rt2x00 and rt2800)
  6. added the shutdown hook
  7. checked that EHCI is off in the BIOS
  8. deleted quiet in the GRUB config
  9. added acpi=off in the GRUB config
  10. added reboot=bios in the GRUB config
  11. added reboot=pci in the GRUB config
  12. added intel_idle.max_cstate=1 in the GRUB config
  13. added acpi_osi=! acpi_osi='Windows 2018' in the GRUB config
  14. added acpi_osi=! acpi_osi='Windows 2009' in the GRUB config
  15. stopped, disabled and masked lvm2-lvmetad and lvm2-monitor
  16. added mei_me to the RUNTIME_PM_DRIVER_BLACKLIST setting of TLP
  17. set the network interfaces down before shutdown (2x ethernet)
  18. stopped smb/nmb before shutdown
  19. tried different kernels (from 5.4 LTS to 5.9)
  20. reinstalled Manjaro in UEFI mode (it was installed in legacy mode, and I had to disable CSM in the BIOS to get Manjaro to install as UEFI)
  21. disabled secure boot
  22. disabled XHCI in the BIOS (EHCI was already disabled)
  23. disabled the ASMedia USB 3.0 controller in the BIOS

None of these solved the issue.

This worked for me:

adding nouveau.vram_pushbuf=1 apm=power_off acpi=force to the kernel command line in the GRUB config

If you have the same issue, it could be worth a try.
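
For anyone unsure where kernel parameters like these go: on a GRUB system they are normally appended to the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The following is only a minimal sketch. The helper name add_kernel_params is made up for illustration, and it assumes GNU sed and a double-quoted value on a single line.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append kernel parameters
# to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# Usage: add_kernel_params FILE "param1 param2 ..."
add_kernel_params() {
    file=$1
    params=$2
    # Insert the new parameters just before the closing double quote.
    # Assumes the parameters contain no '|', '&' or backslash characters.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${params}\"|" "$file"
}

# Example (as root), left commented out so nothing is changed by accident:
# add_kernel_params /etc/default/grub "nouveau.vram_pushbuf=1 apm=power_off acpi=force"
# grub-mkconfig -o /boot/grub/grub.cfg
```

After regenerating the GRUB configuration and rebooting, cat /proc/cmdline shows whether the parameters were actually passed to the kernel.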

I was wrong… again… Very frustrating: it seems to work, and then… it hangs again :frowning:
I hope I have found the culprit now: the system has been shutting down properly for 2 days.
It seems related to systemd-coredump.
Disabling it seems to solve the issue. I’m not going to say “disabling it solves the issue” yet; I’m going to wait a few weeks before saying that :smiley:
I opened an issue on GitHub, let’s see how it goes…
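
The post does not say how systemd-coredump was disabled. One documented way, shown here only as a sketch, is a drop-in file that sets Storage=none and ProcessSizeMax=0, the two settings the systemd-coredump man page gives for turning off core dump processing. The function name and the file name disable.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write a coredump.conf
# drop-in that turns off core dump storage and processing.
# The directory argument is optional so the function can be tried on a
# scratch directory first.
disable_coredump_storage() {
    conf_dir=${1:-/etc/systemd/coredump.conf.d}
    mkdir -p "$conf_dir"
    # Storage=none: do not keep core dumps.
    # ProcessSizeMax=0: do not process them either.
    printf '[Coredump]\nStorage=none\nProcessSizeMax=0\n' > "$conf_dir/disable.conf"
}

# Example (as root), left commented out:
# disable_coredump_storage
```

Whether this matches what the author actually did is an assumption; it only illustrates one way to take systemd-coredump out of the picture while testing.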

Update: this workaround seems to have fixed my random hangs at shutdown: I have now done about 20 successful shutdowns in a row, which had never happened before.

Before finding the workaround I also tried without success:

  1. Added Before=basic.target in dbus.service
  2. WOL disabled in bios
  3. intel_pstate=disable (use acpi_cpufreq)
  4. intel_pstate=disable + blacklist acpi_cpufreq (no cpu drivers)
  5. acpi=force reboot=acpi
  6. reboot=acpi
  7. apm=power-off
  8. noapic nolapic (can’t boot)
  9. video-vesa drivers (+blacklist nouveau)
  10. acpi_osi='!Windows2012'
  11. watchdog disabled and spectre/meltdown security patches disabled
  12. swapped the quadro 600 gpu with a GeForce 8400 GS
  13. swapped the mainboard with another ASUS Z9PE-D8 WS with retail CPUs (2x QB7R E5-2687W qualification samples on the original one, 2x SR0KG E5-2687W on the second one)
  14. tried different distros, including Ubuntu 20.04 Server (Debian-based) and Artix (systemd-free)

My last attempt was to look at the ACPI tables: luckily I knew that they exist, that they can have bugs, that they can be fixed, and that they can be injected at boot.
I knew that the _PTS (Prepare To Sleep) method is called before the system enters a sleep state, S5 (soft off) included.
My first attempt was to dump my DSDT and fix just the errors (mostly wrong lengths, invalid objects and other minor things): compiled and injected, it didn’t help, and the system still hung randomly on shutdown.
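
The dump, compile and inject cycle mentioned above can be sketched roughly as follows. This is not necessarily the exact procedure the author used: it follows the initrd-based table override described in the kernel documentation, the function names and paths are made up for illustration, and it needs root, the iasl compiler from the ACPICA tools, and cpio.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post) for a DSDT override.

# Copy the firmware's DSDT out of sysfs and decompile it to dsdt.dsl.
dump_dsdt() {
    workdir=$1
    mkdir -p "$workdir"
    cat /sys/firmware/acpi/tables/DSDT > "$workdir/dsdt.dat"
    (cd "$workdir" && iasl -d dsdt.dat)
}

# After editing dsdt.dsl by hand, compile it; this is expected to
# produce dsdt.aml next to the source.
compile_dsdt() {
    workdir=$1
    (cd "$workdir" && iasl dsdt.dsl)
}

# Wrap the compiled table in an uncompressed cpio archive using the
# kernel/firmware/acpi/ layout that the kernel looks for at early boot.
build_acpi_override() {
    aml=$1
    out=$2
    stage=$(mktemp -d)
    mkdir -p "$stage/kernel/firmware/acpi"
    cp "$aml" "$stage/kernel/firmware/acpi/dsdt.aml"
    (cd "$stage" && find kernel | cpio -H newc -o) > "$out"
    rm -rf "$stage"
}

# Example (as root), left commented out:
# dump_dsdt /root/dsdt
# "$EDITOR" /root/dsdt/dsdt.dsl    # apply fixes; the kernel docs also suggest raising the OEM revision
# compile_dsdt /root/dsdt
# build_acpi_override /root/dsdt/dsdt.aml /boot/acpi_override
```

The archive has to be loaded before the regular initramfs. With GRUB, one way that should work is setting GRUB_EARLY_INITRD_LINUX_CUSTOM="acpi_override" in /etc/default/grub and regenerating the configuration; after a reboot, the kernel log (dmesg) should mention that the DSDT was overridden.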

Then, I modified the code of the _PTS method.
Original code:

Method (_PTS, 1, NotSerialized)  // _PTS: Prepare To Sleep
{
    If (((Arg0 == 0x04) && (OSFL () == 0x02)))
    {
        Sleep (0x0BB8)
    }

    PTS (Arg0)
    DBG8 = Arg0
    WAKP [Zero] = Zero
    WAKP [One] = Zero
    WSSB = ASSB /* \ASSB */
    WOTB = AOTB /* \AOTB */
    WAXB = AAXB /* \AAXB */
    ASSB = Arg0
    AOTB = OSFL ()
    AAXB = Zero
    \_SB.SLPS = One
}

Modified code:

Method (_PTS, 1, NotSerialized)
{
   If (LEqual (Arg0, 0x05)) {}
   Else
   {
       Store (Arg0, DBG8)
       If (LAnd (LEqual (Arg0, 0x04), LEqual (OSFL (), 0x02)))
       {
           Sleep (0x0BB8)
       }

       PTS (Arg0)
       Store (Zero, Index (WAKP, Zero))
       Store (Zero, Index (WAKP, One))
       Store (ASSB, WSSB)
       Store (AOTB, WOTB)
       Store (AAXB, WAXB)
       Store (Arg0, ASSB)
       Store (OSFL (), AOTB)
       Store (Zero, AAXB)
       Store (One, \_SB.SLPS)
   }
}

I know next to nothing about DSDT code; the “fixed” code was found on the internet, where it is used to fix sleep/restart on hackintosh machines with a different mainboard.
I noticed that most of the variables were identical to mine, and I noticed a new If branch testing Arg0 == 0x05, which corresponds to the S5 state: when Arg0 is 5 the method does nothing, and the original body runs only for the other sleep states.
Compiled and injected, no more issues!

I really don’t know why this could randomly cause hangs at shutdown, but it would be good if the kernel could be modified to work without injecting a custom DSDT. I know it’s a firmware bug, and I don’t know whether it can be fixed on the kernel side; maybe some guru can shed more light on this.
I have noticed a lot of recent discussions about hangs on shutdown with no solution; maybe a patched DSDT is the workaround to try in these situations.

I also opened an issue on the kernel bug tracker (the fixed DSDT source for the ASUS Z9PE-D8 WS can be found there):
https://bugzilla.kernel.org/show_bug.cgi?id=210689
