Cannot boot with RX 5500XT

My computer freezes with a black screen during boot. I found a potential fix in this post (https://forum.manjaro.org/t/booting-with-amd-gpu-rx5500-xt-not-working/58441), but setting amdgpu.dpm=0 causes noticeably worse performance. This problem started a few weeks ago and has gradually gotten worse. At first, the computer would boot successfully after I restarted it a couple of times. When that stopped working, I booted with amdgpu.dpm=0, waited a while, and then rebooted without that setting and everything worked. Now that trick has stopped working too. I checked journalctl and found this error in the log for a failed boot:

Nov 12 20:04:52 austin-manjaro kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to enable requested dpm features!
Nov 12 20:04:52 austin-manjaro kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to setup smc hw!
Nov 12 20:04:52 austin-manjaro kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
Nov 12 20:04:52 austin-manjaro kernel: amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_ip_init failed
Nov 12 20:04:52 austin-manjaro kernel: amdgpu 0000:0b:00.0: amdgpu: Fatal error during GPU init
Nov 12 20:04:52 austin-manjaro kernel: amdgpu 0000:0b:00.0: amdgpu: amdgpu: finishing device.
Nov 12 20:04:52 austin-manjaro kernel: [drm] free PSP TMR buffer
Nov 12 20:04:52 austin-manjaro kernel: amdgpu: probe of 0000:0b:00.0 failed with error -62
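For anyone who wants to pull the same lines out of their own log, here is a minimal sketch. The helper name amdgpu_failures is made up for illustration, and the journalctl call in the comment assumes a persistent journal that still holds the previous boot.

```shell
# Hypothetical helper: reads a kernel log on stdin and keeps only
# the amdgpu lines that mention a failure or an error.
amdgpu_failures() {
    grep 'amdgpu' | grep -Ei 'fail|error'
}

# Typical use on a systemd machine (kernel messages of the previous boot):
# journalctl -k -b -1 --no-pager | amdgpu_failures
```

The filter is deliberately loose, so it may also catch harmless amdgpu warnings; it is only meant to narrow the log down before reading it by hand.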

It’s very similar to the error in the post I linked, but that solution causes performance issues and breaks the GPU temperature sensor. I am currently using kernel version 5.10 and I have tried using version 5.14, but I could not boot at all with 5.14, even if I set amdgpu.dpm=0. Does anyone know of a way to fix this without setting amdgpu.dpm=0 or a way to fix the performance issues I’m seeing when I use that option?

Hardware Info:
CPU: R5 5600X
GPU: RX 5500XT
Motherboard: MSI B550-A Pro
RAM: 16GB DDR4-3600
Monitor: 1080p 75FPS

PS: I bought this graphics card on ebay and I have no idea what the previous owner may have done to it.

Can you provide your kernel version?

I have an MSI RX5500 8GB on an ASUS B550 board with a Ryzen 7 3700X → working flawlessly.
Kernels: 5.10 / 5.14 / 5.15
Boot parameters:
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_enforce_resources=lax libata.force=noncq apparmor=1 security=apparmor udev.log_priority=3"

From the Update Announcements:

For AMD GPU users having a black screen with kernel 5.10

Due to a bug in the AMD drivers, please try the following first:

For GRUB:

    Open a terminal or a TTY
    Open /etc/default/grub in your favourite CLI editor (nano, vi, emacs)
    Find the line: GRUB_CMDLINE_LINUX_DEFAULT="
    Add amdgpu.dc=0
    Save
    Execute sudo update-grub and reboot
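The GRUB steps above can also be scripted. This is only a sketch: add_kernel_param is a hypothetical helper, it assumes the value of GRUB_CMDLINE_LINUX_DEFAULT sits on one line in double quotes, and it assumes the parameter contains no | or & characters. Trying it on a copy of the file first is a good idea.

```shell
# add_kernel_param FILE PARAM   (hypothetical helper, not part of GRUB)
# Appends PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT in FILE,
# unless the parameter is already on that line.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already present.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote,
    # writing through a temporary file so the original keeps its permissions.
    tmp=$(mktemp) || return 1
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, run as root:
# add_kernel_param /etc/default/grub amdgpu.dc=0 && update-grub
```

After the reboot, /proc/cmdline shows the parameters the kernel was actually started with, which is a quick way to confirm the change took effect.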

For systemd-boot:

    Open a terminal or a TTY
    Open /boot/loader/entries/manjarolinux5.10.conf in your favourite CLI editor (nano, vi, emacs)
    Add amdgpu.dc=0 to the end of the line options
    Save & reboot

For rEFInd:

    Open a terminal or a TTY
    Open /boot/refind_linux.conf in your favourite CLI editor (nano, vi, emacs)
    Find the line: "Boot using default options" "root=
    Add amdgpu.dc=0
    Save & reboot

GaVenga, I tried using your parameters with kernel 5.10 and 5.14, but they didn’t change anything.

jrichard326, I tried using amdgpu.dc=0, but it did not fix the issue. The computer still fails to boot and there is some corruption on the screen. If I use both amdgpu.dc=0 and amdgpu.dpm=0, the computer still will not boot and there is much more corruption on the screen.

I also tried booting Linux Mint 20.2 off of a flash drive, and that worked fine. I was worried that this was a hardware issue, but it looks like that is not the case.

The kernel module “amdgpu” is failing. See this guide to make sure the module is loading properly: “GDM does not start until I switch tty”, post #4 by DeLinuxCo.

I added amdgpu to /etc/mkinitcpio.conf and ran mkinitcpio -P, but that didn’t help. I also tried Ctrl+Alt+F2, but that didn’t do anything. However, I can connect via SSH while the computer is stuck at the black screen. I ran dmesg over SSH and it didn’t tell me anything new; I just found the same error message that was in journalctl.
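To double-check that the module really is listed for early loading, something like the following could be run over SSH. It is a sketch only: has_early_module is a made-up helper, and it assumes the MODULES=(...) array is written on a single uncommented line.

```shell
# has_early_module FILE MODULE   (hypothetical helper)
# Succeeds if MODULE appears in the uncommented MODULES=(...) line of FILE.
has_early_module() {
    conf=$1
    module=$2
    # Extract the text between the parentheses, split it into one
    # word per line, and look for an exact match.
    sed -n 's/^MODULES=(\(.*\)).*/\1/p' "$conf" | tr -s ' ' '\n' | grep -qx -- "$module"
}

# Example:
# has_early_module /etc/mkinitcpio.conf amdgpu && echo "amdgpu is listed"
```

This only checks the configuration file. The initramfs still has to be regenerated with mkinitcpio -P for the setting to apply.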

It gets added to grub as the announcement indicates, not mkinitcpio.conf

Right, I did add amdgpu.dc=0 to grub. I was talking about DeLinuxCo’s suggestion, which says to put MODULES=(amdgpu) in /etc/mkinitcpio.conf. I’ve done both of these and tried GaVenga’s grub parameters too, but nothing besides amdgpu.dpm=0 has worked.

You weren’t supposed to try random parameters that mean nothing to amdgpu (GaVenga’s options).
Try another cable. DP instead of HDMI or vice versa.
Another wild guess: try another kernel and try amdgpu-pro driver.

https://bbs.archlinux.org/viewtopic.php?id=262013

Seems openminded is right: Try another cable?
(DP works for me (best to try) / HDMI too)
(Try: Adding “iommu=soft” as a kernel parameter)

I added the “iommu=soft” kernel parameter, swapped my HDMI cable for a DP cable, and installed the amdgpu-pro drivers, but still no luck. I did a little investigation with my Linux Mint USB drive to see what made it work when Manjaro didn’t, and I discovered that it wasn’t even using the amdgpu driver, it was using the generic vesa driver instead. I checked journalctl and it looks like Linux Mint encountered the exact same amdgpu errors as Manjaro, then switched to vesa as a fallback. Since this happens with two completely different distros and has gradually worsened for no clear reason, I’m back to suspecting a hardware failure. Guess that’s what I get for buying a graphics card on ebay. At least I have an old Radeon 260X I can use until I can get a new graphics card.
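For anyone repeating the live-USB comparison, a rough way to see which kernel driver is bound to the GPU is to parse lspci -k. The sketch below assumes the usual lspci -k layout (an unindented device line followed by indented detail lines); gpu_kernel_driver is a hypothetical helper name.

```shell
# Hypothetical helper: reads `lspci -k` output on stdin and prints the
# kernel driver in use for each graphics device, one per line.
gpu_kernel_driver() {
    awk '
        # A graphics device header starts a block we care about.
        /VGA compatible controller|Display controller|3D controller/ { in_gpu = 1; next }
        # Any other unindented line starts a different device.
        /^[^[:space:]]/ { in_gpu = 0 }
        # Inside a graphics block, the last field is the driver name.
        in_gpu && /Kernel driver in use:/ { print $NF }
    '
}

# Example:
# lspci -k | gpu_kernel_driver
```

If nothing is printed, no kernel driver is bound to the card, which would be consistent with amdgpu failing to probe and the desktop falling back to a generic driver.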

Since you referred to my post: the issue still persists for me, as you also described, when using any kernel 5.11 or above. I did not find any solution other than staying on 5.10 with amdgpu.dpm=0. At this point I very much assume a kernel regression or hardware failure.

This still kind of annoys me, because I mainly bought this GPU (second hand) to properly use wayland/sway. So I guess I’m living with it until GPU prices come down again. Since 5.10 is maintained until 2026, there are at least 4 more years for this to happen :sweat_smile:

My “MSI RX5500 8GB” works without any kernel parameter
using linux510 / linux515 / linux516 with the free drivers.