After a recent reboot I suddenly can't get to my desktop. After some digging, it seems the amdgpu driver fails to load. I ran a package update just before (sadly I didn't look closely at what was being updated), so that is probably related.
Steps I have taken so far:
Checked that the firmware files it's trying to load are actually on disk in /usr/lib/firmware/amdgpu/
Rebuilt the initramfs (manually at first, but it also happens automatically on kernel / firmware reinstalls)
Reinstalled linux-firmware
Downgraded linux-firmware to an earlier version
Downgraded kernel (6.9 → 6.8 → 6.6)
Still no luck loading the firmware. Any suggestions on how to troubleshoot this?
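One way to narrow this down is to pull the exact firmware file names out of the failed boot's kernel log and check each one against the firmware directory. This is a minimal sketch, assuming the log contains the kernel's usual `Direct firmware load for <file> failed with error -2` lines and that firmware may be stored plain or with a `.xz`/`.zst` suffix. The function name `check_fw_requests` is hypothetical.

```shell
# Hypothetical helper: reads a kernel log on stdin and, for every firmware
# file that failed with error -2 (ENOENT), reports whether it exists under
# the given firmware directory (plain, .xz or .zst).
check_fw_requests() {
    local fwdir="${1:-/usr/lib/firmware}" name
    # Keep only the failure messages, then take the 5th word: the requested
    # path relative to the firmware directory, e.g. "amdgpu/<name>.bin".
    grep -o 'Direct firmware load for [^ ]* failed with error -2' |
        awk '{print $5}' | sort -u |
        while read -r name; do
            if [ -e "$fwdir/$name" ] || [ -e "$fwdir/$name.xz" ] || [ -e "$fwdir/$name.zst" ]; then
                echo "present: $name"
            else
                echo "MISSING: $name"
            fi
        done
}
```

Usage, assuming a persistent journal so the previous (failed) boot is still available: `journalctl -k -b -1 | check_fw_requests /usr/lib/firmware`. If every file shows up as present on disk but the kernel still returned -2, the request most likely happened while amdgpu was loading from the initramfs, before the root filesystem was mounted, so the copy inside the image is the one to check.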
Given how suddenly the issue appeared, I was a bit worried my GPU had somehow failed. So I downloaded the latest Manjaro install ISO, booted it from USB, and installed the vulkan-radeon package. Tools like vulkaninfo and vkcube work fine, so the GPU itself seems to be OK.
I could reinstall, of course, but I'd really prefer to get to the bottom of the issue, as I've spent too much time installing and configuring everything the way I want it.
Sorry, I'm not sure what that means. BIOS as in the motherboard BIOS, or the GPU BIOS? Is there a way to update the GPU BIOS from the command line, or what do I need to do?
The BIOS is firmware for your motherboard.
It affects a lot of things, and you should update it.
The available releases include Ryzen updates, security fixes, performance improvements, graphics card support, and more.
You will have to consult the manufacturer's documentation for how to update the BIOS.
From your system, I am still interested in the mhwd output:
I know there are newer versions, but I also know a lot of people have had stability issues with them. My GPU has worked fine for over a year on the current BIOS, so it's odd that it would suddenly stop loading the firmware. But yes, at the moment it's the only thing I have left to try.
Installed PCI configs:
--------------------------------------------------------------------------------
NAME VERSION FREEDRIVER TYPE
--------------------------------------------------------------------------------
video-linux 2024.05.06 true PCI
Warning: No installed USB configs!
local/amd-ucode 20240510.b9d2bf23-1
Microcode update image for AMD CPUs
local/lact 0.5.4-2
AMDGPU Controller application
local/lib32-vulkan-radeon 1:24.0.6-1
Open-source Vulkan driver for AMD GPUs - 32-bit
local/libteam 1.32-1
Library for controlling team network device
local/mhwd-amdgpu 19.1.0-1
MHWD module-ids for amdgpu
local/vulkan-radeon 1:24.0.6-1
Open-source Vulkan driver for AMD GPUs
local/xf86-video-amdgpu 23.0.0-2 (xorg-drivers)
X.org amdgpu video driver
The system isn't fully up to date because I downgraded the linux-firmware package after the boot failure (as mentioned in the original post). It stopped working a few days ago, right around the 0510 release, so I guessed that was the cause. Downgrading didn't help, though, so I might as well install the latest version again.
Deleted the package cache and installed the latest linux-firmware again. Confirmed the files it is trying to load are available in /lib/firmware/amdgpu/
Ran a full system update
Still no dice, same message about failing to load the firmware.
If I boot from the installer USB, the drivers load fine, so it still looks like a configuration or package issue. Here is the log from the USB boot, in case it helps:
May 19 20:01:21 manjaro kernel: [drm] amdgpu kernel modesetting enabled.
May 19 20:01:21 manjaro kernel: amdgpu: Virtual CRAT table created for CPU
May 19 20:01:21 manjaro kernel: amdgpu: Topology: Add CPU node
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
May 19 20:01:21 manjaro kernel: amdgpu: ATOM BIOS: 113-D7020100-102
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: vgaarb: deactivate vga console
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: VRAM: 24560M 0x0000008000000000 - 0x00000085FEFFFFFF (24560M used)
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
May 19 20:01:21 manjaro kernel: [drm] amdgpu: 24560M of VRAM memory ready
May 19 20:01:21 manjaro kernel: [drm] amdgpu: 15905M of GTT memory ready.
May 19 20:01:21 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x85fc000000 for PSP TMR
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw version = 0x004e7900 (78.121.0)
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
May 19 20:01:22 manjaro kernel: snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
May 19 20:01:22 manjaro kernel: amdgpu: HMM registered 24560MB device memory
May 19 20:01:22 manjaro kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
May 19 20:01:22 manjaro kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
May 19 20:01:22 manjaro kernel: amdgpu: Virtual CRAT table created for GPU
May 19 20:01:22 manjaro kernel: amdgpu: Topology: Add dGPU node [0x744c:0x1002]
May 19 20:01:22 manjaro kernel: kfd kfd: amdgpu: added device 1002:744c
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: SE 6, SH per SE 2, CU per SH 8, active_cu_number 96
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
May 19 20:01:22 manjaro kernel: amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
May 19 20:01:22 manjaro kernel: [drm] Initialized amdgpu 3.57.0 20150101 for 0000:03:00.0 on minor 1
May 19 20:01:22 manjaro kernel: fbcon: amdgpudrmfb (fb0) is primary device
May 19 20:01:23 manjaro kernel: amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run. Advanced users may wish to specify all system modules
# in this array. For instance:
# MODULES=(piix ide_disk reiserfs)
MODULES=()
# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image. This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()
# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way. This is useful for config files.
FILES=()
# HOOKS
# This is the most important setting in this file. The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added. Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
## This setup specifies all modules in the MODULES setting above.
## No raid, lvm2, or encrypted root is needed.
# HOOKS=(base)
#
## This setup will autodetect all modules for your system and should
## work as a sane default
# HOOKS=(base udev autodetect block filesystems)
#
## This setup will generate a 'full' image which supports most systems.
## No autodetection is done.
# HOOKS=(base udev block filesystems)
#
## This setup assembles a pata mdadm array with an encrypted root FS.
## Note: See 'mkinitcpio -H mdadm' for more information on raid devices.
# HOOKS=(base udev block mdadm encrypt filesystems)
#
## This setup loads an lvm2 volume group on a usb device.
# HOOKS=(base udev block lvm2 filesystems)
#
## NOTE: If you have /usr on a separate partition, you MUST include the
# usr, fsck and shutdown hooks.
HOOKS=(base udev autodetect modconf kms block keyboard keymap consolefont filesystems fsck)
# COMPRESSION
# Use this to compress the initramfs image. By default, gzip compression
# is used. Use 'cat' to create an uncompressed image.
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"
#COMPRESSION="zstd"
# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=()
# MODULES_DECOMPRESS
# Decompress kernel modules during initramfs creation.
# Enable to speedup boot process, disable to save RAM
# during early userspace. Switch (yes/no).
#MODULES_DECOMPRESS="yes"
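With the kms hook in HOOKS, mkinitcpio adds the GPU modules, and the firmware those modules declare, to the initramfs, so amdgpu normally loads its firmware from the image rather than from /usr/lib/firmware. It is therefore worth checking what actually ended up in the image. A minimal sketch, assuming `lsinitcpio` (shipped with mkinitcpio) lists one path per line and that amdgpu firmware files end in `.bin`, optionally compressed; the function name `flag_odd_amdgpu_fw` is hypothetical.

```shell
# Hypothetical filter: reads an initramfs file list on stdin (one path per
# line, as printed by lsinitcpio) and prints amdgpu firmware entries whose
# names do not end in .bin, .bin.xz or .bin.zst (e.g. leftover *.old files).
flag_odd_amdgpu_fw() {
    local path
    while IFS= read -r path; do
        case "$path" in
            */) ;;  # directory entries
            */firmware/amdgpu/*.bin|*/firmware/amdgpu/*.bin.xz|*/firmware/amdgpu/*.bin.zst) ;;
            */firmware/amdgpu/*) echo "unexpected: $path" ;;
        esac
    done
}
```

Usage, with the image name adjusted to the installed kernel (Manjaro images are typically named like `/boot/initramfs-6.9-x86_64.img`): `lsinitcpio /boot/initramfs-6.9-x86_64.img | grep firmware/amdgpu` shows everything that was packed, and piping the same listing into `flag_odd_amdgpu_fw` prints only the suspicious entries.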
I have tried with and without MODULES=(amdgpu), but that doesn’t seem to make any difference.
Maybe something at the user level renamed it with a *.old suffix? The driver will, of course, look for the file without the *.old suffix and therefore report error -2:
$ LANG=C errno 2
ENOENT 2 No such file or directory
Perhaps rename it back? Who knows what will happen.
Where is mkinitcpio finding those .old files? Get rid of them or move them somewhere mkinitcpio won’t see them and then re-run sudo mkinitcpio -P.
My guess is that it has something to do with the way mkinitcpio matches firmware filenames and ignores extensions, but I'm not going to read through the source code to confirm that.
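Following the suggestion above, a minimal sketch for moving the stray *.old files out of the firmware tree before rebuilding. It assumes the files live under /usr/lib/firmware and uses a hypothetical backup location, `/root/firmware-old-backup`, that mkinitcpio never scans; the function name `move_old_fw` is also hypothetical.

```shell
# Hypothetical helper: moves every *.old file under a firmware tree into a
# backup directory, keeping the relative path, so mkinitcpio can no longer
# pick it up. Run as root, then rebuild all presets with: mkinitcpio -P
move_old_fw() {
    local fwdir="${1:-/usr/lib/firmware}" backup="${2:-/root/firmware-old-backup}" f rel
    # -print0 / read -d '' keeps odd file names intact.
    find "$fwdir" -type f -name '*.old' -print0 |
        while IFS= read -r -d '' f; do
            rel="${f#"$fwdir"/}"                 # e.g. amdgpu/<name>.bin.old
            mkdir -p "$backup/$(dirname "$rel")"
            mv -v -- "$f" "$backup/$rel"
        done
}
```

Before moving anything, `pacman -Qo <file>` shows whether a package owns a given file; a *.old file that no package owns is a reasonable candidate for something left behind outside the package manager. Moving rather than deleting keeps the step reversible.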