No display on screen attached to second GPU - missing navi24 support in linux-firmware?

For some days now I’ve had the same issue/symptoms with a different configuration. My Dell OptiPlex 3070 has a built-in Coffee Lake GPU, and for years I have also used a second GPU from AMD. My inxi data are:

System:
  Kernel: 6.17.1-0-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 15.2.1
    clocksource: tsc avail: acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.17-x86_64
    root=UUID=7bc67a3c-9b33-479c-b74e-5fb4ef1e532a rw quiet apparmor=1
    security=apparmor resume=UUID=7f827f88-5e1a-44c5-a730-fd3837cd104f
    udev.log_priority=3
  Desktop: Xfce v: 4.20.1 tk: Gtk v: 3.24.48 wm: xfwm4 v: 4.20.0
    with: xfce4-panel tools: light-locker vt: 7 dm: LightDM v: 1.32.0
    Distro: Manjaro base: Arch Linux
Machine:
  Type: Desktop System: Dell product: OptiPlex 3070 v: N/A
    serial: <superuser required> Chassis: type: 3 serial: <superuser required>
  Mobo: Dell model: 07WP95 v: A00 serial: <superuser required> part-nu: 0930
    uuid: <superuser required> UEFI: Dell v: 1.26.0 date: 02/29/2024
Battery:
  Message: No system battery data found. Is one present?
Memory:
  System RAM: total: 16 GiB available: 15.43 GiB used: 2.15 GiB (13.9%)
  Message: For most reliable report, use superuser + dmidecode.
  Array-1: capacity: 32 GiB slots: 2 modules: 2 EC: None
    max-module-size: 16 GiB note: est.
  Device-1: DIMM1 type: DDR4 detail: synchronous size: 8 GiB speed:
    spec: 2666 MT/s actual: 2400 MT/s volts: curr: 1 width (bits): data: 64
    total: 64 manufacturer: 04CD000080CE part-no: F4-2666C19-8GNT serial: N/A
  Device-2: DIMM2 type: DDR4 detail: synchronous size: 8 GiB speed:
    spec: 2666 MT/s actual: 2400 MT/s volts: curr: 1 width (bits): data: 64
    total: 64 manufacturer: 04CD000080CE part-no: F4-2666C19-8GNT serial: N/A
PCI Slots:
  Permissions: Unable to run dmidecode. Root privileges required.
CPU:
  Info: model: Intel Core i3-9100 bits: 64 type: MCP arch: Coffee Lake
    gen: core 9 level: v3 note: check built: 2018 process: Intel 14nm family: 6
    model-id: 0x9E (158) stepping: 0xB (11) microcode: 0xF6
  Topology: cpus: 1x dies: 1 clusters: 4 cores: 4 smt: <unsupported> cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 800 min/max: 800/4200 scaling: driver: intel_pstate
    governor: powersave cores: 1: 800 2: 800 3: 800 4: 800 bogomips: 28800
  Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat
    arch_capabilities arch_perfmon art avx avx2 bmi1 bmi2 bts clflush
    clflushopt cmov constant_tsc cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64
    dtherm dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu
    fsgsbase fxsr ht hwp hwp_act_window hwp_epp hwp_notify ibpb ibrs ida
    intel_pt invpcid lahf_lm lm mca mce md_clear mmx monitor movbe mpx msr
    mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge
    pln pni popcnt pse pse36 pti pts rdrand rdseed rdtscp rep_good sdbg sep
    smap smep ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2
    tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic
    xgetbv1 xsave xsavec xsaveopt xsaves xtopology xtpr
  Vulnerabilities:
  Type: gather_data_sampling mitigation: Microcode
  Type: ghostwrite status: Not affected
  Type: indirect_target_selection status: Not affected
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT
    disabled
  Type: mds mitigation: Clear CPU buffers; SMT disabled
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT disabled
  Type: old_microcode status: Not affected
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed mitigation: IBRS
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: IBRS; IBPB: conditional; STIBP: disabled;
    RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  Type: srbds mitigation: Microcode
  Type: tsa status: Not affected
  Type: tsx_async_abort status: Not affected
  Type: vmscape mitigation: IBPB before exit to userspace
Graphics:
  Device-1: Intel CoffeeLake-S GT2 [UHD Graphics 630] vendor: Dell
    driver: i915 v: kernel arch: Gen-9.5 process: Intel 14nm built: 2016-20
    ports: active: DP-1 empty: DP-2, HDMI-A-1, HDMI-A-2, HDMI-A-3
    bus-ID: 00:02.0 chip-ID: 8086:3e91 class-ID: 0300
  Device-2: Advanced Micro Devices [AMD/ATI] Navi 24 [Radeon RX 6400/6500
    XT/6500M] vendor: Sapphire driver: N/A alternate: amdgpu arch: RDNA-2
    code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
    speed: 16 GT/s lanes: 16 bus-ID: 03:00.0 chip-ID: 1002:743f class-ID: 0300
  Display: x11 server: X.Org v: 21.1.18 compositor: xfwm4 v: 4.20.0 driver:
    X: loaded: modesetting alternate: fbdev,vesa dri: iris gpu: i915
    display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 2560x1440 s-dpi: 96 s-size: 678x382mm (26.69x15.04")
    s-diag: 778mm (30.64")
  Monitor-1: DP-1 model: Dell U2520D serial: <filter> built: 2020 res:
    mode: 2560x1440 hz: 60 scale: 100% (1) dpi: 118 gamma: 1.2 chroma: red:
    x: 0.686 y: 0.310 green: x: 0.271 y: 0.663 blue: x: 0.149 y: 0.059 white:
    x: 0.314 y: 0.329 size: 553x311mm (21.77x12.24") diag: 634mm (25")
    ratio: 16:9 modes: 2560x1440, 2048x1280, 1920x1200, 2048x1080, 1920x1080,
    1920x1080i, 1600x1200, 1600x900, 1280x1024, 1152x864, 1280x720, 1024x768,
    800x600, 720x576, 720x480, 640x480, 720x400
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
    device: 1 drv: swrast gbm: drv: iris surfaceless: drv: iris x11: drv: iris
    inactive: wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 25.2.3-arch1.2
    glx-v: 1.4 direct-render: yes renderer: Mesa Intel UHD Graphics 630 (CFL
    GT2) device-ID: 8086:3e91 memory: 15.06 GiB unified: yes
  Info: Tools: api: eglinfo,glxinfo de: xfce4-display-settings
    x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Intel Cannon Lake PCH cAVS vendor: Dell driver: snd_hda_intel
    v: kernel alternate: snd_soc_avs,snd_sof_pci_intel_cnl bus-ID: 00:1f.3
    chip-ID: 8086:a348 class-ID: 0403
  Device-2: Advanced Micro Devices [AMD/ATI] Navi 21/23 HDMI/DP Audio
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 03:00.1 chip-ID: 1002:ab28 class-ID: 0403
  API: ALSA v: k6.17.1-0-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: PipeWire v: 1.4.8 status: active with: 1: pipewire-media-session
    status: active 2: pw-jack type: plugin tools: pw-cat,pw-cli
  Server-2: PulseAudio v: 17.0-43-g3e2bb status: active with:
    1: pulseaudio-alsa type: plugin 2: pulseaudio-jack type: module
    tools: pacat,pactl,pavucontrol
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    vendor: Dell driver: r8168 v: 8.055.00 modules: r8169 pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: 3000 bus-ID: 04:00.0 chip-ID: 10ec:8168
    class-ID: 0200
  IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: dynamic noprefixroute scope: global
  IP v6: <filter> type: dynamic noprefixroute scope: global
  IP v6: <filter> type: noprefixroute scope: link
  Info: services: NetworkManager,systemd-timesyncd
  WAN IP: <filter>
Bluetooth:
  Message: No bluetooth data found.
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 1.36 TiB used: 818.33 GiB (58.6%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 PRO 500GB
    size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 2B2QGXA7 temp: 29.9 C
    scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 EVO 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 3B6Q scheme: GPT
  Optical-1: /dev/sr0 vendor: PLDS model: DVD+-RW DU-8A5LH rev: 6D1M
    dev-links: cdrom
  Features: speed: 24 multisession: yes audio: yes dvd: yes
    rw: cd-r,cd-rw,dvd-r state: running
Partition:
  ID-1: / raw-size: 456.66 GiB size: 448.43 GiB (98.20%)
    used: 181.14 GiB (40.4%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
    label: N/A uuid: 7bc67a3c-9b33-479c-b74e-5fb4ef1e532a
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 39.9 MiB (13.3%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
    label: NO_LABEL uuid: D433-1BC5
  ID-3: /mnt/Bigdata raw-size: 931.51 GiB size: 915.82 GiB (98.32%)
    used: 637.16 GiB (69.6%) fs: ext4 dev: /dev/sda1 maj-min: 8:1
    label: Samsung860Evo uuid: 0ca23a1c-af43-4411-984f-6579e9bb8065
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: partition size: 8.8 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/nvme0n1p3 maj-min: 259:3 label: N/A
    uuid: 7f827f88-5e1a-44c5-a730-fd3837cd104f
Unmounted:
  Message: No unmounted partitions found.
USB:
  Hub-1: 1-0:1 info: hi-speed hub with single TT ports: 16 rev: 2.0
    speed: 480 Mb/s (57.2 MiB/s) lanes: 1 mode: 2.0 chip-ID: 1d6b:0002
    class-ID: 0900
  Hub-2: 1-1:2 info: Texas Instruments ports: 6 rev: 2.1
    speed: 480 Mb/s (57.2 MiB/s) lanes: 1 mode: 2.0 chip-ID: 0451:8442
    class-ID: 0900
  Device-1: 1-1.5:4 info: Texas Instruments type: HID
    driver: hid-generic,usbhid interfaces: 1 rev: 2.0
    speed: 480 Mb/s (57.2 MiB/s) lanes: 1 mode: 2.0 chip-ID: 0451:82ff
    class-ID: 0300
  Device-2: 1-1.6:5 info: Texas Instruments type: billboard driver: N/A
    interfaces: 1 rev: 2.0 speed: 480 Mb/s (57.2 MiB/s) lanes: 1 mode: 2.0
    chip-ID: 0451:82ee class-ID: 1100
  Device-3: 1-3:3 info: Dell KB216 Wired Keyboard type: keyboard,HID
    driver: hid-generic,usbhid interfaces: 2 rev: 1.1
    speed: 1.5 Mb/s (183 KiB/s) lanes: 1 mode: 1.0 power: 100mA
    chip-ID: 413c:2113 class-ID: 0300
  Device-4: 1-5:6 info: Dell Laser Mouse MS3220 type: mouse,HID
    driver: hid-generic,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s (1.4 MiB/s)
    lanes: 1 mode: 1.1 power: 100mA chip-ID: 413c:250e class-ID: 0300
  Hub-3: 2-0:1 info: super-speed hub ports: 8 rev: 3.1
    speed: 10 Gb/s (1.16 GiB/s) lanes: 1 mode: 3.2 gen-2x1 chip-ID: 1d6b:0003
    class-ID: 0900
  Hub-4: 2-3:2 info: Texas Instruments ports: 4 rev: 3.2
    speed: 5 Gb/s (596.0 MiB/s) lanes: 1 mode: 3.2 gen-1x1 chip-ID: 0451:8440
    class-ID: 0900
Sensors:
  System Temperatures: cpu: 42.0 C pch: 42.0 C mobo: N/A
  Fan Speeds (rpm): N/A
Repos:
  Packages: 2344 pm: pacman pkgs: 2330 libs: 542 tools: pamac pm: flatpak
    pkgs: 0 pm: snap pkgs: 14
  Active pacman repo servers in: /etc/pacman.d/mirrorlist
    1: https://mirror.alpix.eu/manjaro/stable/$repo/$arch
Processes:
  CPU top: 5 of 244
  1: cpu: 9.6% command: firefox pid: 2263 mem: 314.2 MiB (1.9%)
  2: cpu: 8.4% command: firefox pid: 1810 mem: 421.2 MiB (2.6%)
  3: cpu: 1.8% command: Xorg pid: 1216 mem: 87.6 MiB (0.5%)
  4: cpu: 0.7% command: xfce4-terminal pid: 2506 mem: 58.5 MiB (0.3%)
  5: cpu: 0.6% command: xfwm4 pid: 1713 mem: 65.3 MiB (0.4%)
  Memory top: 5 of 244
  1: mem: 421.2 MiB (2.6%) command: firefox pid: 1810 cpu: 8.4%
  2: mem: 314.2 MiB (1.9%) command: firefox pid: 2263 cpu: 9.6%
  3: mem: 139.4 MiB (0.8%) command: firefox pid: 2139 cpu: 0.2%
  4: mem: 117.8 MiB (0.7%) command: xfdesktop pid: 1747 cpu: 0.1%
  5: mem: 107.0 MiB (0.6%) command: firefox pid: 2189 cpu: 0.1%
Info:
  Processes: 244 Power: uptime: 7m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 6.15 GiB services: upowerd,xfce4-power-manager
    Init: systemd v: 257 default: graphical tool: systemctl
  Compilers: clang: 20.1.8 gcc: 15.2.1 Shell: Bash v: 5.3.3
    running-in: xfce4-terminal inxi: 3.3.39

In the past, my display ran successfully when connected to the DisplayPort of graphics device 2 (Advanced Micro Devices [AMD/ATI] Navi 24 [Radeon RX 6400/6500XT/6500M]), but now I only get a black screen. The GRUB menu and the UEFI setup are shown no matter which DisplayPort I use (device 1 or 2). After that, no TTY or greeter is shown on device 2. The DisplayPort of device 1 works as expected.

The LightDM settings have not been changed, and the GPU detection flag is set:

[LightDM]
logind-check-graphical=true

I’m not aware of any “special” settings I made in recent years, so everything is quite vanilla. I’m on the stable branch; the system was set up four or more years ago.

Any idea how to get my second GPU working again? Please forgive me if I ask simple questions, but after searching and reading for hours I’ve found no real solution. I worked through the Arch Wiki but found no solution there either. Any hint is welcome.

PS: Running kernel 6.16 does not change the behavior.

Update: I have now tried GDM as an alternative to LightDM. Unfortunately, this does not improve the situation. GDM also does not show up on the DisplayPort of the secondary GPU (device 2). So it seems to be something else and not the greeter…

Update 2: I used the ISO manjaro-xfce-25.0.10-251013-linux612.iso to boot from a USB stick. With it I can use both DisplayPorts (device 1 Coffee Lake, device 2 Navi 24) without any black screens. The installed kernel 6.12 with the stable branch updates as of today (15.10.2025) fails, while the ISO with the same kernel 6.12 works. So it must be something in the various configs or the drivers. Which configs should I check? Which logs can give insights?

I’ve moved your post to a dedicated topic as it had little in common with the topic you originally posted in. The DE is different (Xfce vs Cinnamon), and the GPU is different (Radeon vs Nvidia).

Please note that “hijacking” another member’s topic with your own issue is generally frowned upon: Forum Rules: Thread Hijacking.

Hopefully other members with expertise in Xfce and graphics will now be able to assist you in resolving your issue.

@scotty65 Thanks for moving the topic. Different forums, different rules. Sorry for hijacking the other topic; I thought it had enough in common. But I totally agree that it should be a thread of its own.

What I have also observed now is that, even with the built-in GPU, the Manjaro boot logo no longer appears between the GRUB boot menu and the greeter. When booting the ISO mentioned above, the Manjaro logo is shown after GRUB on both GPUs. I boot in UEFI mode without Secure Boot.

I still have no clue how to improve the situation. Is downgrading Mesa an option? I reinstalled Mesa, but also with no success.

Would a fresh installation help? But that would only reset settings, wouldn’t it? If so, I should be able to achieve the same with the current installation.

Any help is welcomed.

UPDATE:
Running journalctl -b | grep amdgpu reveals:

Oct 20 21:37:40 optiplex3070 kernel: Modules linked in: amdgpu(+) i915 amdxcp drm_ttm_helper drm_exec drm_panel_backlight_quirks nvme intel_gtt gpu_sched i2c_algo_bit nvme_core drm_suballoc_helper ttm drm_buddy nvme_keyring sr_mod cdrom nvme_auth spi_intel_pci drm_display_helper spi_intel video cec wmi
Oct 20 21:37:40 optiplex3070 kernel: RIP: 0010:amdgpu_irq_put+0xa8/0xc0 [amdgpu]
Oct 20 21:37:40 optiplex3070 kernel:  amdgpu_fence_driver_hw_fini+0xf9/0x130 [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel:  amdgpu_device_fini_hw+0xb7/0x2e8 [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel:  amdgpu_driver_load_kms.cold+0x19/0x2f [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel:  amdgpu_pci_probe+0x1e6/0x4d0 [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel:  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel:  ? amdgpu_init+0x42/0xff0 [amdgpu ec6195c0324c74888513b04a241d8ebe901d12d2]
Oct 20 21:37:40 optiplex3070 kernel: WARNING: CPU: 0 PID: 153 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xa8/0xc0 [amdgpu]
According to a web search, this may hint at:
  • Problems loading necessary firmware blobs for the GPU (especially with recent cards like the RX 6400/6500 XT).
  • Kernel version or firmware package (linux-firmware) mismatch.
  • Conflicts between multiple GPUs or drivers.
  • Recent updates to the kernel, mesa, or firmware packages introduced incompatibilities.

I checked /usr/lib/firmware/amdgpu/ and there is no navi24 entry. Does that mean the navi24 blob is missing?!

Can somebody confirm this issue? Is current linux-firmware missing navi24 support?

navi24 support seems to be covered by other blobs (navi10, navi12, navi14), but I found no details.
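
As a rough sketch of how to check this: amdgpu firmware files are normally named after the chip's codename rather than its marketing name, and a later log in this thread identifies the card as BEIGE_GOBY. The helper below lists the firmware files for one codename. The function name `list_gpu_firmware` is made up for illustration, and the `<codename>_*` file naming pattern is an assumption about how the firmware package lays out its files.

```shell
# List firmware files for one GPU codename (hypothetical helper).
# Usage: list_gpu_firmware CODENAME [FIRMWARE_DIR]
# Assumption: files are named "<codename>_<something>", e.g. with a
# .bin or .bin.zst suffix.
list_gpu_firmware() {
    local codename="$1"
    local dir="${2:-/usr/lib/firmware/amdgpu}"
    local f found=1
    for f in "$dir"/"${codename}"_*; do
        # If nothing matches, the glob stays literal and does not exist.
        [ -e "$f" ] || continue
        basename "$f"
        found=0
    done
    # Return 0 if at least one file matched, 1 otherwise.
    return "$found"
}
```

For example, `list_gpu_firmware beige_goby` looks in /usr/lib/firmware/amdgpu/, while `list_gpu_firmware navi24` would be expected to print nothing, which matches the observation above.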

However, amdgpu drivers are not loaded correctly on my system (anymore).

Right now I also see this error in the output of journalctl -b | grep amdgpu:

Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I’m not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: hw_init of IP block  failed -62
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Oct 20 21:37:40 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
Oct 20 21:37:40 optiplex3070 kernel: WARNING: CPU: 0 PID: 153 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xa8/0xc0 [amdgpu]

Does this mean that the current linux-firmware, which is shared by all installed kernels, causes the error?
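
To separate the failure messages from the rest of the amdgpu output, the journal can be filtered a second time. This is only a sketch: the function name `amdgpu_failures` and the keyword list are assumptions chosen to match the messages quoted above, not an official list of amdgpu error strings.

```shell
# Keep only amdgpu lines that look like failures (hypothetical helper).
# Reads log text on stdin, prints matching lines on stdout.
# The keyword list is a guess based on the messages seen in this thread.
amdgpu_failures() {
    grep -i 'amdgpu' | grep -Ei 'fail|fatal|error|warning|not matched'
}
```

It could be used as `journalctl -k -b | amdgpu_failures`, where `-k` restricts the journal to kernel messages and `-b` to the current boot.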

I found a link to a similar upstream issue: [SOLVED] Driver for AMDGPU doesn't work anymore after a random time / Newbie Corner / Arch Linux Forums
The solution there is to use an LTS kernel. I already tried 6.12 LTS with no success; 6.6 LTS has not been tested. However, if all kernels on my system are now “linked” to the latest linux-firmware (is what I write nonsense??) and the firmware contains the incompatibility, I do not expect it to work any better.

As feared, 6.6 LTS does not work either (anymore). Now I’ve no clue how to overcome the issue…

Please comment if anyone has a clue what else to do or check. Any advice is highly appreciated.

Downgrading mesa from 25.2.x to 25.1.x also makes no difference.

@jkkr

Please paste all code or command output as pre-formatted text – using three (3) backtick characters – on their own line, both above and below your pasted text.

Example:

```
Paste your code or command output as text here
```

This will no doubt be simpler than your efforts so far and will conform to forum guidelines.

Thank you for your co-operation.

Regards.

What follows is from a standard template.


Welcome to the Manjaro community

As a new or infrequent forum user, please take some time to familiarise yourself with Forum requirements, and the many ways to use the forum to your benefit.

To that end, links are provided (below) - Please use them.


Be prepared to provide outputs from various commands when asked. It’s equally important to provide as much actionable information as possible in your first post, rather than simply indicating there is a problem.

Waiting for others to blindly ask questions can be counter-productive – typically, nobody has a :crystal_ball: at their disposal – Instead, please help others to make informed suggestions, based on information you provide.


Update Announcements

The Update Announcements contain important information and a Known Issues and Solutions section that should generally be checked before posting a request for support.

System Information

While information from *-fetch type apps might be fine for someone wishing to buy your computer, for Support purposes it’s better to ask your system directly; :eyes:

Output of the inxi command (with appropriate parameters, and formatted according to forum guidelines) will generate information useful for those wishing to help:

inxi --filter --verbosity=8

or the short form (preferred):

inxi -zv8
Highly Recommended
Technical Resources
Required Reading

I’m still searching for root causes and I have made a little progress. I found the topic “Booting with AMD GPU (RX5500 XT) not working”, where hw_init of IP block failed -62 was mentioned and amdgpu was not loading correctly. So I added amdgpu.dpm=0 to the kernel command line, which, if I understood correctly, turns off dynamic power management at boot.
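
To confirm that such a parameter really reached the running kernel, the command line can be checked directly. A minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the function name `has_kernel_param` is hypothetical.

```shell
# Check whether an exact parameter is on the kernel command line
# (hypothetical helper).
# Usage: has_kernel_param PARAM [CMDLINE_FILE]
# Returns 0 if PARAM appears as a whole word, 1 otherwise.
has_kernel_param() {
    local param="$1"
    local cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then match the full line literally.
    tr -s ' \t' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example: `has_kernel_param amdgpu.dpm=0 && echo "parameter is active"`.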

After a reboot, the output of journalctl -b | grep amdgpu has changed:

Oct 21 21:03:05 optiplex3070 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.17-x86_64 root=UUID=7bc67a3c-9b33-479c-b74e-5fb4ef1e532a rw quiet splash amdgpu.dpm=0 apparmor=1 security=apparmor resume=UUID=7f827f88-5e1a-44c5-a730-fd3837cd104f udev.log_priority=3
Oct 21 21:03:05 optiplex3070 kernel: [drm] amdgpu kernel modesetting enabled.
Oct 21 21:03:05 optiplex3070 kernel: amdgpu: Virtual CRAT table created for CPU
Oct 21 21:03:05 optiplex3070 kernel: amdgpu: Topology: Add CPU node
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: enabling device (0106 -> 0107)
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: initializing kernel modesetting (BEIGE_GOBY 0x1002:0x743F 0x1DA2:0xE458 0xC7).
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: register mmio base: 0xE2100000
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: register mmio size: 1048576
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <nv_common>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 2 <navi10_ih>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 3 <psp>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 4 <smu>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 5 <dm>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
Oct 21 21:03:05 optiplex3070 kernel: amdgpu: ATOM BIOS: 113-D63401-US4
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: VRAM: 4080M 0x0000008000000000 - 0x00000080FEFFFFFF (4080M used)
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu: 4080M of VRAM memory ready
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu: 7897M of GTT memory ready.
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x02020021
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 13
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0xa00000 from 0x80fd000000 for PSP TMR
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
Oct 21 21:03:05 optiplex3070 kernel: amdgpu: smu firmware loading failed
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
Oct 21 21:03:05 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.

The failure has now changed to amdgpu: smu firmware loading failed. Now I’m stuck again. Could this hint at a mismatch between the kernel and the linux-firmware version?

Do you have linux-firmware-amdgpu installed?

Thanks Aragorn for the hint.

Yes, I double checked and also triggered a re-installation:
linux-firmware-amdgpu-20250917-1 is up to date – reinstalling

I did a re-boot and the problem still persists, unfortunately.

I tried installing the latest upstream Arch version:

sudo pacman -U https://www.archlinux.de/download/core/os/x86_64/linux-firmware-amdgpu-20251011-1-any.pkg.tar.zst

After installation and reboot the error is still there. At least in combination with 6.17.

Eureka!!

Oct 21 22:47:06 optiplex3070 kernel: [drm] amdgpu kernel modesetting enabled.
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: Virtual CRAT table created for CPU
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: Topology: Add CPU node
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: enabling device (0106 -> 0107)
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <nv_common>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 2 <navi10_ih>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 3 <psp>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 4 <smu>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 5 <dm>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: ATOM BIOS: 113-D63401-US4
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: VRAM: 4080M 0x0000008000000000 - 0x00000080FEFFFFFF (4080M used)
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Oct 21 22:47:06 optiplex3070 kernel: [drm] amdgpu: 4080M of VRAM memory ready
Oct 21 22:47:06 optiplex3070 kernel: [drm] amdgpu: 7898M of GTT memory ready.
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x02020020
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 8
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0xa00000 from 0x80fd000000 for PSP TMR
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version = 0x00000010, smu fw program = 0, version = 0x00492400 (73.36.0)
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Display Core v3.2.334 initialized on DCN 3.0.3
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] DP-HDMI FRL PCON supported
Oct 21 22:47:06 optiplex3070 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x02020020
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: HMM registered 4080MB device memory
Oct 21 22:47:06 optiplex3070 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Oct 21 22:47:06 optiplex3070 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: Virtual CRAT table created for GPU
Oct 21 22:47:06 optiplex3070 kernel: amdgpu: Topology: Add dGPU node [0x743f:0x1002]
Oct 21 22:47:06 optiplex3070 kernel: kfd kfd: amdgpu: added device 1002:743f

I finally made it. A bunch of measures were necessary to achieve it:

  1. Downgrading to mesa 25.1.9 (sudo downgrade mesa)
  2. Downgrading linux-firmware-amdgpu to 20250808-2 (sudo downgrade linux-firmware-amdgpu)
  3. Removing the kernel command-line parameter `amdgpu.dpm=0` again
  4. Booting with kernel 6.16

Maybe it is a lucky find and not the only working combination. In any case, I will now test it carefully for some time.

Should the issue be reported upstream, or is it a problem unique to my system?
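
Note that the successful boot above still logs `SMU driver if version not matched`, so that warning alone does not seem to decide whether the GPU comes up. To compare the two interface versions across boots, they can be extracted from the log line. This is a sketch: the function name `smu_if_versions` is hypothetical, and the pattern assumes the message format shown in the log above.

```shell
# Extract the SMU driver and firmware interface versions from a kernel log
# (hypothetical helper). Reads log text on stdin and prints
# "<driver if version> <fw if version>" for each matching line.
smu_if_versions() {
    sed -n 's/.*smu driver if version = \(0x[0-9a-fA-F]*\), smu fw if version = \(0x[0-9a-fA-F]*\).*/\1 \2/p'
}
```

Used as `journalctl -k -b | smu_if_versions`, it prints `0x0000000d 0x00000010` for the log above.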

I rejoiced too early. The issue is back: after a cold reboot, the same error occurs. So maybe my secondary GPU has a hardware defect? A replacement will not be easy, because it is an SFF single-slot GPU…
Or is it only a timing issue?

@jkkr I would guess this is a linux-firmware regression. A kernel and firmware bisect would be the way to go. That way we could bring this issue upstream.

Back in the day I did a git-bisect on the 4.1 kernel. It took me a while to figure it out: Commits · hphilm/linux41 · GitHub
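
The idea behind a firmware bisect can be sketched as a binary search over an ordered list of package versions. This is not an existing tool: `first_bad_version` and the test command are hypothetical, and the search assumes that the first version in the list is known good, the last is known bad, and the test gives a reliable answer. For the kernel itself, `git bisect` automates the same search over commits.

```shell
# Binary search for the first bad version in an ordered list
# (hypothetical helper, not an existing tool).
# Usage: first_bad_version TEST_CMD VERSION...
#   TEST_CMD VERSION must return 0 if VERSION is good, non-zero if bad.
#   Assumes the first VERSION is good and the last one is bad.
first_bad_version() {
    local test_cmd="$1"
    shift
    local lo=1 hi="$#" mid ver
    # Invariant: version number lo is good, version number hi is bad.
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "ver=\${$mid}"
        if "$test_cmd" "$ver"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    eval "ver=\${$hi}"
    printf '%s\n' "$ver"
}
```

In practice each test is a manual step (install the version, cold boot, check the log), so the test command could simply ask the user for the result. With a failure that only shows up on some boots, each step probably needs several cold boots before a version can be called good.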

Hi philm, thanks for looking into the topic. I was away for some days and only saw your answer today.

I just installed the most recent versions of linux-firmware-amdgpu-20251021 from Arch as well as mesa-1:25.2.6-1-x86_64 and mesa-utils, as of today. The problem persists with 6.17.1. The secondary GPU only comes up at random, in about 1 of 20 restarts.
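
To put a number on how often the GPU comes up, each boot's kernel log can be classified by the two messages seen earlier in this thread. A sketch under assumptions: the function name `gpu_init_status` is hypothetical, and it relies on the exact strings `SMU is initialized successfully` and `Fatal error during GPU init`.

```shell
# Classify one boot's kernel log as ok / failed / unknown
# (hypothetical helper). Reads log text on stdin.
gpu_init_status() {
    local log
    log=$(cat)
    case "$log" in
        # A fatal init error wins, even if other messages are present.
        *'Fatal error during GPU init'*) echo failed ;;
        *'SMU is initialized successfully'*) echo ok ;;
        *) echo unknown ;;
    esac
}
```

With a persistent journal, something like `for i in 0 -1 -2; do journalctl -k -b "$i" | gpu_init_status; done` would print one status per boot, newest first.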

I have been using GNU/Linux for quite a while now, but have never dug this deep into the engine room. After some reading I understand the basic idea of kernel + firmware bisection. I’ll try my luck.

Thanks for your support here.