Need help analyzing AMDGPU logs

I’ve been having a bunch of problems running Diablo 4 for the past few days. I know there’s an existing issue with the game even on Windows, but it has gotten far worse on Manjaro over the past few weeks. One of two things happens: either the game freezes while the sound keeps playing and my other display stays accessible and functional, or my whole PC reboots.

I ran journalctl -b -0 -k to see what was being logged, and these are the entries closest to the time Diablo 4 froze:

May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:484)
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:  Process Diablo IV.exe pid 15527 thread vkd3d_queue pid 15671
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:   in page starting at address 0x0000800126c6c000 from client 10
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501430
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          Faulty UTCL2 client ID: SQC (data) (0xa)
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          MORE_FAULTS: 0x0
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          WALKER_ERROR: 0x0
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          PERMISSION_FAULTS: 0x3
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          MAPPING_ERROR: 0x0
May 10 22:36:43 monstar kernel: amdgpu 0000:03:00.0:          RW: 0x0
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: Dumping IP State
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: Dumping IP State Completed
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=3675950, emitted seq=3675952
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0:  Process Diablo IV.exe pid 15527 thread vkd3d_queue pid 15671
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: Starting gfx_0.0.0 ring reset
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset succeeded
May 10 22:36:44 monstar kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
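
The log above points at /sys/class/drm/card1/device/devcoredump/data. A minimal sketch for copying that file somewhere permanent, since the kernel normally discards a devcoredump after a few minutes. The function name save_devcoredump and the output file name are made up for illustration, the card1 path is simply taken from the log line, and reading the file will probably need root:

```shell
# Copy an amdgpu devcoredump to a timestamped file so it can be attached to a bug report.
# ASSUMPTIONS: the default source path is the one printed in the log (card1);
# save_devcoredump and the "amdgpu-coredump-*.dump" naming are hypothetical.
save_devcoredump() {
    src="${1:-/sys/class/drm/card1/device/devcoredump/data}"
    dest_dir="${2:-$HOME}"

    # The file only exists for a short while after a GPU hang.
    if [ ! -r "$src" ]; then
        echo "no readable coredump at $src" >&2
        return 1
    fi

    dest="$dest_dir/amdgpu-coredump-$(date +%Y%m%d-%H%M%S).dump"
    # Print the destination path on success so it can be captured by the caller.
    cat "$src" > "$dest" && echo "$dest"
}
```

Usage, right after a hang: run save_devcoredump from a root shell (or point the first argument at a different cardN if the log names another card).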

Here’s my system’s info:

System:
  Kernel: 7.0.3-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 15.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-7.0-x86_64
    root=UUID=1f2d0a33-1c88-4678-8ee3-e2282523750c rw quiet splash
    udev.log_priority=3
  Desktop: KDE Plasma v: 6.6.4 tk: Qt v: N/A wm: kwin_x11 with: krunner
    dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
  Type: Desktop Mobo: Gigabyte model: B850I AORUS PRO v: x.x serial: N/A
    uuid: 03ff0210-04e0-05b4-9e06-690700080009 Firmware: UEFI vendor: American
    Megatrends LLC. v: FA1 date: 02/06/2025
Battery:
  Message: No system battery data found. Is one present?
Memory:
  System RAM: total: 32 GiB available: 30.46 GiB used: 7.01 GiB (23.0%)
  Array-1: capacity: 128 GiB slots: 4 modules: 2 EC: None
    max-module-size: 32 GiB note: est.
  Device-1: Channel-A DIMM 0 type: no module installed
  Device-2: Channel-A DIMM 1 type: DDR5 detail: synchronous unbuffered
    (unregistered) size: 16 GiB speed: 6000 MT/s volts: curr: 1.1 min: 1.1
    max: 1.1 width (bits): data: 64 total: 64 manufacturer: G.SKILL
    part-no: F5-6000J3038F16G serial: <filter>
  Device-3: Channel-B DIMM 0 type: no module installed
  Device-4: Channel-B DIMM 1 type: DDR5 detail: synchronous unbuffered
    (unregistered) size: 16 GiB speed: 6000 MT/s volts: curr: 1.1 min: 1.1
    max: 1.1 width (bits): data: 64 total: 64 manufacturer: G.SKILL
    part-no: F5-6000J3038F16G serial: <filter>
PCI Slots:
  Slot: 1 type: PCIe gen: 1 status: in use length: short volts: 3.3
    bus-ID: 00:01.1 children: 1: 01:00.0 class-ID: 0604 type: bridge children:
    1: 02:00.0 class-ID: 0604 type: bridge children: 1: 03:00.0 class-ID: 0300
    type: display 2: 03:00.1 class-ID: 0403 type: audio
  Slot: 2 type: N/A status: in use info: M.2, J3502 length: short volts: 3.3
    bus-ID: 00:01.2 children: 1: 04:00.0 class-ID: 0108 type: mass-storage
  Slot: 3 type: PCIe gen: 3 status: available length: short volts: 3.3
    bus-ID: 00:1f.7
CPU:
  Info: model: AMD Ryzen 7 7800X3D socket: AM5 bits: 64 type: MT MCP
    arch: Zen 4 gen: 4 level: v4 note: check built: 2022+ process: TSMC n5 (5nm)
    family: 0x19 (25) model-id: 0x61 (97) stepping: 2 microcode: 0xA60120A
  Topology: cpus: 1x dies: 1 clusters: 1 cores: 8 threads: 16 tpc: 2
    smt: enabled cache: L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 8 MiB
    desc: 8x1024 KiB L3: 96 MiB desc: 1x96 MiB
  Speed (MHz): avg: 2983 min/max: 426/5053 boost: enabled
    base/boost: 4200/5050 scaling: driver: amd-pstate-epp governor: performance
    volts: 1.3 V ext-clock: 100 MHz cores: 1: 2983 2: 2983 3: 2983 4: 2983
    5: 2983 6: 2983 7: 2983 8: 2983 9: 2983 10: 2983 11: 2983 12: 2983
    13: 2983 14: 2983 15: 2983 16: 2983 bogomips: 134140
  Flags: 3dnowprefetch abm adx aes amd_lbr_pmc_freeze amd_lbr_v2 aperfmperf
    apic arat avic avx avx2 avx512_bf16 avx512_bitalg avx512_vbmi2
    avx512_vnni avx512_vpopcntdq avx512bw avx512cd avx512dq avx512f
    avx512ifma avx512vbmi avx512vl bmi1 bmi2 bpext cat_l3 cdp_l3 clflush
    clflushopt clwb clzero cmov cmp_legacy constant_tsc cpb cppc cpuid
    cpuid_fault cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc
    cr8_legacy cx16 cx8 de decodeassists erms extapic extd_apicid f16c
    flush_l1d flushbyasid fma fpu fsgsbase fsrm fxsr fxsr_opt gfni ht
    hw_pstate ibpb ibrs ibrs_enhanced ibs invpcid irperf lahf_lm lbrv lm mba
    mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc
    nopl npt nrip_save nx ospke osvw overflow_recov pae pat pausefilter
    pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb perfmon_v2
    pfthreshold pge pku pni popcnt pse pse36 rapl rdpid rdpru rdrand rdseed
    rdt_a rdtscp rep_good sep sha_ni skinit smap smca smep ssbd sse sse2
    sse4_1 sse4_2 sse4a ssse3 stibp succor svm svm_lock syscall tce topoext
    tsc tsc_scale umip user_shstk v_spec_ctrl vaes vgif vmcb_clean vme
    vmmcall vnmi vpclmulqdq wbnoinvd wdt x2avic xgetbv1 xsave xsavec
    xsaveerptr xsaveopt xsaves xtopology
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: ghostwrite status: Not affected
  Type: indirect_target_selection status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: old_microcode status: Not affected
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow mitigation: Safe RET
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Enhanced / Automatic IBRS; IBPB:
    conditional; STIBP: always-on; PBRSB-eIBRS: Not affected; BHI: Not
    affected
  Type: srbds status: Not affected
  Type: tsa mitigation: Clear CPU buffers
  Type: tsx_async_abort status: Not affected
  Type: vmscape mitigation: IBPB before exit to userspace
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Navi 32 [Radeon RX 7700 XT /
    7800 XT] vendor: Sapphire driver: amdgpu v: kernel arch: RDNA-3
    code: Navi-3x process: TSMC n5 (5nm) built: 2022+ pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: DP-2,HDMI-A-2
    empty: DP-1,HDMI-A-1,Writeback-1 bus-ID: 03:00.0 chip-ID: 1002:747e
    class-ID: 0300
  Device-2: Advanced Micro Devices [AMD/ATI] Raphael vendor: Gigabyte
    driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm)
    built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: none
    empty: Writeback-2 bus-ID: 12:00.0 chip-ID: 1002:164e class-ID: 0300
    temp: 52.0 C
  Display: unspecified server: X.Org v: 21.1.22 with: Xwayland v: 24.1.11
    compositor: kwin_x11 driver: X: loaded: amdgpu unloaded: modesetting,radeon
    alternate: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 4480x1440 s-dpi: 96 s-size: 1185x381mm (46.65x15.00")
    s-diag: 1245mm (49.01")
  Monitor-1: DP-2 mapped: DisplayPort-1 pos: primary,left
    model: Acer VG271U M3 serial: <filter> built: 2023 res: mode: 2560x1440
    hz: 180 scale: 100% (1) dpi: 109 gamma: 1.2 chroma: red: x: 0.675 y: 0.314
    green: x: 0.275 y: 0.671 blue: x: 0.149 y: 0.047 white: x: 0.314 y: 0.329
    size: 597x336mm (23.5x13.23") diag: 685mm (27") ratio: 16:9
    modes: 2560x1440, 1920x1080, 1280x1440, 1280x1024, 1280x720, 1024x768,
    832x624, 800x600, 720x576, 720x480, 640x480, 720x400
  EDID-Warnings: 1: parse_edid: unknown tag 112
  Monitor-2: HDMI-A-2 mapped: HDMI-A-1 pos: right model: Acer VG240Y S
    serial: <filter> built: 2020 res: mode: 1920x1080 hz: 144 scale: 100% (1)
    dpi: 93 gamma: 1.2 chroma: red: x: 0.659 y: 0.333 green: x: 0.298 y: 0.608
    blue: x: 0.145 y: 0.059 white: x: 0.314 y: 0.329
    size: 527x296mm (20.75x11.65") diag: 604mm (23.8") ratio: 16:9
    modes: 1920x1080, 3840x2160, 1680x1050, 1280x1024, 1440x900, 1280x960,
    960x1080, 1280x800, 1152x864, 1280x720, 1024x768, 832x624, 800x600,
    720x576, 720x480, 640x480, 720x400
  EDID-Warnings: 1: parse_edid: unhandled CEA mode 96 2: parse_edid:
    unhandled CEA mode 97
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: radeonsi device: 2 drv: swrast gbm: drv: radeonsi
    surfaceless: drv: radeonsi x11: drv: radeonsi inactive: wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 26.0.6-arch1.1
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 7800 XT (radeonsi
    navi32 ACO DRM 3.64 7.0.3-1-MANJARO) device-ID: 1002:747e
    memory: 15.62 GiB unified: no
  API: Vulkan v: 1.4.341 layers: 4 device: 0 type: discrete-gpu name: AMD
    Radeon RX 7800 XT (RADV NAVI32) driver: mesa radv v: 26.0.6-arch1.1
    device-ID: 1002:747e surfaces: N/A device: 1 type: integrated-gpu name: AMD
    Ryzen 7 7800X3D 8-Core Processor (RADV RAPHAEL_MENDOCINO)
    driver: mesa radv v: 26.0.6-arch1.1 device-ID: 1002:164e surfaces: N/A
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor wl: wayland-info
    x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Advanced Micro Devices [AMD/ATI] Navi 31 HDMI/DP Audio
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 03:00.1 chip-ID: 1002:ab30 class-ID: 0403
  Device-2: Advanced Micro Devices [AMD/ATI] Radeon High Definition Audio
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 12:00.1 chip-ID: 1002:1640 class-ID: 0403
  Device-3: Advanced Micro Devices [AMD] Ryzen HD Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 12:00.6 chip-ID: 1022:15e3 class-ID: 0403
  Device-4: Giga-Byte USB Audio driver: hid-generic,snd-usb-audio,usbhid
    type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-2:2
    chip-ID: 0414:a014 class-ID: 0300
  API: ALSA v: k7.0.3-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: JACK v: 1.9.22 status: off tools: N/A
  Server-3: PipeWire v: 1.6.4 status: n/a (root, process) with:
    1: pipewire-pulse status: active 2: wireplumber status: active
    3: pipewire-alsa type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8125 2.5GbE vendor: Gigabyte driver: r8169 v: kernel
    pcie: gen: 2 speed: 5 GT/s lanes: 1 port: d000 bus-ID: 09:00.0
    chip-ID: 10ec:8125 class-ID: 0200
  IF: enp9s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: noprefixroute scope: link
  Device-2: Realtek RTL8922AE 802.11be PCIe Wireless Network Adapter
    driver: rtw89_8922ae v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1
    port: c000 bus-ID: 0a:00.0 chip-ID: 10ec:8922 class-ID: 0280
  IF: wlp10s0 state: down mac: <filter>
  Info: services: NetworkManager,systemd-timesyncd
  WAN IP: <filter>
Bluetooth:
  Device-1: Realtek Bluetooth Radio driver: btusb v: 0.8 type: USB rev: 1.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-12:4 chip-ID: 0bda:8922
    class-ID: e001 serial: <filter>
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 3.89 TiB used: 2.12 TiB (54.6%)
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital model: WD BLACK
    SN850X HS 2000GB size: 1.82 TiB block-size: physical: 512 B logical: 512 B
    speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter> fw-rev: 620361WD
    temp: 63.9 C scheme: GPT
  SMART: yes health: PASSED on: 165d 9h cycles: 798
    read-units: 43,516,828 [22.2 TB] written-units: 38,962,453 [19.9 TB]
  ID-2: /dev/nvme1n1 maj-min: 259:4 vendor: Western Digital
    model: WD Blue SN580 1TB size: 931.51 GiB block-size: physical: 512 B
    logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 281010WD temp: 58.9 C scheme: GPT
  SMART: yes health: PASSED on: 370d 10h cycles: 994
    read-units: 39,831,506 [20.3 TB] written-units: 41,140,123 [21.0 TB]
  ID-3: /dev/sda maj-min: 8:0 vendor: Kingston model: SKC600 1024G
    size: 953.87 GiB block-size: physical: 512 B logical: 512 B sata: 3.2
    speed: 6.0 Gb/s tech: SSD serial: <filter> fw-rev: 15A2 temp: 35 C
    scheme: GPT
  SMART: yes state: enabled health: PASSED on: 211d 10h cycles: 967
    read: 212.87 TiB written: 252.14 TiB
  ID-4: /dev/sdb maj-min: 8:16 vendor: Kingston model: SNVS250G
    size: 232.89 GiB block-size: physical: 512 B logical: 512 B type: USB
    rev: 3.2 spd: 10 Gb/s lanes: 1 mode: 3.2 gen-2x1 tech: N/A
    serial: <filter> fw-rev: 1.00 drive-rev: T1103N0L temp: 40 Celsius C
    scheme: MBR
  SMART: yes health: PASSED on: 91 hrs cycles: 541
    read-units: 2,710,499 [1.38 TB] written-units: 4,537,332 [2.32 TB]
  Message: No optical or floppy data found.

I’ve confirmed that I only encounter this issue with Diablo 4 launched through Steam; all other games I run on Steam work perfectly fine. Any ideas on what I should do or check next?

Try switching to the edge branch - not a promise - not a solution - a try it, see if it works…

sudo pacman-mirrors -aSunstable && sudo pacman -Syyu
 $ mbn info linux70 -q
Branch         : unstable
Name           : linux70
Version        : 7.0.5-1
Repository     : core
Build Date     : Fri 08 May 2026 10:12:42 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : testing
Name           : linux70
Version        : 7.0.3-1
Repository     : core
Build Date     : Thu 30 Apr 2026 19:29:37 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : stable
Name           : linux70
Version        : 7.0.3-1
Repository     : core
Build Date     : Thu 30 Apr 2026 19:29:37 
Packager       : Manjaro Build Server <build@manjaro.org>

Will give it a shot. If I wanted to go back to the stable branch, how would I go about doing that?

sudo pacman-mirrors -aSstable && sudo pacman -Syyuu
System:
  Kernel: 7.0.5-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 16.1.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-7.0-x86_64
    root=UUID=1f2d0a33-1c88-4678-8ee3-e2282523750c rw quiet splash
    udev.log_priority=3
  Desktop: KDE Plasma v: 6.6.4 tk: Qt v: N/A info: frameworks v: 6.26.0
    wm: kwin_wayland vt: 1 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
  Type: Desktop Mobo: Gigabyte model: B850I AORUS PRO v: x.x
    serial: <superuser required> uuid: <superuser required> Firmware: UEFI
    vendor: American Megatrends LLC. v: FA1 date: 02/06/2025

Thank you, will start testing it out

So it just happened again. This time everything went black while the audio kept running, and the PC rebooted after a few minutes of that. And now, as I was typing this response, the PC completely froze and I had to force a reboot again.

Edit: I was able to pull some journalctl logs from just before the black screen/freeze and reboot, and found this:

May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.165194]    prop_minor:      1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.165194]    sysname:         card1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.165194]    syspath:         /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/drm/card1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.165194]    attr_name:       (null)
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.165217] (dw_watch_display_connections) Time since last return from sleep = 5508491627942 ns = 5508492 ms
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260886] Udev event detected
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927] Udev_Event_Detail
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_subsystem:  drm
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_action:     change
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_connector:  (null)
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_devname:    /dev/dri/card1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_devmode:    (null)
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_hotplug:    (null)
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_major:      226
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    prop_minor:      1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    sysname:         card1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    syspath:         /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/drm/card1
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260927]    attr_name:       (null)
May 11 00:51:23 monstar org_kde_powerdevil[1611]: [  1947][5514.260947] (dw_watch_display_connections) Time since last return from sleep = 5508587358425 ns = 5508587 ms
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=RESET
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: failed to reset legacy queue
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: reset via MES failed and try pipe reset -110
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: The CPFW hasn't support pipe reset yet.
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset failed
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: GPU reset begin!. Source:  1
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
May 11 00:51:26 monstar kernel: amdgpu 0000:03:00.0: Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
May 11 00:51:29 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:29 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:31 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:31 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:34 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:34 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:36 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:36 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:38 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:38 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:41 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:41 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:43 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:43 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:46 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:46 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 11 00:51:48 monstar kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 11 00:51:48 monstar kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
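
Because the machine rebooted, entries like these live in the previous boot's journal rather than the current one. A rough sketch for pulling out just the GPU-related lines; amdgpu_lines is a hypothetical helper name, and the usage line assumes the crash happened in the boot immediately before the current one:

```shell
# Keep only amdgpu / DRM messages from whatever is piped in on stdin.
# amdgpu_lines is a made-up helper name, not part of any package.
amdgpu_lines() {
    grep -E 'amdgpu|\[drm\]'
}
```

Usage: journalctl -k -b -1 --no-pager | amdgpu_lines (-b -1 selects the previous boot; -b 0 is the current one).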

Adding GPU spec and model: Sapphire Nitro+ Radeon RX 7800 XT 16GB

Any ideas?

Mod edit: Consecutive posts merged.

So you are looking at the kernel to solve a bug in a specific (Diablo 4) software?

From the information provided - I would look at disabling power management (powerdevil).

Other than that :person_shrugging:

I am running my daily work on an AMD system using AI (LM Studio and Ollama) and I have no issues - so I am inclined to believe it is the application itself - in this case a game which you already know is causing problems on Windows.

I don’t think this is hardware or kernel related.

Yes - it is logged - but that doesn’t imply causality.

But you were the one that suggested this!

I don’t see how you running an AMD system using AI has anything whatsoever to do with this issue.

This, however, is a good idea for troubleshooting, as the log provided shows the GPU was reset, which often means a game will crash. I would disable power management entirely while you test. You can mask the service to disable it for now: systemctl --user mask plasma-powerdevil. To enable it again: systemctl --user unmask plasma-powerdevil

Additionally, a utility like Feral GameMode (FeralInteractive/gamemode on GitHub: "Optimise Linux system performance on demand") may disable this for you (I’m not entirely sure). Also, if you have any overclocking set on your GPU, disable that too while you’re troubleshooting.

You may also get some tips from ProtonDB https://www.protondb.com/app/2344520

You also mentioned your PC locking up/freezing and having to reboot. Did you by chance try switching to another console (tty)? I.e., when it locks up, try Ctrl+Alt+F4; if the whole system hasn’t frozen, you can then log in and troubleshoot the issue from a terminal.

Additionally you should check out this thread for how to attempt a clean power down rather than a hard reset.

This one? "Warning: Game breaking bug affecting nVidia and AMD GPU in Windows / Linux" (Diablo® IV General Discussions). If so, there is probably very little you can do about it until they fix the game - although there are some workarounds in that thread to reduce its occurrence.

I don’t see why it is necessary to comment like that… If I don’t know an outright answer - I usually engage in a conversation - a thought process…

Yes - because the AMD GPU driver ships with the kernel, and game crashes often originate in the GPU area, which again points to updating the kernel.

Because the hardware involved is AMD, whose drivers ship with the kernel, and because intensive use of similar hardware - albeit for other tasks - does not fail.

So it is a line of thoughts - reasoning, thinking, which leads to suggestions - as to where to look for possible solutions.

I will refrain from further comment on this.

Fair enough @linux-aarhus but it read differently. I still don’t agree on the AMD hardware thing though, that’s quite the leap given how different and capable different hardware (particularly GPUs) are with every iteration or new release. And running a game vs running AI demand very different things from a GPU and the software controlling it.

I have never looked at, said, or even considered that the kernel is causing my issues or will resolve them; YOU suggested this without hesitation. I was merely asking for guidance on how I could look further into the issues I am having, as at this point the developers do not seem willing to do anything else about them. And being on Linux, with a community known to be helpful in figuring out issues, or at least trying to, and finding some sort of workaround, I asked.

I also don’t see why it was necessary for you to make this specific comment after you suggested it, and to act like I was looking at it as the solution to my problem. I know you did not intend it to be a fix or a solution; you just wanted me to try it out, which I did.

Thank you, I appreciate this. I will try disabling power management and explore other options. I currently have no OC on any of my hardware, and I did have a look at the ProtonDB page and tried out some suggestions on there with little to no success.

Yes, that seems to be the latest thread about this issue, but afaik it has been present since launch. I’ve also tried the recommendations in that thread, with no success either. I think I even commented on that thread as well.

I am well aware that this issue is on Blizzard to fix and address, but I don’t see that happening anytime soon. I’m just trying to understand what’s happening and possibly find a better workaround than the dxvk.conf fix.

So I tried disabling power management using the commands you shared, and unfortunately still got the same issue. I looked at journalctl again and saw this:

May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:4 pasid:586)
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:  Process Diablo IV.exe pid 46758 thread vkd3d_queue pid 46871
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800191b3f000 from client 10
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00401430
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x0
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
May 15 03:08:10 monstar kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
May 15 03:08:13 monstar wireplumber[1098]: wp-event-dispatcher: <WpAsyncEventHook:0x563b744b5c10> failed: <WpSiStandardLink:0x563b745e2f30> link failed: 1 of 1 PipeWire links failed to activate
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=8802415, emitted seq=8802417
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu:  Process Diablo IV.exe pid 46758 thread vkd3d_queue pid 46871
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
May 15 03:08:20 monstar kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=8802416, emitted seq=8802419
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu:  Process Diablo IV.exe pid 46758 thread vkd3d_queue pid 46871
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
May 15 03:08:31 monstar kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 15 03:08:31 monstar steam[45357]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is guilty of a hard recovery.
May 15 03:08:31 monstar steam[45357]: radv: GPUVM fault detected at address 0x800191b3f000.
May 15 03:08:31 monstar steam[45357]: GCVM_L2_PROTECTION_FAULT_STATUS: 0x401430
May 15 03:08:31 monstar steam[45357]:          CLIENT_ID: (SQC (data)) 0xa
May 15 03:08:31 monstar steam[45357]:          MORE_FAULTS: 0
May 15 03:08:31 monstar steam[45357]:          WALKER_ERROR: 0
May 15 03:08:31 monstar steam[45357]:          PERMISSION_FAULTS: 3
May 15 03:08:31 monstar steam[45357]:          MAPPING_ERROR: 0
May 15 03:08:31 monstar steam[45357]:          RW: 0
May 15 03:08:44 monstar steam[45357]: pid 46676 != 46675, skipping destruction (fork without exec?)
May 15 03:08:45 monstar steam[45357]: Game Recording - game stopped [gameid=2344520]
May 15 03:08:45 monstar steam[45357]: Removing process 46758 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46751 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46749 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46740 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46704 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46698 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46692 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46683 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46680 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46678 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46675 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46674 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46673 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46672 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46583 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46582 for gameID 2344520
May 15 03:08:45 monstar steam[45357]: Removing process 46581 for gameID 2344520

I tried to look up what that meant exactly, and please correct me if I’m wrong, but did the GPU driver time out and then crash? Also, by the looks of this log, the driver was also trying to “reset” the GPU but failed. I’ve read somewhere that this is basically what’s happening on Windows: since Diablo 4 is poorly optimised, the OS just resets the GPU to free up some VRAM. However, I was monitoring my VRAM usage this time around and I was nowhere near the 16GB limit; if I remember correctly, it was at about ~11 or ~12GB, based on Steam’s built-in performance overlay.
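
As an aid to reading these entries, the individual fields the kernel prints can be recovered from the raw GCVM_L2_PROTECTION_FAULT_STATUS value. A small sketch, assuming a bit layout chosen so that it reproduces the values printed in the logs in this thread (vmid, client ID 0xa, PERMISSION_FAULTS 0x3, RW 0); treat the layout as an assumption rather than a reference, and note that decode_fault_status is a made-up helper, not an amdgpu tool:

```shell
# Split a GCVM_L2_PROTECTION_FAULT_STATUS value into the fields shown in the kernel log.
# ASSUMPTION: bit positions are inferred to match the logged values for 0x00501430 and
# 0x00401430; decode_fault_status is a hypothetical helper name.
decode_fault_status() {
    val=$(( $1 ))
    printf 'MORE_FAULTS=%d\n'       $(( val & 0x1 ))            # bit 0
    printf 'WALKER_ERROR=%d\n'      $(( (val >> 1) & 0x7 ))     # bits 1-3
    printf 'PERMISSION_FAULTS=%d\n' $(( (val >> 4) & 0xf ))     # bits 4-7
    printf 'MAPPING_ERROR=%d\n'     $(( (val >> 8) & 0x1 ))     # bit 8
    printf 'CLIENT_ID=%d\n'         $(( (val >> 9) & 0x1ff ))   # bits 9-17
    printf 'RW=%d\n'                $(( (val >> 18) & 0x1 ))    # bit 18
    printf 'VMID=%d\n'              $(( (val >> 20) & 0xf ))    # bits 20-23
}
```

Running decode_fault_status 0x00401430 on the May 15 value prints PERMISSION_FAULTS=3, CLIENT_ID=10 and VMID=4, which lines up with the "vmid:4" and "SQC (data) (0xa)" lines in that log; the May 10 value 0x00501430 differs only in VMID=5.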

@synthaxe

Your inxi output appears to be incomplete;
please try again – inxi -zv8.