AMDGPU page fault crashes desktop

I’m experiencing this issue only when playing AC Odyssey through Steam Proton. It could also affect other games running through Proton, but I have no other examples.
It starts with the whole system freezing; all I can do is move the cursor. Nothing is clickable and keyboard shortcuts don’t work. After a second or two, both screens turn black for a second, then come back, but nothing changes. Audio keeps playing as if nothing happened while the screens are not black.
I can switch to a TTY once the screens come back, but I can’t do anything there to fix the issue. The only option is a hard reboot.

Here is the journald output:
21.10.2022 17:17:54:251	kernel	amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:254	kernel	amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x0000000193f24000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:254	kernel	amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701431
21.10.2022 17:17:54:255	kernel	amdgpu 0000:09:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
21.10.2022 17:17:54:255	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MORE_FAULTS: 0x1
21.10.2022 17:17:54:255	kernel	amdgpu 0000:09:00.0: amdgpu: 	 WALKER_ERROR: 0x0
21.10.2022 17:17:54:255	kernel	amdgpu 0000:09:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
21.10.2022 17:17:54:255	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: 	 RW: 0x0
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x0000000193f24000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MORE_FAULTS: 0x0
21.10.2022 17:17:54:256	kernel	amdgpu 0000:09:00.0: amdgpu: 	 WALKER_ERROR: 0x0
21.10.2022 17:17:54:257	kernel	amdgpu 0000:09:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
21.10.2022 17:17:54:257	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
21.10.2022 17:17:54:258	kernel	amdgpu 0000:09:00.0: amdgpu: 	 RW: 0x0
21.10.2022 17:17:54:258	kernel	amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32803, for process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294)
21.10.2022 17:17:54:258	kernel	amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x0000000002d4d000 from client 0x1b (UTCL2)
21.10.2022 17:17:54:259	kernel	amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
21.10.2022 17:17:54:259	kernel	amdgpu 0000:09:00.0: amdgpu: 	 Faulty UTCL2 client ID: CB/DB (0x0)
21.10.2022 17:17:54:259	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MORE_FAULTS: 0x0
21.10.2022 17:17:54:259	kernel	amdgpu 0000:09:00.0: amdgpu: 	 WALKER_ERROR: 0x0
21.10.2022 17:17:54:260	kernel	amdgpu 0000:09:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
21.10.2022 17:17:54:260	kernel	amdgpu 0000:09:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
21.10.2022 17:17:54:260	kernel	amdgpu 0000:09:00.0: amdgpu: 	 RW: 0x0
21.10.2022 17:17:57:458	kernel	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13314677, emitted seq=13314679
21.10.2022 17:17:57:458	kernel	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACOdyssey.exe pid 26251 thread dxvk-submit pid 26294
21.10.2022 17:18:01:458	kernel	[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
21.10.2022 17:18:01:918	kernel	amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
21.10.2022 17:18:01:918	kernel	[drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
21.10.2022 17:18:02:184	kernel	[drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
21.10.2022 17:18:02:315	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:316	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:02:317	kernel	snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
21.10.2022 17:18:03:229	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:238	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:247	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
21.10.2022 17:18:03:254	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
.
.
.
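
For anyone reading the log: the indented fields under GCVM_L2_PROTECTION_FAULT_STATUS are bit slices of that register value. Below is a minimal sketch that decodes them, assuming the gfx10 bit layout (MORE_FAULTS bit 0, WALKER_ERROR bits 1-3, PERMISSION_FAULTS bits 4-7, MAPPING_ERROR bit 8, client ID bits 9-17, RW bit 18). That layout is an assumption, but it reproduces the values printed in the log above. The name decode_fault_status is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a GCVM_L2_PROTECTION_FAULT_STATUS value into the fields amdgpu prints.
# The bit positions are an assumed gfx10 layout; they match the log in this thread.
decode_fault_status() {
    local status=$(( $1 ))
    printf 'MORE_FAULTS: 0x%x\n'       $(( status         & 0x1   ))
    printf 'WALKER_ERROR: 0x%x\n'      $(( (status >> 1)  & 0x7   ))
    printf 'PERMISSION_FAULTS: 0x%x\n' $(( (status >> 4)  & 0xf   ))
    printf 'MAPPING_ERROR: 0x%x\n'     $(( (status >> 8)  & 0x1   ))
    printf 'CLIENT_ID: 0x%x\n'         $(( (status >> 9)  & 0x1ff ))
    printf 'RW: 0x%x\n'                $(( (status >> 18) & 0x1   ))
}

# First fault status from the log above
decode_fault_status 0x00701431
```

For 0x00701431 this gives PERMISSION_FAULTS 0x3 and client ID 0xa, the same as the driver's own breakdown. The later faults report a status of 0x00000000, so every field decodes to zero.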

It may happen after 3 minutes of playtime, or it may not happen at all. In an attempt to fix it I tried this, but it didn’t help, and it doesn’t seem to be triggered by any specific voltage, power consumption, or temperature.

I have also set amdgpu.runpm=0 in the kernel parameters.
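
To confirm that a parameter like this actually reached the running kernel, it can be checked against /proc/cmdline. A minimal sketch; cmdline_has_param is a hypothetical helper name, and the optional second argument exists only so the check can be tried on an arbitrary string.

```shell
#!/usr/bin/env bash
# Return success if a kernel parameter (e.g. amdgpu.runpm=0) appears as a whole
# word on a kernel command line. Reads /proc/cmdline unless a line is given as $2.
cmdline_has_param() {
    local param="$1"
    local cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    local word
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

if cmdline_has_param amdgpu.runpm=0; then
    echo "amdgpu.runpm=0 is on the kernel command line for this boot"
else
    echo "amdgpu.runpm=0 is NOT on the kernel command line for this boot"
fi
```

The value the driver ended up with should also be readable from /sys/module/amdgpu/parameters/runpm while the module is loaded.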

Full system info
System:
  Kernel: 5.19.16-2-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.19-x86_64
    root=UUID=41600e73-c3ec-4481-bede-1e252c347923 rw quiet apparmor=1
    security=apparmor udev.log_priority=3 amdgpu.runpm=0 pcie_aspm=off
    amdgpu.gpu_recovery=1 amdgpu.lockup_timeout=3000
  Desktop: KDE Plasma v: 5.25.5 tk: Qt v: 5.15.6 wm: kwin_x11 vt: 1 dm: SDDM
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Desktop Mobo: ASUSTeK model: TUF B450M-PRO GAMING v: Rev X.0x
    serial: <superuser required> UEFI: American Megatrends v: 2409
    date: 12/02/2020
Battery:
  Device-1: hidpp_battery_0 model: Logitech MX Keys Wireless Keyboard
    serial: <filter> charge: 10% (should be ignored) rechargeable: yes
    status: discharging
  Device-2: hidpp_battery_1 model: Logitech Wireless Mouse MX Master 3
    serial: <filter> charge: 100% (should be ignored) rechargeable: yes
    status: discharging
Memory:
  RAM: total: 31.26 GiB used: 8.44 GiB (27.0%)
  RAM Report: permissions: Unable to run dmidecode. Root privileges
    required.
CPU:
  Info: model: AMD Ryzen 5 3600 bits: 64 type: MT MCP arch: Zen 2 gen: 3
    level: v3 note: check built: 2020-22 process: TSMC n7 (7nm)
    family: 0x17 (23) model-id: 0x71 (113) stepping: 0 microcode: 0x8701021
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB desc: 6x512 KiB
    L3: 32 MiB desc: 2x16 MiB
  Speed (MHz): avg: 2197 high: 2201 min/max: 2200/4208 boost: enabled
    scaling: driver: acpi-cpufreq governor: ondemand cores: 1: 2195 2: 2200
    3: 2195 4: 2195 5: 2200 6: 2196 7: 2195 8: 2194 9: 2200 10: 2201 11: 2200
    12: 2200 bogomips: 86276
  Flags: 3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1
    bmi2 bpext cat_l3 cdp_l3 clflush clflushopt clwb clzero cmov cmp_legacy
    constant_tsc cpb cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total
    cqm_occup_llc cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid
    f16c flushbyasid fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb ibs
    irperf lahf_lm lbrv lm mba mca mce misalignsse mmx mmxext monitor movbe
    msr mtrr mwaitx nonstop_tsc nopl npt nrip_save nx osvw overflow_recov pae
    pat pausefilter pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb
    pfthreshold pge pni popcnt pse pse36 rapl rdpid rdpru rdrand rdseed rdt_a
    rdtscp rep_good sep sev sev_es sha_ni skinit smap smca smep ssbd sse sse2
    sse4_1 sse4_2 sse4a ssse3 stibp succor svm svm_lock syscall tce topoext
    tsc tsc_scale umip v_spec_ctrl v_vmsave_vmload vgif vmcb_clean vme vmmcall
    wbnoinvd wdt xgetbv1 xsave xsavec xsaveerptr xsaveopt xsaves
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: retbleed mitigation: untrained return thunk; SMT enabled with STIBP
    protection
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP:
    always-on, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Navi 23 [Radeon RX 6600/6600 XT/6600M] vendor: Micro-Star MSI
    driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm)
    built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports:
    active: DP-1,HDMI-A-1 empty: DP-2,DP-3 bus-ID: 09:00.0 chip-ID: 1002:73ff
    class-ID: 0300
  Device-2: Lenovo FHD Webcam type: USB driver: snd-usb-audio,uvcvideo
    bus-ID: 1-3:2 chip-ID: 17ef:4831 class-ID: 0102 serial: <filter>
  Display: x11 server: X.Org v: 21.1.4 compositor: kwin_x11 driver: X:
    loaded: amdgpu unloaded: vesa dri: radeonsi gpu: amdgpu display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.00x11.22")
    s-diag: 1055mm (41.54")
  Monitor-1: DP-1 mapped: DisplayPort-0 pos: primary,left model: Dell P2319H
    serial: <filter> built: 2021 res: 1920x1080 hz: 60 dpi: 96 gamma: 1.2
    size: 509x286mm (20.04x11.26") diag: 584mm (23") ratio: 16:9 modes:
    max: 1920x1080 min: 720x400
  Monitor-2: HDMI-A-1 mapped: HDMI-A-0 pos: right model: LG (GoldStar)
    24MB35 serial: <filter> built: 2015 res: 1920x1080 hz: 60 dpi: 96
    gamma: 1.2 size: 510x290mm (20.08x11.42") diag: 587mm (23.1") ratio: 16:9
    modes: max: 1920x1080 min: 720x400
  OpenGL: renderer: AMD Radeon RX 6600 XT (dimgrey_cavefish LLVM 14.0.6 DRM
    3.47 5.19.16-2-MANJARO) v: 4.6 Mesa 22.1.7 direct render: Yes
Audio:
  Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
    gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 09:00.1 chip-ID: 1002:ab28
    class-ID: 0403
  Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 0b:00.4 chip-ID: 1022:1487 class-ID: 0403
  Device-3: Lenovo FHD Webcam type: USB driver: snd-usb-audio,uvcvideo
    bus-ID: 1-3:2 chip-ID: 17ef:4831 class-ID: 0102 serial: <filter>
  Device-4: Blue Microphones Yeti Nano type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 3-2:2 chip-ID: b58e:0005
    class-ID: 0300 serial: <filter>
  Sound API: ALSA v: k5.19.16-2-MANJARO running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.58 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: ASUSTeK PRIME B450M-A driver: r8169 v: kernel pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: f000 bus-ID: 05:00.0 chip-ID: 10ec:8168
    class-ID: 0200
  IF: enp5s0 state: up speed: 100 Mbps duplex: full mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: noprefixroute scope: link
  IF-ID-1: br-4d194edfcefc state: down mac: <filter>
  IP v4: <filter> scope: global broadcast: <filter>
  IF-ID-2: br-933ce09a1c5c state: down mac: <filter>
  IP v4: <filter> scope: global broadcast: <filter>
  IF-ID-3: br-a580f76f80b8 state: down mac: <filter>
  IP v4: <filter> scope: global broadcast: <filter>
  IF-ID-4: br-cc892718ad0e state: down mac: <filter>
  Message: Output throttled. IPs: 1; Limit: 10; Override: --limit [1-x;-1
    all]
  IF-ID-5: br-d45c8f5c6eec state: down mac: <filter>
  Message: Output throttled. IPs: 1; Limit: 10; Override: --limit [1-x;-1
    all]
  IF-ID-6: docker0 state: down mac: <filter>
  Message: Output throttled. IPs: 1; Limit: 10; Override: --limit [1-x;-1
    all]
  WAN IP: <filter>
Bluetooth:
  Device-1: ASUSTek ASUS USB-BT500 type: USB driver: btusb v: 0.8
    bus-ID: 1-6:4 chip-ID: 0b05:190e class-ID: e001 serial: <filter>
  Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 1.8 TiB used: 347.62 GiB (18.8%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 type: SSD serial: <filter> rev: 2B4QFXO7 temp: 30.9 C scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: Kingston model: SUV500480G
    size: 447.13 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 56R6 scheme: GPT
  ID-3: /dev/sdb maj-min: 8:16 vendor: Samsung model: HD502HJ
    size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 3.0 Gb/s
    type: HDD rpm: 7200 serial: <filter> rev: 0001 scheme: GPT
  Message: No optical or floppy data found.
Partition:
  ID-1: / raw-size: 436.13 GiB size: 428.21 GiB (98.18%) used: 201.87 GiB
    (47.1%) fs: ext4 dev: /dev/sda3 maj-min: 8:3 label: N/A
    uuid: 41600e73-c3ec-4481-bede-1e252c347923
  ID-2: /Steam64 raw-size: 931.51 GiB size: 915.81 GiB (98.31%) used: 145.75
    GiB (15.9%) fs: ext4 dev: /dev/nvme0n1p1 maj-min: 259:1 label: N/A
    uuid: dd0df7f1-3c83-4656-8a5c-091c0245d18f
  ID-3: /boot/efi raw-size: 1024 MiB size: 1022 MiB (99.80%) used: 288 KiB
    (0.0%) fs: vfat dev: /dev/sda1 maj-min: 8:1 label: N/A uuid: D300-C45E
Swap:
  Alert: No swap data was found.
Unmounted:
  ID-1: /dev/sda2 maj-min: 8:2 size: 10 GiB fs: ext4 label: N/A
    uuid: c8dd6dd5-9852-484b-8b39-32ecf61a9383
  ID-2: /dev/sdb1 maj-min: 8:17 size: 100 MiB fs: vfat label: N/A
    uuid: 44E8-6B2A
  ID-3: /dev/sdb2 maj-min: 8:18 size: 16 MiB fs: <superuser required>
    label: N/A uuid: N/A
  ID-4: /dev/sdb3 maj-min: 8:19 size: 249.48 GiB fs: ntfs label: WIN10
    uuid: B84EF9984EF9501C
  ID-5: /dev/sdb4 maj-min: 8:20 size: 511 MiB fs: ntfs label: N/A
    uuid: D254144554142EAB
  ID-6: /dev/sdb5 maj-min: 8:21 size: 215.66 GiB fs: <superuser required>
    label: N/A uuid: N/A
USB:
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 10 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 1-3:2 info: Lenovo FHD Webcam type: Video,Audio
    driver: snd-usb-audio,uvcvideo interfaces: 4 rev: 2.0 speed: 480 Mb/s
    power: 256mA chip-ID: 17ef:4831 class-ID: 0102 serial: <filter>
  Device-2: 1-5:3 info: Logitech Unifying Receiver type: Keyboard,Mouse,HID
    driver: logitech-djreceiver,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s
    power: 98mA chip-ID: 046d:c52b class-ID: 0300
  Device-3: 1-6:4 info: ASUSTek ASUS USB-BT500 type: Bluetooth driver: btusb
    interfaces: 2 rev: 1.1 speed: 12 Mb/s power: 500mA chip-ID: 0b05:190e
    class-ID: e001 serial: <filter>
  Hub-2: 2-0:1 info: Super-speed hub ports: 4 rev: 3.1 speed: 10 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
  Hub-3: 3-0:1 info: Hi-speed hub with single TT ports: 4 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 3-2:2 info: Blue Microphones Yeti Nano type: Audio,HID
    driver: hid-generic,snd-usb-audio,usbhid interfaces: 4 rev: 2.0
    speed: 12 Mb/s power: 100mA chip-ID: b58e:0005 class-ID: 0300
    serial: <filter>
  Hub-4: 4-0:1 info: Super-speed hub ports: 4 rev: 3.1 speed: 10 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
  System Temperatures: cpu: 35.1 C mobo: N/A gpu: amdgpu temp: 34.0 C
    mem: 32.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 1060
Info:
  Processes: 338 Uptime: 19m wakeups: 6 Init: systemd v: 251
  default: graphical tool: systemctl Compilers: gcc: 12.2.0 clang: 14.0.6
  Packages: 2059 pm: pacman pkgs: 2039 libs: 528 tools: pamac,yay pm: snap
  pkgs: 20 Shell: Zsh v: 5.9 default: Bash v: 5.1.16 running-in: konsole
  inxi: 3.3.22

Does it also happen on different kernels, like 5.15 and 6.0?

On 6.0, yes. I haven’t tried kernels prior to 5.19 because of other issues I had (not related to this one).

I also opened an issue here. I’m trying to apply those patches and will give an update.

Try disabling dynamic power management by adding this kernel parameter:
amdgpu.dpm=0
Save the GRUB config, update GRUB, reboot and test.
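
In case the GRUB part is unclear, here is a rough sketch of adding the parameter. It assumes the usual layout where /etc/default/grub has a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_grub_param is a made-up helper name, and the demo runs on a temporary file rather than the real config.

```shell
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults
# file, unless it is already present. Assumes the value is in double quotes.
add_grub_param() {
    local file="$1" param="$2"
    # Already there (delimited by a quote or space on both sides)? Do nothing.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote of the line.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${param}\"|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demo on a throwaway copy, not on the real /etc/default/grub
demo="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet apparmor=1"' > "$demo"
add_grub_param "$demo" amdgpu.dpm=0
cat "$demo"
rm -f "$demo"

# On the real system (as root, after backing up /etc/default/grub):
#   add_grub_param /etc/default/grub amdgpu.dpm=0
#   update-grub
#   reboot
```

Editing the file by hand in a text editor works just as well; the important part is running update-grub afterwards so the change ends up in the generated config.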

I tried it, but it produced an error. I will finish with my patches first and then try it again. Also, I have a fixed power setting in CoreCtrl (not sure whether it’s equivalent to a kernel option, though).

CoreCtrl can also cause similar issues, so another thing to try is to uninstall it, remove all of its configs, reboot and test.
