While playing a fairly wide range of games on Steam, at intervals of anywhere from 20 minutes to 3 to 4 hours, my display locks up, goes black, then "recovers" with rainbow distortion. I am often able to restart the system with Ctrl+Alt+F1 and then Ctrl+Alt+Del. These lockups seem to be becoming more frequent over time, and I am still very new to Linux, so assistance with figuring out what I need to do would be appreciated. I have read some posts suggesting that some kernels (5.4 in particular) might be better than others, and I have tried both the AMD Pro drivers and the mesa-git drivers, with no change.
System:
Kernel: 5.4.143-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
parameters: BOOT_IMAGE=/boot/vmlinuz-5.4-x86_64
root=UUID=c01cddb7-2b57-4aee-93a6-5a69361f34f7 rw udev.log_priority=3
amdgpu.gpu_recovery=1
Desktop: Xfce 4.16.0 tk: Gtk 3.24.29 info: xfce4-panel wm: xfwm 4.16.1 vt: 7
dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux
Machine:
Type: Desktop System: ASUS product: N/A v: N/A serial: <filter>
Mobo: ASUSTeK model: ROG STRIX Z390-F GAMING v: Rev 1.xx serial: <filter>
UEFI: American Megatrends v: 1802 date: 12/01/2020
Battery:
Message: No system battery data found. Is one present?
Memory:
RAM: total: 31.28 GiB used: 2.59 GiB (8.3%)
RAM Report: permissions: Unable to run dmidecode. Root privileges required.
CPU:
Info: 6-Core model: Intel Core i5-9600K bits: 64 type: MCP arch: Kaby Lake
note: check family: 6 model-id: 9E (158) stepping: C (12) microcode: EA
cache: L2: 9 MiB bogomips: 44412
Speed: 800 MHz min/max: 800/4600 MHz Core speeds (MHz): 1: 800 2: 800 3: 801
4: 800 5: 800 6: 800
Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_capabilities
arch_perfmon art avx avx2 bmi1 bmi2 bts clflush clflushopt cmov constant_tsc
cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept ept_ad erms
est f16c flexpriority flush_l1d fma fpu fsgsbase fxsr hle ht hwp
hwp_act_window hwp_epp hwp_notify ibpb ibrs ida intel_pt invpcid
invpcid_single lahf_lm lm mca mce md_clear mmx monitor movbe mpx msr mtrr
nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni
popcnt pse pse36 pts rdrand rdseed rdtscp rep_good rtm sdbg sep smap smep
smx ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc
tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec
xsaveopt xsaves xtopology xtpr
Vulnerabilities: Type: itlb_multihit status: KVM: Vulnerable
Type: l1tf status: Not affected
Type: mds mitigation: Clear CPU buffers; SMT disabled
Type: meltdown status: Not affected
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1
mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
IBRS_FW, STIBP: disabled, RSB filling
Type: srbds mitigation: Microcode
Type: tsx_async_abort mitigation: Clear CPU buffers; SMT disabled
Graphics:
Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
vendor: Gigabyte driver: amdgpu v: kernel bus-ID: 03:00.0 chip-ID: 1002:731f
class-ID: 0300
Display: x11 server: X.Org 1.20.13 compositor: xfwm4 v: 4.16.1 driver:
loaded: amdgpu,ati unloaded: modesetting,radeon alternate: fbdev,vesa
display-ID: :0.0 screens: 1
Screen-1: 0 s-res: 3440x1440 s-dpi: 96 s-size: 910x381mm (35.8x15.0")
s-diag: 987mm (38.8")
Monitor-1: DisplayPort-1 res: 3440x1440 dpi: 110
size: 797x334mm (31.4x13.1") diag: 864mm (34")
OpenGL: renderer: AMD Radeon RX 5700 XT (NAVI10 DRM 3.35.0 5.4.143-1-MANJARO
LLVM 12.0.1)
v: 4.6 Mesa 21.2.1 direct render: Yes
Audio:
Device-1: Intel Cannon Lake PCH cAVS vendor: ASUSTeK driver: snd_hda_intel
v: kernel alternate: snd_soc_skl,snd_sof_pci bus-ID: 00:1f.3
chip-ID: 8086:a348 class-ID: 0403
Device-2: AMD Navi 10 HDMI Audio driver: snd_hda_intel v: kernel
bus-ID: 03:00.1 chip-ID: 1002:ab38 class-ID: 0403
Device-3: C-Media ATGM1-USB type: USB
driver: hid-generic,snd-usb-audio,usbhid bus-ID: 1-6.1:4 chip-ID: 0d8c:0089
class-ID: 0300 serial: <filter>
Sound Server-1: ALSA v: k5.4.143-1-MANJARO running: yes
Sound Server-2: JACK v: 1.9.19 running: no
Sound Server-3: PulseAudio v: 15.0 running: yes
Sound Server-4: PipeWire v: 0.3.34 running: no
Network:
Device-1: Intel Ethernet I219-V vendor: ASUSTeK driver: e1000e v: 3.2.6-k
port: efa0 bus-ID: 00:1f.6 chip-ID: 8086:15bc class-ID: 0200
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
IP v4: <filter> type: dynamic noprefixroute scope: global
broadcast: <filter>
IF-ID-1: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A
IP v4: <filter> scope: global
WAN IP: <filter>
Bluetooth:
Device-1: Broadcom BCM20702A0 Bluetooth 4.0 type: USB driver: btusb v: 0.8
bus-ID: 1-6.2:5 chip-ID: 0a5c:21e8 class-ID: fe01 serial: <filter>
Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
rfk-block: hardware: no software: yes address: see --recommends
Logical:
Message: No logical block device data found.
RAID:
Message: No RAID data found.
Drives:
Local Storage: total: 1.82 TiB used: 584.62 GiB (31.4%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 EVO 1TB
size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
type: SSD serial: <filter> rev: 3B6Q scheme: GPT
ID-2: /dev/sdb maj-min: 8:16 type: USB vendor: Seagate model: Expansion
size: 931.51 GiB block-size: physical: 4096 B logical: 512 B type: N/A
serial: <filter> rev: 9300 scheme: MBR
Optical-1: /dev/sr0 vendor: PIONEER model: DVR-213NP rev: 1.00
dev-links: cdrom
Features: speed: 40 multisession: yes audio: yes dvd: yes
rw: cd-r,cd-rw,dvd-r,dvd-ram state: running
Partition:
ID-1: / raw-size: 931.22 GiB size: 915.53 GiB (98.32%)
used: 584.62 GiB (63.9%) fs: ext4 dev: /dev/sda2 maj-min: 8:2 label: N/A
uuid: c01cddb7-2b57-4aee-93a6-5a69361f34f7
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 296 KiB (0.1%) fs: vfat dev: /dev/sda1 maj-min: 8:1 label: NO_LABEL
uuid: 2CF3-1B16
ID-3: /home/<filter>/pCloudDrive raw-size: N/A size: 2 TiB
used: 6.87 GiB (0.3%) fs: fuse source: ERR-102
Swap:
Alert: No swap data was found.
Unmounted:
ID-1: /dev/sdb1 maj-min: 8:17 size: 931.51 GiB fs: exfat label: Toolbox
uuid: 2CC0-D83E
USB:
Hub-1: 1-0:1 info: Full speed (or root) Hub ports: 16 rev: 2.0
speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
Hub-2: 1-6:2 info: Genesys Logic Hub ports: 4 rev: 2.0 speed: 480 Mb/s
power: 100mA chip-ID: 05e3:0610 class-ID: 0900
Device-1: 1-6.1:4 info: C-Media ATGM1-USB type: Audio,HID
driver: hid-generic,snd-usb-audio,usbhid interfaces: 3 rev: 1.1
speed: 12 Mb/s power: 100mA chip-ID: 0d8c:0089 class-ID: 0300
serial: <filter>
Device-2: 1-6.2:5 info: Broadcom BCM20702A0 Bluetooth 4.0 type: Bluetooth
driver: btusb interfaces: 4 rev: 2.0 speed: 12 Mb/s chip-ID: 0a5c:21e8
class-ID: fe01 serial: <filter>
Device-3: 1-6.4:7 info: ASUSTek AURA MOTHERBOARD type: HID
driver: hid-generic,usbhid interfaces: 1 rev: 2.0 speed: 12 Mb/s
power: 100mA chip-ID: 0b05:18a3 class-ID: 0300 serial: <filter>
Hub-3: 1-9:3 info: Realtek RTS5411 Hub ports: 4 rev: 2.0 speed: 480 Mb/s
chip-ID: 0bda:5411 class-ID: 0900
Device-1: 1-9.2:6 info: Cooler Master CM110 Gaming Mouse type: Mouse,HID
driver: hid-generic,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s
power: 100mA chip-ID: 2516:0119 class-ID: 0300
Device-2: 1-9.3:8 info: Razer USA BlackWidow (2019) type: Keyboard,Mouse
driver: hid-generic,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s
power: 500mA chip-ID: 1532:0241 class-ID: 0300
Hub-4: 1-9.4:9 info: VIA Labs VL813 Hub ports: 4 rev: 2.1 speed: 480 Mb/s
chip-ID: 2109:2813 class-ID: 0900
Hub-5: 1-9.4.1:10 info: VIA Labs VL813 Hub ports: 4 rev: 2.1 speed: 480 Mb/s
chip-ID: 2109:2813 class-ID: 0900
Hub-6: 2-0:1 info: Full speed (or root) Hub ports: 10 rev: 3.1
speed: 10 Gb/s chip-ID: 1d6b:0003 class-ID: 0900
Hub-7: 2-9:2 info: Realtek Hub ports: 4 rev: 3.0 speed: 5 Gb/s
chip-ID: 0bda:0411 class-ID: 0900
Hub-8: 2-9.4:3 info: VIA Labs VL813 Hub ports: 4 rev: 3.0 speed: 5 Gb/s
chip-ID: 2109:0813 class-ID: 0900
Hub-9: 2-9.4.1:4 info: VIA Labs VL813 Hub ports: 4 rev: 3.0 speed: 5 Gb/s
chip-ID: 2109:0813 class-ID: 0900
Device-1: 2-9.4.4:5 info: Seagate RSS LLC SRD0NF1 Expansion Portable (STEA)
type: Mass Storage driver: uas interfaces: 1 rev: 3.0 speed: 5 Gb/s
power: 144mA chip-ID: 0bc2:2322 class-ID: 0806 serial: <filter>
Sensors:
System Temperatures: cpu: 27.8 C mobo: N/A gpu: amdgpu temp: 51.0 C
mem: 62.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 1474
Info:
Processes: 240 Uptime: 26m wakeups: 0 Init: systemd v: 248 tool: systemctl
Compilers: gcc: 11.1.0 clang: 12.0.1 Packages: pacman: 1250 lib: 437
Shell: Bash v: 5.1.8 running-in: xfce4-terminal inxi: 3.3.06
Below is a journalctl log of the crash.
-- Journal begins at Sun 2021-04-25 11:21:37 AEST, ends at Mon 2021-09-13 19:42:17 AEST. --
Sep 13 19:40:29 desktop-mdesk kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Sep 13 19:40:29 desktop-mdesk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3299287, emitted seq=3299289
Sep 13 19:40:29 desktop-mdesk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process TESV.exe pid 7570 thread TESV.exe:cs0 pid 7592
Sep 13 19:40:29 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset begin!
Sep 13 19:40:33 desktop-mdesk kernel: kfd2kgd: cp queue preemption time out.
Sep 13 19:40:33 desktop-mdesk kernel: [drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 10 test failed (scratch(0xC040)=0xCAFEDEAD)
Sep 13 19:40:33 desktop-mdesk kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Sep 13 19:40:33 desktop-mdesk kernel: [drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 10 test failed (scratch(0xC040)=0xCAFEDEAD)
Sep 13 19:40:33 desktop-mdesk kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 13 19:40:33 desktop-mdesk kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Sep 13 19:40:35 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
Sep 13 19:40:35 desktop-mdesk kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080012FC000).
Sep 13 19:40:35 desktop-mdesk kernel: [drm] PSP is resuming...
Sep 13 19:40:36 desktop-mdesk kernel: [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
Sep 13 19:40:36 desktop-mdesk kernel: amdgpu: [powerplay] SMU is resuming...
Sep 13 19:40:36 desktop-mdesk kernel: amdgpu: [powerplay] SMU is resumed successfully!
Sep 13 19:40:36 desktop-mdesk kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 13 19:40:36 desktop-mdesk kernel: [drm] ring test on 10 succeeded in 55 usecs
Sep 13 19:40:36 desktop-mdesk kernel: [drm] ring test on 10 succeeded in 9 usecs
Sep 13 19:40:36 desktop-mdesk kernel: [drm] gfx 0 ring me 0 pipe 0 q 0
Sep 13 19:40:36 desktop-mdesk kernel: [drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
Sep 13 19:40:36 desktop-mdesk kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v10_0> failed -22
Sep 13 19:40:36 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset(1) failed
Sep 13 19:40:36 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset end with ret = -22
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:37 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: spurious response 0x0:0x0, last cmd=0x770100
Sep 13 19:40:39 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00670d81
Sep 13 19:40:40 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: No response from codec, disabling MSI: last cmd=0x00670d81
Sep 13 19:40:40 desktop-mdesk fancontrol[5784]: /usr/sbin/fancontrol: line 639: echo: write error: Invalid argument
Sep 13 19:40:40 desktop-mdesk fancontrol[5784]: Error writing PWM value to /sys/class/hwmon/hwmon3/pwm1
Sep 13 19:40:40 desktop-mdesk fancontrol[5784]: Aborting, restoring fans...
Sep 13 19:40:40 desktop-mdesk fancontrol[5784]: Verify fans have returned to full speed
Sep 13 19:40:40 desktop-mdesk systemd[1]: fancontrol.service: Main process exited, code=exited, status=1/FAILURE
Sep 13 19:40:40 desktop-mdesk systemd[1]: fancontrol.service: Failed with result 'exit-code'.
Sep 13 19:40:40 desktop-mdesk audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=fancontrol comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 13 19:40:40 desktop-mdesk systemd[1]: fancontrol.service: Consumed 2.703s CPU time.
Sep 13 19:40:40 desktop-mdesk kernel: manual fan speed control should be enabled first
Sep 13 19:40:40 desktop-mdesk kernel: manual fan speed control should be enabled first
Sep 13 19:40:40 desktop-mdesk kernel: audit: type=1131 audit(1631526040.934:159): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=fancontrol comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 13 19:40:41 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: No response from codec, resetting bus: last cmd=0x00670d81
Sep 13 19:40:42 desktop-mdesk rtkit-daemon[1380]: Supervising 6 threads of 3 processes of 1 users.
Sep 13 19:40:42 desktop-mdesk rtkit-daemon[1380]: Successfully made thread 10500 of process 1377 owned by '1000' RT at priority 5.
Sep 13 19:40:42 desktop-mdesk kernel: snd_hda_intel 0000:03:00.1: azx_get_response timeout, switching to single_cmd mode: last cmd=0x00672400
Sep 13 19:40:42 desktop-mdesk rtkit-daemon[1380]: Supervising 7 threads of 3 processes of 1 users.
Sep 13 19:40:42 desktop-mdesk kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
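To make it easier to compare one crash with the next, the log above could be condensed with a small script. This is only a minimal sketch: the function name `summarize_gpu_hangs`, the `seq_gap` field and the regular expressions are hypothetical, written against the exact amdgpu messages quoted above, and other kernel versions may word those messages differently.

```python
import re

# Patterns written against the amdgpu messages quoted in the log above
# (an assumption: other kernel versions may phrase them differently).
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_INFO = re.compile(r"Process information: process (?P<process>\S+) pid \d+")
RESET_RESULT = re.compile(r"GPU reset end with ret = (?P<ret>-?\d+)")


def summarize_gpu_hangs(lines):
    """Return one summary dict per 'ring ... timeout' message in the log."""
    hangs = []
    for line in lines:
        m = RING_TIMEOUT.search(line)
        if m:
            hangs.append({
                # First three fields of the short journal format: month, day, time.
                "time": " ".join(line.split()[:3]),
                "ring": m["ring"],
                # Difference between emitted and signaled sequence numbers.
                "seq_gap": int(m["emitted"]) - int(m["signaled"]),
                "process": None,
                "reset_ret": None,
            })
            continue
        if not hangs:
            continue  # ignore everything before the first timeout
        m = PROCESS_INFO.search(line)
        if m and hangs[-1]["process"] is None:
            hangs[-1]["process"] = m["process"]
            continue
        m = RESET_RESULT.search(line)
        if m and hangs[-1]["reset_ret"] is None:
            hangs[-1]["reset_ret"] = int(m["ret"])
    return hangs


# Four lines copied from the log above, used as sample input.
SAMPLE = """\
Sep 13 19:40:29 desktop-mdesk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3299287, emitted seq=3299289
Sep 13 19:40:29 desktop-mdesk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process TESV.exe pid 7570 thread TESV.exe:cs0 pid 7592
Sep 13 19:40:29 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset begin!
Sep 13 19:40:36 desktop-mdesk kernel: amdgpu 0000:03:00.0: GPU reset end with ret = -22
"""

for hang in summarize_gpu_hangs(SAMPLE.splitlines()):
    print(hang)
```

The same function should work on the kernel messages from a whole boot if the output of `journalctl -k` is saved to a file and its lines are passed in, which would show whether the same ring and the same games are involved each time.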