Green checkerboard system crash (AMDGPU)

I was playing a Proton game doing serious work, when all video (including windows in the background on other monitors) and my mouse froze, though audio was still playing. Then, a staticky, green checkerboard pattern appeared on all my monitors. Then my system rebooted. Thanks to Conky in the background, I saw my GPU temp was around 60 °C, so no overheating. CPU temp was also normal.

Can any wise people give a rough diagnosis based on this journalctl log?

Jan 10 18:41:18 HOST kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 10 18:41:18 HOST kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 10 18:41:23 HOST kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 10 18:41:23 HOST kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=27728808, emitted seq=27728810
Jan 10 18:41:23 HOST kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GTA5.exe pid 350528 thread GTA5.exe:cs0 pid 350561
Jan 10 18:41:23 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 10 18:41:27 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: failed to suspend display audio
Jan 10 18:41:27 HOST kernel: ------------[ cut here ]------------
Jan 10 18:41:27 HOST kernel: WARNING: CPU: 7 PID: 349566 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:3240 dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
Jan 10 18:41:27 HOST kernel: Modules linked in: tun snd_seq_dummy snd_seq udp_diag tcp_diag inet_diag wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha libblake2s_generic nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 snd_usb_audio nf_tables snd_usbmidi_lib snd_rawmidi snd_seq_device libcrc32c mousedev mc joydev wacom apple_mfi_fastcharge nfnetlink squashfs loop nct6775 hwmon_vid snd_hda_codec_realtek eeepc_wmi asus_wmi sparse_keymap rfkill video wmi_bmof mxm_wmi snd_hda_codec_generic amdgpu ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence edac_mce_amd snd_hda_codec snd_hda_core snd_hwdep soundwire_bus gpu_sched ttm snd_soc_core drm_kms_helper vfat snd_compress fat kvm ac97_bus snd_pcm_dmaengine snd_pcm irqbypass cec snd_timer rapl
Jan 10 18:41:27 HOST kernel:  igb syscopyarea r8169 snd sysfillrect sysimgblt sp5100_tco fb_sys_fops pcspkr i2c_piix4 realtek soundcore k10temp mdio_devres i2c_algo_bit libphy dca wmi pinctrl_amd mac_hid acpi_cpufreq drm fuse uinput crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys dm_mod trusted tpm usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ccp rng_core sr_mod cdrom xhci_pci
Jan 10 18:41:27 HOST kernel: CPU: 7 PID: 349566 Comm: kworker/7:1 Tainted: G        W         5.10.2-2-MANJARO #1
Jan 10 18:41:27 HOST kernel: Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VIII HERO (WI-FI), BIOS 1001 09/09/2019
Jan 10 18:41:27 HOST kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Jan 10 18:41:27 HOST kernel: RIP: 0010:dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
Jan 10 18:41:27 HOST kernel: Code: 00 7b 35 22 85 14 1f 00 00 75 2f 31 d2 f2 0f 11 85 58 26 00 00 48 89 ee 4c 89 e7 e8 3d f6 ff ff 89 c2 22 95 14 1f 00 00 75 30 <0f> 0b 48 89 9d 58 26 00 00 5b 5d 41 5c c3 75 c9 48 89 9d 58 26 00
Jan 10 18:41:27 HOST kernel: RSP: 0018:ffffaf9e206cfbf8 EFLAGS: 00010246
Jan 10 18:41:27 HOST kernel: RAX: 0000000000000001 RBX: 4079400000000000 RCX: 00000003167a1007
Jan 10 18:41:27 HOST kernel: RDX: 0000000000000000 RSI: e92450526cd4af29 RDI: 00000000000311a0
Jan 10 18:41:27 HOST kernel: RBP: ffff92f31c740000 R08: ffff92efaf0fa000 R09: ffff92ee2fb90000
Jan 10 18:41:27 HOST kernel: R10: ffff92efaf0fa000 R11: 0000000100000001 R12: ffff92ee2fb90000
Jan 10 18:41:27 HOST kernel: R13: ffff92ee3e372000 R14: ffff92ee0c4c5800 R15: ffff92f31c740000
Jan 10 18:41:27 HOST kernel: FS:  0000000000000000(0000) GS:ffff92fcdebc0000(0000) knlGS:0000000000000000
Jan 10 18:41:27 HOST kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 18:41:27 HOST kernel: CR2: 000013a6082fa000 CR3: 000000054aaa2000 CR4: 0000000000350ee0
Jan 10 18:41:27 HOST kernel: Call Trace:
Jan 10 18:41:27 HOST kernel:  dcn20_validate_bandwidth+0x24/0x40 [amdgpu]
Jan 10 18:41:27 HOST kernel:  dc_validate_global_state+0x2f2/0x390 [amdgpu]
Jan 10 18:41:27 HOST kernel:  ? dc_rem_all_planes_for_stream+0xcb/0x110 [amdgpu]
Jan 10 18:41:27 HOST kernel:  dm_suspend+0x18b/0x1c0 [amdgpu]
Jan 10 18:41:27 HOST kernel:  amdgpu_device_ip_suspend_phase1+0x73/0xd0 [amdgpu]
Jan 10 18:41:27 HOST kernel:  ? amdgpu_fence_process+0x4d/0x130 [amdgpu]
Jan 10 18:41:27 HOST kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Jan 10 18:41:27 HOST kernel:  amdgpu_device_pre_asic_reset+0x185/0x19c [amdgpu]
Jan 10 18:41:27 HOST kernel:  amdgpu_device_gpu_recover.cold+0x5cf/0x95d [amdgpu]
Jan 10 18:41:27 HOST kernel:  amdgpu_job_timedout+0x121/0x140 [amdgpu]
Jan 10 18:41:27 HOST kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Jan 10 18:41:27 HOST kernel:  process_one_work+0x1d6/0x3a0
Jan 10 18:41:27 HOST kernel:  worker_thread+0x4d/0x3d0
Jan 10 18:41:27 HOST kernel:  ? rescuer_thread+0x410/0x410
Jan 10 18:41:27 HOST kernel:  kthread+0x133/0x150
Jan 10 18:41:27 HOST kernel:  ? __kthread_bind_mask+0x60/0x60
Jan 10 18:41:27 HOST kernel:  ret_from_fork+0x22/0x30
Jan 10 18:41:27 HOST kernel: ---[ end trace 143fc880115578a7 ]---
Jan 10 18:41:28 HOST kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 10 18:41:28 HOST kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jan 10 18:41:28 HOST kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 10 18:41:28 HOST kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 10 18:41:28 HOST kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jan 10 18:41:29 HOST kernel: [drm] free PSP TMR buffer
Jan 10 18:41:29 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: BACO reset
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 10 18:41:32 HOST kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Jan 10 18:41:32 HOST kernel: [drm] VRAM is lost due to GPU reset!
Jan 10 18:41:32 HOST kernel: [drm] PSP is resuming...
Jan 10 18:41:32 HOST kernel: [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resuming...
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resumed successfully!
Jan 10 18:41:32 HOST kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jan 10 18:41:32 HOST kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Jan 10 18:41:32 HOST kernel: [drm] JPEG decode initialized successfully.
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow start
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow done
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: [drm] Skip scheduling IBs!
Jan 10 18:41:32 HOST kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
Jan 10 18:41:32 HOST kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
Jan 10 18:41:42 HOST xembedsniproxy[1889]: Container window visible, stack below
-- Boot f86eadc0107b46b88e93903c0d208ef5 --

Is it just a driver thing, or is my GPU faulty (:sob:)?

inxi -Fazy:

System:
  Kernel: 5.10.2-2-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.10-x86_64 
  root=UUID=67c27dd3-024f-45e4-a206-1e93f079a256 rw quiet 
  cryptdevice=UUID=746b6bd1-af47-497c-b84a-be0ba820069b:luks-746b6bd1-af47-497c-b84a-be0ba820069b 
  root=/dev/mapper/luks-746b6bd1-af47-497c-b84a-be0ba820069b 
  resume=/dev/mapper/luks-746b6bd1-af47-497c-b84a-be0ba820069b apparmor=1 
  security=apparmor udev.log_priority=3 audit=0 
  Desktop: KDE Plasma 5.20.4 tk: Qt 5.15.2 wm: kwin_x11 dm: SDDM 
  Distro: Manjaro Linux 
Machine:
  Type: Desktop Mobo: ASUSTeK model: ROG CROSSHAIR VIII HERO (WI-FI) 
  v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 1001 
  date: 09/09/2019 
CPU:
  Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 2 
  family: 17 (23) model-id: 71 (113) stepping: N/A microcode: 8701013 
  L2 cache: 6 MiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm 
  bogomips: 182131 
  Speed: 3591 MHz min/max: 2200/3800 MHz boost: enabled Core speeds (MHz): 
  1: 3591 2: 2050 3: 2191 4: 2192 5: 3596 6: 2052 7: 2047 8: 2795 9: 1863 
  10: 1863 11: 2199 12: 2196 13: 2051 14: 2196 15: 2196 16: 2190 17: 2195 
  18: 2793 19: 1861 20: 1863 21: 2203 22: 2195 23: 2196 24: 2196 
  Vulnerabilities: Type: itlb_multihit status: Not affected 
  Type: l1tf status: Not affected 
  Type: mds status: Not affected 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: 
  always-on, RSB filling 
  Type: srbds status: Not affected 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] 
  vendor: XFX Pine driver: amdgpu v: kernel bus ID: 0c:00.0 chip ID: 1002:731f 
  Display: x11 server: X.Org 1.20.10 compositor: kwin_x11 driver: amdgpu,ati 
  unloaded: modesetting,radeon alternate: fbdev,vesa display ID: :0 screens: 1 
  Screen-1: 0 s-res: 6400x2440 s-dpi: 96 s-size: 1693x645mm (66.7x25.4") 
  s-diag: 1812mm (71.3") 
  Monitor-1: DisplayPort-0 res: 1920x1080 hz: 60 dpi: 82 
  size: 598x336mm (23.5x13.2") diag: 686mm (27") 
  Monitor-2: DisplayPort-1 res: 2560x1440 dpi: 109 
  size: 598x336mm (23.5x13.2") diag: 686mm (27") 
  Monitor-3: HDMI-A-0 res: 1920x1080 hz: 60 dpi: 96 
  size: 509x286mm (20.0x11.3") diag: 584mm (23") 
  OpenGL: renderer: AMD Radeon RX 5700 XT (NAVI10 DRM 3.40.0 5.10.2-2-MANJARO 
  LLVM 11.0.0) 
  v: 4.6 Mesa 20.3.1 direct render: Yes 
Audio:
  Device-1: AMD Navi 10 HDMI Audio driver: snd_hda_intel v: kernel 
  bus ID: 0c:00.1 chip ID: 1002:ab38 
  Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK 
  driver: snd_hda_intel v: kernel bus ID: 0e:00.4 chip ID: 1022:1487 
  Device-3: Logitech G933 Wireless Headset Dongle type: USB 
  driver: hid-generic,snd-usb-audio,usbhid bus ID: 5-2:2 chip ID: 046d:0a5b 
  Sound Server: ALSA v: k5.10.2-2-MANJARO
Swap:
  Kernel: swappiness: 60 (default) cache pressure: 100 (default) 
  ID-1: swap-1 type: partition size: 8.8 GiB used: 0 KiB (0.0%) priority: -2 
  dev: /dev/dm-1 maj-min: 254:1 
  mapped: luks-2ce750cb-7d13-4bf1-86ee-baaf8c9103fe 
Sensors:
  System Temperatures: cpu: 40.0 C mobo: N/A gpu: amdgpu temp: 45.0 C 
  mem: 46.0 C 
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 0 
Info:
  Processes: 465 Uptime: 13m wakeups: 0 Memory: 62.79 GiB 
  used: 4.72 GiB (7.5%) Init: systemd v: 247 Compilers: gcc: 10.2.0 
  clang: 11.0.0 Packages: 1784 pacman: 1780 lib: 483 rpm: 0 flatpak: 4 
  Shell: Bash v: 5.1.0 running in: server inxi: 3.2.01 

Your GPU stopped responding. Normally that would mean a mandatory hard reset of your PC, but today’s drivers can handle it to some degree and continue to work in some cases.
A GPU can stop responding for a variety of reasons. To make sure your hardware is stable, run stress tests and benchmarks to confirm it can withstand constant load for hours. Then you can try a different OS, Mesa version, or kernel version to see if the problem persists. Also check whether it happens with native GPU apps, not just Proton-based ones.
If the hardware is OK and you have exhausted all options, you could forward this bug report to the maintainers. I’m not sure whether the Manjaro forum is the right place; it could be the Arch upstream forum or a kernel dev list. Someone here will correct me on this.
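
For anyone comparing their own journal against the log posted above, here is a minimal Python sketch (not an official tool) that pulls out the key facts: which ring timed out, how many fences were emitted but never signaled, which process was involved, and whether the reset reported success. The helper name `summarize_amdgpu_hang` and the `SAMPLE_LOG` variable are made up for illustration, and the patterns only cover the message wording quoted in this thread, so other kernel versions may need different patterns.

```python
import re

# Patterns copied from the kernel messages quoted in this thread.
# Assumption: other kernel versions may word these messages differently.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_INFO = re.compile(r"Process information: process (?P<name>\S+) pid (?P<pid>\d+)")
RESET_OK = re.compile(r"GPU reset\(\d+\) succeeded")
VRAM_LOST = "VRAM is lost due to GPU reset"


def summarize_amdgpu_hang(log_text):
    """Summarize the first ring timeout and its recovery (hypothetical helper)."""
    summary = {
        "ring": None,
        "unsignaled_fences": None,
        "process": None,
        "reset_succeeded": False,
        "vram_lost": False,
    }
    for line in log_text.splitlines():
        timeout = RING_TIMEOUT.search(line)
        if timeout and summary["ring"] is None:
            summary["ring"] = timeout.group("ring")
            # Fences submitted to the ring that the GPU never completed.
            summary["unsignaled_fences"] = (
                int(timeout.group("emitted")) - int(timeout.group("signaled"))
            )
        process = PROCESS_INFO.search(line)
        if process and summary["process"] is None:
            summary["process"] = process.group("name")
        if RESET_OK.search(line):
            summary["reset_succeeded"] = True
        if VRAM_LOST in line:
            summary["vram_lost"] = True
    return summary


# A few lines taken from the log in the first post.
SAMPLE_LOG = """\
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=27728808, emitted seq=27728810
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GTA5.exe pid 350528 thread GTA5.exe:cs0 pid 350561
kernel: [drm] VRAM is lost due to GPU reset!
kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
"""

if __name__ == "__main__":
    print(summarize_amdgpu_hang(SAMPLE_LOG))
```

On a live system the text could come from something like `journalctl -k -b -1` (kernel messages from the previous boot), assuming the journal is stored persistently.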

Thanks!

I ran some Unigine benchmarks (Heaven, Valley, and Superposition), and my FPSes are in the 60s on max settings (other people report similar performance). The only posted Linux benchmarks for the same GPU over at Phoronix have it doing around 130 (and, when running Heaven via the Phoronix Test Suite, I get around 100).

Could you recommend some GPU stress testers?

In non-Proton games, there are no problems.

This happened on kernel 5.10, on which I’m having other severe issues; maybe it’s related. :thinking: (That said, I’ve been having Proton issues on 5.9 too, with either no Proton games launching until I reboot or wipe and regenerate Steam’s directory or running at a choppy ~10 FPS after a few days of running fine. That actually motivated me to upgrade to 5.10, where I haven’t had that yet – only the green checkerboard. :yum:)

I am reporting the same symptom, with exactly the same logs. The problem started on Kubuntu about 3-4 days ago with the Mesa 20.3.3 update and kernel 5.8, subversion 36. After seeing the problem I switched to the latest mesa-git, which is 21.1, and reverted the kernel to subversion 33. I had no problem for two days, then it came back. I switched to Manjaro with all the updates, and the problem went away for one more day, then came back after I tried undervolting with PowerUPP, which erroneously refused to show any data until a restart fixed it. I thought it was caused by a software conflict (though I had done this before without problems), so I made a fresh installation of Manjaro with nothing installed but Lutris and WoW. The problem persists and I am having random crashes. I have had no choice but to switch back to Windows, which has no problem running any game, so I still believe it is due to software, not hardware.

I am planning to switch to an Nvidia card but am still held back by the cost. Staying with Windows is not really a problem for me, just an annoyance due to perfectionism. I searched for this type of error and found that it is a “textbook” amdgpu crash, and I am out of solutions other than prying the case open to reseat the card or add another power cable, etc., which is a huge pain with an NCASE M1.

Well, I just want to play some games.

Has anyone had a better experience after reverting to a Mesa version before 20.3.3? It’s worth a try.

I’m on mesa 20.3.3-1, and, after going back to kernel 5.4.89-1 (the previous LTS as 5.9 was near end of life :skull:), I haven’t had this or the other serious issues I was having so far.

I suspect the new kernels just have issues.

OK, I am typing on an iPad right in front of the dead WoW screen; the game ran for only one minute before freezing, using the 5.4 LTS kernel. I can see that this kernel does have some remedial measures, because there are no static-scrambled checkers in green or pink, just several GPU reset attempts and a freeze. However, that failed to solve my problem. Since this was tested on a fresh system, my problem leans more toward a hardware fault; but why doesn’t Windows have the error, and why does everything work there?

Sigh. Really thinking about getting an Nvidia.

BTW, have you checked reverting the mesa driver back to before 20.3.3 yet? It involves a complicated downgrade of packages such as vdpau and Vulkan (which I think to be related to this freeze), that is why I didn’t do it. If you have any capability to do it, please let me know. Thank you.

using the 5.4 LTS kernel

Is this the Manjaro 5.4 or an Ubuntu one? I’m not sure, but I think there’s differences between the two (or significant differences in the stuff around the kernels across distros) that affect things.

have you checked reverting the mesa driver back to before 20.3.3 yet? It involves a complicated downgrade of packages such as vdpau and Vulkan (which I think to be related to this freeze), that is why I didn’t do it. If you have any capability to do it, please let me know. Thank you.

That sounds too scary for me. I’m not confident enough to try stuff like that. : p

On a side note, when I tried to launch WoW through the Battle.net application for the first time after crashing in Linux, Battle.net failed to launch it: pressing the “Play” button triggers a bunch of internal work, the button turns gray, then turns blue (enabled) again, and nothing else happens. No WoW process ever starts. But if I shut down Battle.net, relaunch it, and press “Play”, the game starts normally. It seems that some parameters related to the GPU, set via its Windows driver, changed in the background between the two launches of the Battle.net app; of course Windows is not going to tell me what happened, so I can only guess. If there is any ROM on the GPU that stores information such as voltage, frequency, and rendering methods, it can definitely be altered by drivers (Windows or Linux), and it is sometimes misinterpreted by certain versions (like 20.3.3 plus the relevant kernels), which consequently causes this series of errors.

This is AMD’s problem: errors arising from the integration of hardware and software engineering are not precisely categorized, due to the open-source nature of the driver. The closed-source Windows driver seems to have many more tricks up its sleeve for maintaining stability. I understand that the motivation for this calibration is lacking because it is apparently not profitable, but this is what we users have, and the open-source drivers are simply in the way of us homies having fun. And since we are only 1% of the total user share in the market, AMD doesn’t care. Humanity owns, and we need to strike a balance between what works and what bloats.

Oh yes, and this Windows “internal calibration” happened twice, so it has proven to be reproducible.

It’s Manjaro 5.4. Ubuntu uses 5.8. :stuck_out_tongue:

I think the proprietary AMD driver on Linux is roughly equivalent to their driver on Windows. : o If so, an experiment could be to try using it on Linux and see how that goes. (The inverse of the experience with Nvidia, where you always install the proprietary driver cuz the open source one is a non-starter.)

https://wiki.archlinux.org/index.php/AMDGPU#AMDGPU_PRO

It’s kind of sketchy though. If you get artifacts and crashes on Manjaro’s 5.4 kernel with Mesa 20.3.3-1, I’d be pretty lost. I didn’t experience any of this before kernel 5.10, so, to me, the etiology is purely new kernel bugs. xD I would assume you indeed might have to roll back Mesa and some fancy stuff since the new Mesa stuff would expect to run against the newer kernels, so you’d have issues even on 5.4? But I’m not… :dizzy_face:

An easier way to confirm that might be to try the latest Ubuntu LTS, 20.04, which uses the 5.4 kernel and presumably only has repo packages (like Mesa et al) meant to run against 5.4. (Ubuntu 20.10 uses kernel 5.8.)

By the way, I’m not so sure Nvidia is a safe refuge from rolling release problems.

If the problem here is that AMD drivers are buggy on newer kernels, it sucks, but Nvidia drivers sometimes don’t work at all because their development is out-of-band with kernel development (since their proprietary driver isn’t in the kernel, like AMD’s open source driver is).

My move would more be along the lines of sticking with Ubuntu LTS than switching GPU vendors. :yum: (That said, Nvidia does have a nice settings panel and G-SYNC actually works on Linux, unlike AMD’s adaptive sync.) But, also, I’m one of those people ideologically opposed to Nvidia. :triumph: If I cared about things running more than deriving psychological sustenance out of entertaining abstract personal ideals, I’d be on Windows!

From other discussions, Nvidia has not had a single driver fault. The problem this time seems unrelated to kernels, as I have tried all the kernels and they all crashed; since the first crash happened right after the update to the 20.3.3 driver on Kubuntu, it is more likely a driver fault. Thus, the most direct solution is to not use this driver. Furthermore, since there is also a possibility of a hardware fault (although unlikely, because Windows works), switching to an Nvidia card seems to be the killer solution. It’s just money that is in the way. :stuck_out_tongue: Money matters. Therefore, I believe that replacing the card would solve the problem. For now, I am stuck on Windows, but my X1 Extreme is on Kubuntu, which… can cough be a test bench for WoW. I believe that there is no crash there.

Update: This is the formal bug feedback page: “5700XT crashes when playing games”. See if you can find your own symptoms there. I have already written up my issue.

@fsck_and_pray make sure you disable Above 4B Decode in UEFI. This can cause instability, and it definitely causes a drop in performance. I’ve been tracking this issue since I read about BAR on Phoronix. I have exactly the same GPU as you, also on a Ryzen platform.
Check what your setting is via
sudo dmesg | grep BAR

:open_mouth: sudo dmesg | grep BAR returns nothing:

[▓▓▓▓▓▓▓ 22:02:53 ~]$ sudo dmesg | grep BAR
[▓▓▓▓▓▓▓ 22:02:55 ~]$ 

What is “Above 4B Decode”? And “BAR”?

I think he means “Above 4G Decode” in BIOS settings. If you have it, try to disable it and see. I don’t know what BAR means other than… place to drink and the fence in the courtroom.

sudo dmesg | grep VRAM

If your result is:
[drm] Detected VRAM RAM=8176M, BAR=8192M

It means you have it enabled. With the option disabled, it reports BAR=256M.
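
As a rough illustration of that check, here is a small Python sketch that reads the `Detected VRAM` line and compares the two numbers. The function name `classify_bar` is hypothetical, and treating "BAR at least as large as VRAM" as "enabled" is an assumption based only on the two values mentioned above (8192M vs. 256M), not on any documented rule.

```python
import re

# Matches the amdgpu line quoted above, e.g.
# "[drm] Detected VRAM RAM=8176M, BAR=8192M"
VRAM_LINE = re.compile(r"Detected VRAM RAM=(?P<ram>\d+)M, BAR=(?P<bar>\d+)M")


def classify_bar(dmesg_text):
    """Return VRAM/BAR sizes and a guess at whether the large BAR is active.

    Returns None when no matching line is present in the text.
    """
    match = VRAM_LINE.search(dmesg_text)
    if match is None:
        return None
    ram_mib = int(match.group("ram"))
    bar_mib = int(match.group("bar"))
    return {
        "ram_mib": ram_mib,
        "bar_mib": bar_mib,
        # Assumption: a BAR covering all of VRAM means Above 4G Decode is on.
        "large_bar": bar_mib >= ram_mib,
    }


if __name__ == "__main__":
    print(classify_bar("[drm] Detected VRAM RAM=8176M, BAR=8192M"))
    print(classify_bar("[drm] Detected VRAM RAM=8176M, BAR=256M"))
```

If the function returns None, the line is simply not in the text it was given, which says nothing either way about the UEFI setting.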

What is it:

“This is a feature that’s only compatible with the latest AMD Ryzen 5000 processors and AMD Radeon 6000 GPU products; anything earlier, such as a Radeon 5700XT for example, do not have this feature.”

It’s not supported on the 5700 XT, but it will get enabled (in an experimental state) if you have Above 4G Decode enabled in UEFI.

My results are:
Ryzen 5 1600X, Radeon 5700 XT, Mesa 20.3.3, 5.10.7-3-MANJARO x86_64

Above 4G Decode: off
Score: 15403

Above 4G Decode: on
Score: 9705

You can see how big the drop in performance is at this point.
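
To put a number on it, here is a tiny Python sketch of the arithmetic using the two scores above. The function name `relative_drop` is just an illustrative choice.

```python
def relative_drop(baseline, other):
    """Percentage by which `other` falls short of `baseline`."""
    return (baseline - other) / baseline * 100.0


if __name__ == "__main__":
    # Scores quoted above: Above 4G Decode off vs. on.
    drop = relative_drop(15403, 9705)
    print(f"Score drop with Above 4G Decode on: {drop:.1f}%")
```

With these two scores the drop works out to a bit over a third of the baseline.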

What are your results in Unigine Superposition?

sudo dmesg | grep VRAM also returns nothing.

In Superposition, on “1080p Extreme”, I get:

4765
FPS Min: 28.86, Avg: 35.64, Max: 42.05
GPU °C Min: 44.0, Max: 67.0

Check the link I posted, as someone just updated it. It is definitely the driver. I haven’t had time to try it, but how easy is it to use “downgrade” in Manjaro to downgrade the driver to a certain version?

Not very since I can’t figure it out. :yum: The only stuff I see online about downgrading drivers via mhwd is for Nvidia drivers (which have the version in the name, so it’s easier – the AMD driver is just video-linux – well, at least the meta-package that contains the driver in mhwd).

That said: I still haven’t had problems (using video-linux 2018.05.04, mhwd-amdgpu 19.1.0-1, and mesa 20.3.3-1) on Manjaro kernel 5.4.89-1.