Green checkerboard system crash (AMDGPU)

“Extreme” is not the default profile; use the default one if you want to compare your results with mine.

You need to downgrade the complete mesa stack; just list all the package files after pacman -U.
See also:
https://wiki.archlinux.org/index.php/Downgrading_packages
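To see which older package files are still available locally before running pacman -U, something like the following may help. It is only a sketch: the function name cached_pkgs is made up for illustration, and it assumes the default pacman cache location (/var/cache/pacman/pkg) and the usual name-version-release-arch.pkg.tar.* file naming.

```shell
# Hypothetical helper: print cached package files for the given package names.
# Usage: cached_pkgs <cache-dir> <package-name>...
# Runs in a subshell "( ... )" so its variables do not leak into the caller.
cached_pkgs() (
    cache_dir=$1
    shift
    for name in "$@"; do
        # Requiring a digit right after "<name>-" keeps "mesa" from also
        # matching differently named packages such as "mesa-demos".
        for file in "$cache_dir/$name"-[0-9]*.pkg.tar.*; do
            case $file in
                *.sig) continue ;;   # skip detached signature files
            esac
            if [ -e "$file" ]; then
                printf '%s\n' "$file"
            fi
        done
    done
)

# Example call (assumed default cache path):
#   cached_pkgs /var/cache/pacman/pkg mesa lib32-mesa vulkan-radeon
```

The list can contain several versions of the same package, so pick one matching version per package by hand and pass those files together to sudo pacman -U.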

Which one of these did you use?
foo

I would suggest going in the opposite direction from downgrading: try upgrading by replacing mesa with mesa-git, which would give you mesa 21.1.x.

As stated earlier, I already tried updating, but it still crashed multiple times. Also, the formal bug report page has two participants who had interim success after downgrading, so it seems reasonable to try downgrading first.

Furthermore, it feels like this problem scales with the frame rate and level of detail in the game. The other user on the bug report page stated that he only had freezes when entering a specific dungeon, but I was having these crashes everywhere, though less often in Oribos, because it is a much smaller map with simpler shapes and details. None of the crashes happened in pre-Shadowlands maps such as Boralus, so it looks like Blizzard has used some new rendering technique specifically in Shadowlands that overloads the card because the driver does not handle it properly. However, such a technique would not be limited to a single game; it would show up in many recently published games, such as Cyberpunk. Food for thought: what could it be? If we can pinpoint it, we can temporarily disable it as a mitigating measure.

Just turn it on and press start, don’t change anything. That’s default. I would have to check it tomorrow if you want to be certain though. :slight_smile:

EDIT: 1080p Medium is a default profile.

If you don’t have messages about VMEM, then your system behaves differently from mine. I have the same card as you, and it is very stable in gaming.
Can you show the output of sudo dmesg | grep amdgpu?

It’s RTX / ray tracing stuff: DXR. Both WoW and Cyberpunk implement it to some degree. WoW uses DXR 1.1 shadows and maybe something else as well, and they specifically mentioned that they started doing it with Shadowlands. Cyberpunk uses even heavier ray tracing.

Can you please confirm whether Stellaris also uses it in its recent versions? It crashed there too, although I haven’t tried it many times under different kernels. Thank you. Hitman never crashed at all.

Don’t know about Stellaris, but I doubt it uses any. So much for my idea… oh well. Maybe someone else will come up with a better theory :slight_smile:

Then again… how would ray tracing even happen on AMD at this point? Does it actually work, or does it fall back to some workaround or substitute path? And if it is a fallback, could older games (not ray traced at all) be using the same code that the fallback uses?

Yes, I asked the same question on the bug report page as well. There must be an alternative path for AMD cards that uses some API or low-level interface which doesn’t mix well with Mesa under heavy drawing load. In other words, there is no error handling to skip the frames that cause the problem; it relies on the GPU resetting itself instead, which of course never works well. My two cents. On top of that, since I am using a 5K2K monitor, I have to limit the frame rate drastically to avoid overloading it, and that isn’t working well either.

On “1080p Medium”, I get:

16296
FPS Min: 83.55, Avg: 121.89, Max: 157.88
GPU °C Min: 44.0, Max: 68.0

That’s a lot better! :yum: So I guess my “4G Decode” is already off (although I’ve not yet checked my UEFI settings).

My sudo dmesg | grep amdgpu is a little messy:

[848023.507291] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.507384]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.507467]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.507856] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.507950]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.508030]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.508119]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]
[848023.508245] WARNING: CPU: 20 PID: 2879736 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:1866 dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.508364] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.508453]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.508929] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.509033]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.509119]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.509213]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]
[848023.509352] WARNING: CPU: 21 PID: 2895559 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:1866 dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.509489] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.509586]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.509669]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.509759]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]
[848023.545457] WARNING: CPU: 21 PID: 2895559 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:1848 dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.545595] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.545695]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.545779]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.545871]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]
[848023.546013] WARNING: CPU: 21 PID: 2895559 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:1866 dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.546014] Modules linked in: cfg80211 snd_seq_dummy snd_seq serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common btrfs xor raid6_pq ufs hfsplus hfs minix ntfs msdos jfs xfs tun udp_diag tcp_diag inet_diag nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_ct nft_chain_nat squashfs snd_usb_audio nf_nat nf_conntrack snd_usbmidi_lib snd_rawmidi nf_defrag_ipv6 snd_seq_device nf_defrag_ipv4 wacom mousedev joydev input_leds mc libcrc32c nf_tables nfnetlink loop nct6775 hwmon_vid eeepc_wmi asus_wmi battery sparse_keymap rfkill wmi_bmof mxm_wmi nls_iso8859_1 nls_cp437 vfat amdgpu fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core gpu_sched snd_hwdep ttm snd_pcm kvm igb snd_timer drm_kms_helper snd irqbypass r8169 sp5100_tco syscopyarea sysfillrect realtek sysimgblt pcspkr k10temp
[848023.546146] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x81/0x1f0 [amdgpu]
[848023.546242]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.546325]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.546418]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]
[848023.546600] WARNING: CPU: 20 PID: 2879736 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hwseq.c:1848 dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.546601] Modules linked in: cfg80211 snd_seq_dummy snd_seq serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common btrfs xor raid6_pq ufs hfsplus hfs minix ntfs msdos jfs xfs tun udp_diag tcp_diag inet_diag nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_ct nft_chain_nat squashfs snd_usb_audio nf_nat nf_conntrack snd_usbmidi_lib snd_rawmidi nf_defrag_ipv6 snd_seq_device nf_defrag_ipv4 wacom mousedev joydev input_leds mc libcrc32c nf_tables nfnetlink loop nct6775 hwmon_vid eeepc_wmi asus_wmi battery sparse_keymap rfkill wmi_bmof mxm_wmi nls_iso8859_1 nls_cp437 vfat amdgpu fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core gpu_sched snd_hwdep ttm snd_pcm kvm igb snd_timer drm_kms_helper snd irqbypass r8169 sp5100_tco syscopyarea sysfillrect realtek sysimgblt pcspkr k10temp
[848023.546741] RIP: 0010:dcn20_setup_gsl_group_as_lock+0x7f/0x1f0 [amdgpu]
[848023.546838]  dcn20_pipe_control_lock.part.0+0xff/0x1d0 [amdgpu]
[848023.546919]  dc_commit_updates_for_stream+0xec1/0x1580 [amdgpu]
[848023.547007]  amdgpu_dm_atomic_commit_tail+0xc85/0x1f50 [amdgpu]

It continues like that for a while. Ever since I rolled back to kernel 5.4, my journalctl's filled with several of those messages per second (the systemd-journal process is always at the top of the CPU activity list, haha).
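For anyone wanting to get a handle on a flood like this, here is a rough sketch that counts the WARNING lines per source location instead of scrolling through them. The function name count_amdgpu_warnings is invented for this post, and it assumes the warnings look like the ones pasted above, with the file:line token directly after the word "at".

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<count> <file:line>" for every amdgpu WARNING location, most frequent first.
count_amdgpu_warnings() {
    awk '
        /WARNING:/ && /amdgpu/ {
            # The source location is the field that follows the word "at".
            for (i = 1; i < NF; i++) {
                if ($i == "at") { count[$(i + 1)]++; break }
            }
        }
        END {
            for (loc in count) print count[loc], loc
        }
    ' | sort -rn
}

# Example call (reading the kernel ring buffer needs root):
#   sudo dmesg | count_amdgpu_warnings
```

If only one or two locations come out of this, that is a handy detail to quote in a bug report instead of the full paste.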

Yes, this must be the case :slight_smile: My result is 15.4k with a Ryzen 5 1600X CPU, while you have a Ryzen 9 3900X and get 16.3k. Good to know that more performance can be squeezed out of this card if I upgrade the CPU :slight_smile:
So, performance-wise, everything is perfect on your machine and all your settings are correct.

Just out of curiosity, I tested this on my computer as well. I don’t have a 6000 series card, or even a 5000 series one, just a measly old Vega64 (not because I don’t want to buy a 6000, but because they don’t exist). On the other hand, I have an X570 mobo + 5950X.

kernel: 5.10.9-xanmod1-MANJARO
mesa: 21.1.0_devel.133588.766538f83cb-1

BAR ON, 1080p medium (score; FPS min, avg, max):
11978; 70.07, 89.59, 115.67
12137; 74.50, 90.78, 117.05
12113; 74.50, 90.60, 116.78
1080p extreme:
3984; 23.53, 29.80, 36.81

BAR OFF, 1080p medium (score; FPS min, avg, max):
11794; 70.88, 88.22, 116.37
12077; 74.72, 90.34, 116.53
12098; 74.61, 90.49, 116.90
1080p extreme:
3947; 23.37, 29.52, 36.37

All in all, BAR ON seems to be better for me, or at worst there is no difference.
To be fair, this test doesn’t really stress the GPU memory either… at medium it uses only about 1.4G out of the 8G on the Vega, so the bigger addressable memory doesn’t matter much. At extreme it also uses only 3.3G of VRAM… a real test would need something with 6-8G or more of VRAM usage.
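To put a number on "better or no difference", the three medium scores from each set of runs above can be averaged. This is just a quick sketch; the mean function is a throwaway helper written for this post, and the scores are copied from the runs listed above.

```shell
# Hypothetical helper: print the arithmetic mean of its arguments, one decimal.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.1f\n", sum / NR }'
}

bar_on=$(mean 11978 12137 12113)    # 1080p medium scores with BAR ON
bar_off=$(mean 11794 12077 12098)   # 1080p medium scores with BAR OFF

echo "BAR ON  mean score: $bar_on"
echo "BAR OFF mean score: $bar_off"

# Relative difference of the means, in percent of the BAR OFF mean.
awk -v on="$bar_on" -v off="$bar_off" \
    'BEGIN { printf "difference: %.2f%%\n", (on - off) / off * 100 }'
```

That works out to about 12076 versus 11990, a gap of well under 1%, which is smaller than the spread between the individual runs, so "no real difference" is probably the fairer reading.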

So… meh, I bought a 3070 FE and fixed a bunch of hiccups, now everything is at peace. Money pwnz.
