It’s a desktop, and the RX 570 was made by MSI. It generally crashes in StarCraft 2 while I’m in the lobby; if I remember correctly, it has never crashed while actually “playing”, that is, controlling units. It’s always in the lobby. It also crashes in the Vivaldi browser, just less often.
Another hope is that AMD is hiring more engineers to work on the GPU driver, so let’s see if they can fix the issue.
It’s interesting that you use a desktop; I was expecting your RX 570 to be a laptop GPU instead. I don’t play any games, especially GPU-intensive ones, and I don’t use Chromium-based browsers (even though I use some Electron-based apps like VSCode and Discord every day), or I might just be plain lucky with my specific setup.
For me the problem also still persists with an MSI RX 480. The mobile Vegas in my and my colleagues’ work notebooks are running fine so far.
@Zesko I’ve looked into your posted issue, as my freezes only occur when I play fullscreen videos in Firefox. (I don’t know about games, as I no longer have time to play them.) But the error log looks quite different from mine. I think it’s plasmashell that’s causing it.
It’s really frustrating that the problem has persisted for so long.
Is your udev rule actually sticking? I created a help topic:
amdgpu-udev-rule-ignored/89725
Either of these commands will identify if the rule applied successfully:
udevadm info --attribute-walk /sys/class/drm/card0 | grep -Pi 'power_dpm'
udevadm info -a -n '/dev/dri/card0' | grep -Pi 'power_dpm'
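The value can also be read straight from sysfs, which skips the grep. This is only a minimal sketch: it assumes the card is `card0` and that amdgpu exposes `power_dpm_force_performance_level` under the device directory, and the function name `dpm_level` is made up for illustration.

```shell
# Print the current DPM performance level.
# $1 (optional): path to the attribute file; the card0 default is an assumption.
dpm_level() {
    attr="${1:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        echo "cannot read $attr" >&2
        return 1
    fi
}
```

If the rule applied, calling `dpm_level` right after boot should print the level the rule sets (for example `high`) rather than the driver default.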
Curiously, my problems began in May. I have an R9 380X which suffers from a voltage draw issue. If I don’t force the performance level to high, I get graphical corruption, crackling audio, and an eventual green screen of death (the system appears to be working and audio is playing, but no input is recognized).
The actual failure relates to a sudden surge of power when the fans go directly from low to high speed without transitioning through a middle range (usually during gaming).
I went so far as to build a custom BIOS with DPM state 0 set to 1000 mV and flashed it to the card. As many of these posts point out, a firmware blob is applied during startup which invalidates this. As such, in my case, every iteration of linux-firmware causes me issues.
The same issue affects Windows 10 and I have to use a custom .xml and load it as a performance profile in the Radeon Adrenalin drivers.
I always assumed my card was an anomaly related to power supply and motherboard. There was no reason to change the hardware once I’d figured out the solution.
I just can’t get a udev rule to apply automatically.
There are many different error logs for the same issue, and they mention plasmashell too. I don’t think plasmashell is at fault; I think the browser combined with the AMD driver is.
See my first error log today:
Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=6534, emitted seq=6536
Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vivaldi-bin pid 2019 thread vivaldi-bi:cs0 pid 2069
Nov 15 08:17:00 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:17:00 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:17:00 zesko kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:221 vmid:6 pasid:32771, for process plasmashell pid 1228 thread plasmashel:cs0 pid 1293)
Nov 15 08:17:00 zesko kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000800000000000 from client 0x1b (UTCL2)
My second error log today:
Nov 15 08:27:50 zesko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=235399, emitted seq=235401
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vivaldi-bin pid 1862 thread vivaldi-bi:cs0 pid 1907
Nov 15 08:27:59 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 15 08:27:59 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 15 08:27:59 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 15 08:27:59 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Nov 15 08:28:00 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:221 vmid:5 pasid:32771, for process plasmashell pid 1236 thread plasmashel:cs0 pid 1301)
Nov 15 08:28:03 zesko kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000800000030000 from client 0x1b (UTCL2)
As you can see, the two logs look different even though they come from the same issue.
I am waiting for Linux kernel 5.16 for the fix for this bug.
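To compare logs like these across incidents, it can help to reduce each one to the ring that timed out and the process the driver named. A rough sketch, assuming the kernel messages keep the format shown above; `amdgpu_timeouts` is a made-up helper name.

```shell
# Read kernel log lines on stdin and print, for each job timeout,
# the ring that timed out and the process from the
# "Process information" line.
amdgpu_timeouts() {
    sed -n \
        -e 's/.*\*ERROR\* ring \([^ ]*\) timeout.*/ring: \1/p' \
        -e 's/.*Process information: process \([^ ]*\) pid.*/process: \1/p'
}

# Typical use on the current boot (needs permission to read the journal):
#   journalctl -k -b | amdgpu_timeouts
```

Fed the two logs above, this would report `vcn_dec` and `gfx_0.0.0` as the rings, with `vivaldi-bin` as the process both times.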
I lost my patience with the issue because, update after update, it was not fixed. So, as other members suggested, I decided to downgrade linux-firmware to version 20210818.c46b8c3-1, and so far it is the most stable solution for this issue. I won’t say that it’s an error-proof version, because I had one incident, and it happened the day after the downgrade; I’m now 30 days without an incident. For people like me who are using older GPU hardware (I’m using an RX 570), running an older linux-firmware probably won’t make any difference.
I can’t confirm it, but I have the impression that this issue might be related to overclocking and data corruption inside the GPU. The GPU cooler behaves very differently with the downgraded firmware compared to the newest ones: the noise and fan speeds are very different. Do you know that feeling when you are pushing an overclock and your system works but is unstable? It looks the same to me. For some reason they might be pushing too hard for performance in linux-firmware.
I’d like to mention that I have two other random issues. One is the same one Linus pointed out in the Linux challenge: for some reason, when my system has the screen locked, it no longer accepts my password, so I need to press the reset button to fix it. The second one is a random screen freeze: the display just freezes on whatever is showing, and no logs are recorded.
System slowdown and freezing returned again for me as well.
Going back to the 20210818.c46b8c3-1 version again.
I just downgraded to 20210818.c46b8c3-1. System is responsive again. Will have to see if freezing occurs. But the system seems to be running better again after downgrading.
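A downgraded package gets replaced on the next full update unless pacman is told to skip it via `IgnorePkg` in the `[options]` section of `/etc/pacman.conf`. Below is a small sketch that checks whether a package is listed there. The helper name `is_ignored` is hypothetical, and the parsing only handles the plain `IgnorePkg = a b c` form, not glob patterns.

```shell
# Succeed if package $1 appears on an uncommented IgnorePkg line
# of the pacman config at $2 (defaults to /etc/pacman.conf).
is_ignored() {
    pkg="$1"
    conf="${2:-/etc/pacman.conf}"
    # Keep IgnorePkg lines, drop the key, put one name per line,
    # then look for an exact match.
    grep -E '^[[:space:]]*IgnorePkg[[:space:]]*=' "$conf" \
        | sed 's/^[^=]*=//' \
        | tr -s ' \t' '\n\n' \
        | grep -Fxq -- "$pkg"
}
```

For example, `is_ignored linux-firmware || echo "linux-firmware is not pinned"` gives a quick reminder before running a system update.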
You’re lucky, I guess. I usually have a freeze per day, more or less: Random freezes, how to troubleshoot
Most of the time I don’t know what it is because there are no logs, but when there are logs, it is the DRM thing.
With and without the downgrade. With an RX580 and now a 6800XT.
I haven’t noticed the freeze in a while, but I haven’t upgraded Mesa from 21.1.4-1, so every time I update I get the “error” warning: mesa: ignoring package upgrade (21.1.4-1 => 21.2.5-1). But I don’t get freezes either.
Sadly only partly. My Raven Ridge APU at work hasn’t frozen since the August update. But my Polaris GPU (RX 480) still freezes when watching videos in Firefox.