System frequently crashing after GPU driver update

There are many different error logs for the same issue. They mention plasmashell too, but I do not think plasmashell is at fault; I think the browser combined with the AMD driver is.

Here is my first error log from today:

Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=6534, emitted seq=6536
Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vivaldi-bin pid 2019 thread vivaldi-bi:cs0 pid 2069
Nov 15 08:17:00 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:17:00 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:17:00 zesko kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:221 vmid:6 pasid:32771, for process plasmashell pid 1228 thread plasmashel:cs0 pid 1293)
Nov 15 08:17:00 zesko kernel: amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x0000800000000000 from client 0x1b (UTCL2)

My second error log from today:

Nov 15 08:27:50 zesko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=235399, emitted seq=235401
Nov 15 08:27:55 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vivaldi-bin pid 1862 thread vivaldi-bi:cs0 pid 1907
Nov 15 08:27:59 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 15 08:27:59 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Nov 15 08:27:59 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 15 08:27:59 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Nov 15 08:28:00 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Nov 15 08:28:03 zesko kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:221 vmid:5 pasid:32771, for process plasmashell pid 1236 thread plasmashel:cs0 pid 1301)
Nov 15 08:28:03 zesko kernel: amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x0000800000030000 from client 0x1b (UTCL2)

As you can see, the two logs are different.
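The claim that the browser, not plasmashell, triggers the hang can be checked mechanically: in both logs the first amdgpu_job_timedout report names vivaldi-bin, and the plasmashell page fault only shows up seconds later. Below is a minimal Python sketch (a hypothetical helper of my own, not part of any AMD or Manjaro tooling) that lists, in log order, which process each ring timeout and page fault is attributed to. The regexes assume the exact message wording shown in the logs above; other kernel versions may word these lines differently.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: message wording matches the amdgpu lines quoted in this thread.
TIMEOUT_RE = re.compile(
    r"amdgpu_job_timedout.*Process information: process (\S+) pid \d+"
)
FAULT_RE = re.compile(r"page fault \(.*for process (\S+) pid \d+")

# Excerpt of the first Nov 15 log, used when no file is given.
SAMPLE_LOG = [
    "Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring vcn_dec timeout, signaled seq=6534, emitted seq=6536",
    "Nov 15 08:16:52 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "Process information: process vivaldi-bin pid 2019 thread vivaldi-bi:cs0 pid 2069",
    "Nov 15 08:17:00 zesko kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault "
    "(src_id:0 ring:221 vmid:6 pasid:32771, for process plasmashell pid 1228 "
    "thread plasmashel:cs0 pid 1293)",
]


def classify_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (event kind, process name) pairs in the order they appear."""
    events: List[Tuple[str, str]] = []
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            events.append(("ring timeout", timeout.group(1)))
            continue
        fault = FAULT_RE.search(line)
        if fault:
            events.append(("page fault", fault.group(1)))
    return events


def main(argv: List[str]) -> None:
    # With a file argument, scan that log; otherwise scan the built-in excerpt.
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8", errors="replace") as fh:
            events = classify_events(fh)
    else:
        events = classify_events(SAMPLE_LOG)
    for kind, process in events:
        print(f"{kind}: {process}")


if __name__ == "__main__":
    main(sys.argv)
```

Run without arguments, it prints "ring timeout: vivaldi-bin" followed by "page fault: plasmashell". To check a real crash, save the kernel messages of the boot that froze, for example with journalctl -k -b -1 > kernel.log, and pass the file name as the first argument. Keep in mind that the process named in a timeout is simply the one whose job was on the ring when it hung, so it points at a suspect rather than proving that process is the cause.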


I am waiting for Linux kernel 5.16 for this bug fix.

Hello,

I lost my patience with this issue because update after update it was not fixed, so, as other members suggested, I decided to downgrade linux-firmware to version 20210818.c46b8c3-1, and so far it is the most stable solution for this issue. I won't say it is an error-proof version, because I had one incident the day after the downgrade, but I have now gone 30 days without an incident. For people like me who are using older GPU hardware (I'm using an RX 570), staying on the older linux-firmware will probably not make any noticeable difference.

I can't confirm it, but I have the impression that this issue might be related to overclocking and data corruption inside the GPU. The GPU cooler behaves very differently with the downgraded firmware compared to the newest versions; the fan noise and speeds are very different. Do you know that feeling when you push an overclock too far and your system works but is unstable? It feels the same to me. For whatever reason, they might be pushing too hard for performance in linux-firmware.

I'd like to mention that I have two other random issues. The first is the same one Linux pointed out in the Linux challenge: for some reason, when my system has the screen locked it no longer accepts my password, so I need to press the reset button to fix it. The second is a random screen freeze: the display just freezes on whatever is being shown, and no logs are recorded.


I just tried the new linux-firmware 20211027.1d00989-1, available today in the stable branch, and the issue remains.

The updated Mesa was 21.2.5.

Going back to the 20210818.c46b8c3-1 version again.

19/11/2021 16:09	kernel	[drm:gfx_v8_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
19/11/2021 16:09	kernel	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=468293, emitted seq=468295
19/11/2021 16:09	kernel	[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process SC2_x64.exe pid 6806 thread SC2_x64.exe pid 6867
19/11/2021 16:09	kernel	amdgpu: cp is busy, skip halt cp
19/11/2021 16:09	kernel	amdgpu: rlc is busy, skip halt rlc
19/11/2021 16:09	kernel	[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
19/11/2021 16:09	kernel	[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

System slowdowns and freezing have returned for me as well.

I just downgraded to 20210818.c46b8c3-1, and the system is responsive again. I will have to see whether the freezing recurs, but so far the system seems to be running better after the downgrade.

I haven't noticed the freeze in a while now. Is it the same for everybody?


You're lucky, I guess. I usually get about one freeze per day 🙁 (see the thread "Random freezes, how to troubleshoot").
Most of the time I don't know what causes it, because there are no logs, but when there are logs, it is the DRM issue.
This happens both with and without the downgrade, first with an RX 580 and now with a 6800 XT.

I haven't noticed the freeze in a while, but I haven't upgraded Mesa from 21.1.4-1, so every time I update I get the warning "mesa: ignoring package upgrade (21.1.4-1 => 21.2.5-1)". But I don't get the freeze either.

Sadly, only partly. My Raven Ridge APU at work hasn't frozen since the August update, but my Polaris GPU (RX 480) still freezes when watching videos in Firefox. 😐