Glad to hear it fixed your case! I have just broken a ~50-days-without-freezing streak lol, but my case has improved a lot too.
I just have one question here. So your fix is to stick with mesa 21.1.4, regardless of any later versions? I thought that was the latest version, but I might be wrong. I’m confused because you described it as a downgrade.
I am just using the downgrade command to actually upgrade to a version that is not yet available in the standard Manjaro repos. Just check your current version of mesa; for you this will be an upgrade. Alternatively, wait a few more days and get the new mesa via pacman -Syu.
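To see whether installing 21.1.4 counts as an upgrade on a given machine, compare it with the installed version (which `pacman -Q mesa` prints). Here is a minimal Python sketch of that comparison. It assumes plain dotted numeric versions with an optional `-pkgrel` suffix; the function names and the example installed version are hypothetical, and pacman's own comparison handles more cases (epochs, for instance) than this does.

```python
def parse_version(version):
    """Turn '21.1.4' or '21.1.4-1' into a tuple of ints, ignoring the pkgrel."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def is_upgrade(installed, target):
    """Return True if target is strictly newer than the installed version."""
    return parse_version(target) > parse_version(installed)


# Assumed example value; replace it with what `pacman -Q mesa` reports.
installed = "21.1.2-1"
print(is_upgrade(installed, "21.1.4"))
```

Tuples compare element by element, so 21.1.10 is correctly treated as newer than 21.1.4, which a plain string comparison would get wrong.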
So for me: My notebook crashed 4-5 times a day before. Now this is fixed with mesa-21.1.4.
Just registered to say thank you!
For me it also seems to have fixed a memory leak: every time I woke the PC from standby, about 1 GB, or maybe half that (not sure, I’m using standby A LOT), more RAM was in use than before.
edit: As recommended in the Arch forums, I have now downgraded the kernel to 5.10 LTS (instead of 5.11 as in the Arch forums) and linux-firmware to 20210315.3568f96-2. Will edit this post if I see the crash again.
Also enabled SSH in case I can’t even reach the tty anymore.
I can confirm what Cencil reported. The issue is also not solved on Polaris GPUs. I fully updated my system yesterday, and this evening, after a couple of YouTube videos, it crashed again.
Kernel 5.12
linux-firmware 20210629
AMD RX 480
-- Journal begins at Thu 2021-01-21 08:14:09 CET, ends at Wed 2021-07-14 22:12:10 CEST. --
Jul 14 20:57:07 ManjaroGamingPC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jul 14 20:57:07 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU fault detected: 146 0x0048080c for process plasmashell pid 1790 thread plasmashel:cs0 pid 1878
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000009
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0400800C
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32772) at page 9, read from 'TC0' (0x54433000) (8)
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU fault detected: 146 0x0068040c for process plasmashell pid 1790 thread plasmashel:cs0 pid 1878
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010084D
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04008008
Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM fault (0x08, vmid 2, pasid 32772) at page 1050701, read from 'TC0' (0x54433000) (8)
Jul 14 20:57:17 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU fault detected: 147 0x0aa02008 for process plasmashell pid 1790 thread plasmashel:cs0 pid 1878
Jul 14 20:57:17 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00108554
Jul 14 20:57:17 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04020008
Jul 14 20:57:17 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: VM fault (0x08, vmid 2, pasid 32772) at page 1082708, read from 'CB2' (0x43423200) (32)
Jul 14 20:57:17 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1258798, emitted seq=1258801
Jul 14 20:57:17 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1790 thread plasmashel:cs0 pid 1878
Jul 14 20:57:17 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset begin!
Jul 14 20:57:21 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: failed to suspend display audio
Jul 14 20:57:21 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 14 20:57:21 ManjaroGamingPC kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jul 14 20:57:22 ManjaroGamingPC kernel: amdgpu: cp is busy, skip halt cp
Jul 14 20:57:22 ManjaroGamingPC kernel: amdgpu: rlc is busy, skip halt rlc
Jul 14 20:57:22 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: BACO reset
Jul 14 20:57:22 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 14 20:57:22 ManjaroGamingPC kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
Jul 14 20:57:22 ManjaroGamingPC kernel: [drm] VRAM is lost due to GPU reset!
Jul 14 20:57:24 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:25 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:26 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:27 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:28 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:29 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:30 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:31 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:32 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:32 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:32 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <uvd_v6_0> failed -1
Jul 14 20:57:33 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
Jul 14 20:57:33 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset(3) failed
Jul 14 20:57:33 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset end with ret = -110
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:33 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:57:43 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 14 20:57:53 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 14 20:58:02 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jul 14 20:58:02 ManjaroGamingPC kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
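Since several people in this thread are comparing which amdgpu errors they see, a rough way to summarize a journal dump like the one above is to count lines per error type. This is only a sketch: the category names are made up for illustration, and the substrings are simply copied from the log above, not taken from any official list of amdgpu messages.

```python
from collections import Counter

# Hypothetical categories; each maps to a substring seen in the log above.
PATTERNS = {
    "fence_timeout": "Waiting for fences timed out",
    "ring_timeout": "ring gfx timeout",
    "vm_fault": "GPU fault detected",
    "uvd_not_responding": "UVD not responding",
    "parser_init_failed": "Failed to initialize parser",
    "reset_failed": "GPU reset end with ret = -",
}


def summarize(lines):
    """Count how many log lines match each known error substring."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


# A few lines copied from the journal above, used as sample input.
sample = [
    "Jul 14 20:57:07 ManjaroGamingPC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!",
    "Jul 14 20:57:07 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
    "Jul 14 20:57:07 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU fault detected: 146 0x0048080c for process plasmashell pid 1790 thread plasmashel:cs0 pid 1878",
    "Jul 14 20:57:17 ManjaroGamingPC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1258798, emitted seq=1258801",
    "Jul 14 20:57:33 ManjaroGamingPC kernel: amdgpu 0000:26:00.0: amdgpu: GPU reset end with ret = -110",
]

for name, count in sorted(summarize(sample).items()):
    print(f"{name}: {count}")
```

`summarize` accepts any iterable of lines, so it could also be given `sys.stdin` with the output of `journalctl -k -b -1` (kernel messages from the previous boot) piped in.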
Yes. There seems to be another bug. I celebrated too early. I got another crash recently with the latest linux-firmware. However, as described by @Cencil, downgrading linux-firmware seems to solve it for me.
I’ve been using it for two weeks and the improvement is clear, but yes, I can also confirm it’s not fully fixed. I got just 2 issues, and they were different. Going from at least one issue a day to 1 issue a week is a nice improvement.
The 1st one is the same, the famous:
20/07/2021 18:41 kernel [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out!
While the second one was:
22/07/2021 08:13 kernel [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=74945, emitted seq=74947
My PC has now been running for 13 days without a reboot, under heavy use, with the setup I posted before. However, a final fix would be awesome, so I can update the firmware and kernel someday…
The more I use Linux, the more new stuff I discover, as expected, and I’m seeing a lot of redundancy in Linux that makes a new user’s life very hard. It looks like the issue we are facing is related to the Mesa driver. Following https://www.phoronix.com/ I discovered AMDVLK, and after some research it looks like it serves the same function as Mesa’s RADV, so there are redundant options here that the user can choose between. In that sense, has anyone here tried replacing Mesa’s RADV with AMDVLK to see if it fixes the issue?
Or is the issue not related to this part of Mesa?
The truth is, from all the various topics I’ve read, no one has a workaround. People try things and get disappointed because the issue manifests a week later. Essentially, we cannot reliably reproduce this bug, which is always fun.
I second this, having recently purchased a refurbished Lenovo Thinkpad E595 with a Radeon RX Vega 10 (Picasso architecture) integrated GPU and experienced those random KWin reset moments on that same PC. Since I downgraded linux-firmware on the laptop last week, I haven’t run into the graphics reset bug so far.
Meanwhile, my desktop with a Radeon RX 570 (Polaris) has never had that same graphics freeze and reset issue, so I have no incentive to lock linux-firmware on that system.
I downgraded that package to 20210511.7685cf4 according to this reply. So far I haven’t had the freeze-and-reset behavior come back since installing that version.
Sorry to ask more questions, but I’d appreciate it if someone could help me understand this better. Thanks in advance.
In general, when I read a reference to firmware, I think of some kind of embedded software loaded onto specific hardware, like the firmware for the GPU, firmware for the printer, firmware for the router, etc.
Considering that Linux has the kernel, and the kernel has the AMDGPU driver built in, which shares some driver functionality with Mesa, what is this linux-firmware package people are referring to? What does it do? How is it related to the kernel, the GPU driver, and Mesa?