Thank you! I hope I won’t have to use it in the future, but this template seems great; if I start experiencing these kinds of issues again, I’ll definitely post my issue on their site.
Hi, everyone c: This is the second day without any amdgpu crash. What I did: added amdgpu.noretry, moved to kernel 5.12, and disabled IOMMU in the BIOS (or maybe the last update fixed it?).
Hope it is a fix for you! In my case, when I set the amdgpu.noretry=0 kernel parameter I couldn’t even boot into Manjaro, just getting a black screen after picking the option in GRUB.
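For reference, here is a minimal sketch of how a parameter like this is usually added on a GRUB-based install. It assumes the standard /etc/default/grub layout with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; the function name add_kernel_param is made up for this example, and it does not handle a parameter containing | or &.

```shell
# add_kernel_param PARAM [GRUB_FILE]   (hypothetical helper, not part of any package)
# Appends PARAM to the double-quoted GRUB_CMDLINE_LINUX_DEFAULT value unless it
# is already there. GRUB_FILE defaults to /etc/default/grub.
add_kernel_param() {
    param=$1
    grub_file=${2:-/etc/default/grub}

    # Bail out if the line is missing or not double-quoted
    if ! grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=".*"$' "$grub_file"; then
        echo "no double-quoted GRUB_CMDLINE_LINUX_DEFAULT line in $grub_file" >&2
        return 1
    fi

    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$grub_file")

    # Already present as a whole word: nothing to do
    case " $current " in
        *" $param "*) return 0 ;;
    esac

    # Keep a backup next to the file, then rewrite that one line
    cp "$grub_file" "$grub_file.bak"
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$grub_file"
}
```

Run it as root (for example from a root shell: add_kernel_param amdgpu.noretry=0), then regenerate the config with sudo update-grub on Manjaro or sudo grub-mkconfig -o /boot/grub/grub.cfg on Arch. If the new parameter leaves you with a black screen, press e on the GRUB entry and delete it from the linux line for that one boot.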
An update on my end: Until yesterday, I had spent slightly more than a week with no GPU-related freezing or crashing. I left my computer on for a while and, when I went back to use it, a lot of page faults had happened and my system was completely frozen (note that this time I wasn’t even using it; it was just running some processes in the foreground and had a couple of applications open). All of this happened while using the 5.12.0-1 kernel and the mesa updates that came with the April 28th system upgrade.
So I guess I can call it a personal freeze-free record since I first experienced these issues around a month ago. I’m glad things are improving, but I can’t say my problem is solved yet. I’ve seen that a system update came out on May 6th, including some mesa updates, but I haven’t dared to try it. I will probably do so in the next few days.
I’ve been getting the freezes less frequently, but still randomly, after installing the mesa drivers. So I just updated to the latest packages, and updated to the 5.10 kernel. We’ll see how long it takes with this combo to go belly-up.
I found a workaround for my Lenovo E585 with Ryzen 2500U:
Install Downgrade: yay -S downgrade
Show available mesa versions: sudo downgrade --ala-only mesa
Select newest version mesa 21.1.0, which is actually an upgrade.
Don’t put mesa on the ignore list
Reboot
No more crashes (since yesterday). Amdgpu still logs significant stack traces and errors, but it seems that the newer mesa version handles amdgpu crashes with only slight lag and without killing the whole system.
Kernel: 5.12
Desktop: XFCE with Xorg
Screen: 4k 30Hz
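To confirm which mesa you actually ended up with after the downgrade step, something like the sketch below can help. The function name mesa_at_least is made up for this example, and it relies on GNU sort -V for the comparison (pacman also ships vercmp, which is the more exact tool for package versions).

```shell
# mesa_at_least WANTED [PACMAN_LINE]   (hypothetical helper)
# Succeeds if the installed mesa version is WANTED or newer.
# PACMAN_LINE is a line like "mesa 21.1.0-1"; it defaults to `pacman -Q mesa`.
mesa_at_least() {
    wanted=$1
    line=${2:-$(pacman -Q mesa)}

    # Second field is "version-pkgrel"; drop an epoch (if any) and the pkgrel
    installed=$(printf '%s\n' "$line" | awk '{ print $2 }' | sed 's/^[0-9]*://; s/-[^-]*$//')

    # sort -V lists the lower version first; WANTED must not sort after it
    lowest=$(printf '%s\n%s\n' "$wanted" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}
```

Usage: mesa_at_least 21.1.0 && echo "mesa is 21.1.0 or newer"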
Another workaround: Boot Windows from a second HDD and run Linux inside a VM (e.g. VirtualBox with raw disk pass-through), so the VM boots straight from the Linux HDD. The Windows GPU driver will not crash.
Yeah, I tried that mesa downgrade at first, without success. I used to think these problems were due to a GPU driver update, but reverting it and going back to a previous package state did not fix my problems at all. The culprit seems to get farther and farther away from us each day.
My suggestion is to keep updating along with the Stable branch and to use the latest 5.12 kernel (given the possible fix that was mentioned earlier in this thread). I still get freezes and crashes, but hopefully with decreasing frequency.
The same problem appeared for me after a recent upgrade on Arch Linux (after a while without upgrading). No problems on the same machine in the previous 2 years.
OS: Arch Linux x86_64
Host: Lenovo E595
Kernel: 5.12.3-arch1-1
DE: i3/regolith
CPU: AMD® Ryzen 7 3700u with radeon vega mobile gfx × 8
GPU: AMD® Radeon™ vega 10 graphics
Running with iommu=soft amd_iommu=pt ivrs_ioapic[32]=00:14.0 intel_iommu=igfx_off , using mesa-git, xf86-video-amdgpu-git
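To double-check that parameters like these really reached the running kernel, the kernel command line can be read back from /proc/cmdline. A minimal sketch (the function name has_kernel_param is made up for this example):

```shell
# has_kernel_param PARAM [CMDLINE_FILE]   (hypothetical helper)
# Succeeds if PARAM appears on the kernel command line, either as a bare
# flag ("quiet") or as a key with a value ("iommu=soft").
# CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}

    # One token per line, then compare literally (no regex surprises from dots)
    tr -s ' ' '\n' < "$cmdline_file" |
        awk -v p="$param" '$0 == p || index($0, p "=") == 1 { found = 1 } END { exit !found }'
}
```

Usage: has_kernel_param amd_iommu && echo "amd_iommu is set"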
Very sporadically, the screen freezes (audio still seems to work), and the DE restarts after ~30 seconds.
mai 12 21:32:15 e595 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
mai 12 21:32:15 e595 kernel: psmouse serio1: TouchPad at isa0060/serio1/input0 lost synchronization, throwing 5 bytes>
mai 12 21:32:15 e595 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32>
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800105600000 from client 27
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
(several repetitions)
mai 12 21:32:15 e595 regolith.desktop[5429]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
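To see how often these messages show up, a small filter over the kernel log is enough. This is only a sketch: the function name summarize_amdgpu_faults is made up, and the patterns are taken from the messages in the excerpt above, so other amdgpu errors are not counted.

```shell
# summarize_amdgpu_faults   (hypothetical helper)
# Reads kernel log lines on stdin and prints how many ring timeouts,
# retry page faults and fence timeouts it saw.
summarize_amdgpu_faults() {
    awk '
        /ring [a-z0-9_.]+ timeout/     { timeouts++ }
        /retry page fault/             { faults++ }
        /Waiting for fences timed out/ { fences++ }
        END {
            printf "ring timeouts: %d\n", timeouts
            printf "retry page faults: %d\n", faults
            printf "fence timeouts: %d\n", fences
        }
    '
}
```

Usage: journalctl -k -b | summarize_amdgpu_faults (add -b -1 instead of -b to look at the previous boot).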
Relevant upgrade that might have introduced the issue: