lol, I will buy a laptop with an Intel Xe GPU
So, it has crashed three times today…
Perhaps you could skip all that financial trouble by switching distros?
I used Pop!_OS for a short time and everything was fine except for Bluetooth, so in the end I switched back.
Pop!_OS is a good distribution and worth trying.
I found a workaround for my Lenovo E585 with a Ryzen 2500U:
- Install downgrade
yay -S downgrade
- Show available mesa versions:
sudo downgrade --ala-only mesa
- Select the newest version, mesa 21.1.0 (which is actually an upgrade).
- Don’t add mesa to the ignore list
- Reboot
- No more crashes (since yesterday). amdgpu still logs significant stack traces and errors, but the newer mesa version seems to handle amdgpu crashes with only slight lag instead of taking down the whole system.
Kernel: 5.12
Desktop: XFCE with Xorg
Screen: 4k 30Hz
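After rebooting, it is worth confirming that the mesa you picked in downgrade is really the one installed. The sketch below parses a line of `pacman -Q mesa` output (e.g. "mesa 21.1.0-1") and checks it against a minimum version. Assumptions: `sort -V` is used as an approximation of pacman's own version comparison, and it does not understand epochs such as the "1:" prefix of mesa-git versions; the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: check that the installed mesa is at least the version chosen in
# downgrade. Assumption: `sort -V` approximates pacman's version ordering;
# it does NOT handle epochs like "1:" (as used by mesa-git).

# Print the version of one `pacman -Q` line without the pkgrel,
# e.g. "mesa 21.1.0-1" -> "21.1.0".
pkg_version() {
    printf '%s\n' "$1" | awk '{print $2}' | sed 's/-[^-]*$//'
}

# Succeed (exit status 0) if version $1 >= version $2.
version_at_least() {
    # sort -V prints the lower version first; if that is the minimum ($2),
    # then $1 is equal to or newer than it.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage on the affected machine would be something like: version_at_least "$(pkg_version "$(pacman -Q mesa)")" 21.1.0 && echo "mesa is new enough".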
Another workaround: boot Windows from a second HDD, use e.g. VirtualBox with raw disk passthrough, and boot Linux inside the VM with direct access to its HDD. The Windows GPU driver will not crash.
False positive: I could not fix the system crashes by downgrading mesa and/or amdgpu. I just had another crash.
Just for the record: this is not a workaround; it is using a whole other operating system to avoid the amdgpu driver.
Yeah, I unsuccessfully tried that mesa downgrade at first. I used to think these problems were caused by a GPU driver update, but reverting it and rolling back to a previous package state did not fix my problems at all. The culprit seems to get further out of reach every day.
My suggestion is to keep updating along with the Stable branch and to use the latest 5.12 kernel (because of the possible fix mentioned earlier in this thread). I still get freezes and crashes, but hopefully less and less often.
The same problem appeared for me after a recent upgrade on Arch Linux (after a while without upgrading). I had no problems on the same machine in the previous 2 years.
- OS: Arch Linux x86_64
- Host: Lenovo E595
- Kernel: 5.12.3-arch1-1
- DE: i3/regolith
- CPU: AMD® Ryzen 7 3700u with radeon vega mobile gfx × 8
- GPU: AMD® Radeon™ vega 10 graphics
Running with the kernel parameters iommu=soft amd_iommu=pt ivrs_ioapic[32]=00:14.0 intel_iommu=igfx_off, and using mesa-git and xf86-video-amdgpu-git.
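To check which of these parameters the running kernel actually booted with, you can compare them against /proc/cmdline. A minimal sketch follows; the parameter list is copied from this post and is not a recommendation, and check_params is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the kernel parameters from this post appear on a
# kernel command line (normally the contents of /proc/cmdline).

# $1: the full kernel command line. Prints "present: X" or "missing: X".
check_params() {
    # The quoted 'ivrs_ioapic[32]=...' word is not glob-expanded by the loop.
    for p in iommu=soft amd_iommu=pt 'ivrs_ioapic[32]=00:14.0' intel_iommu=igfx_off; do
        # Pad with spaces so "iommu=soft" does not match inside another
        # parameter; the quoted " $p " is matched literally, brackets included.
        case " $1 " in
            *" $p "*) echo "present: $p" ;;
            *) echo "missing: $p" ;;
        esac
    done
}
```

Usage: check_params "$(cat /proc/cmdline)". Parameters set in the bootloader config but missing here were not applied on this boot.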
Very sporadically, the screen freezes (audio still seems to work), and the DE restarts after ~30 seconds.
mai 12 21:32:15 e595 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
mai 12 21:32:15 e595 kernel: psmouse serio1: TouchPad at isa0060/serio1/input0 lost synchronization, throwing 5 bytes>
mai 12 21:32:15 e595 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32>
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800105600000 from client 27
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
mai 12 21:32:15 e595 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
(several repetitions)
mai 12 21:32:15 e595 regolith.desktop[5429]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
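When comparing freezes across firmware or mesa versions, it helps to count these messages per boot instead of eyeballing the journal. The sketch below reads journal text on stdin, for example from journalctl -k -b -1 (kernel messages from the previous boot), and counts the message types seen in the excerpt above. The patterns come only from this excerpt; other failures may log different messages, and amdgpu_summary is a hypothetical name.

```shell
#!/bin/sh
# Sketch: summarise amdgpu errors in journal text read from stdin.
# Patterns are taken from the log excerpt in this post only.

amdgpu_summary() {
    awk '
        /Waiting for fences timed out/  { fences++ }
        /ring gfx timeout/              { ring++ }
        /retry page fault/              { pagefault++ }
        /VM_L2_PROTECTION_FAULT_STATUS/ { protfault++ }
        END {
            # Unset counters print as 0 with %d.
            printf "fence timeouts: %d\n", fences
            printf "gfx ring timeouts: %d\n", ring
            printf "retry page faults: %d\n", pagefault
            printf "VM_L2 protection faults: %d\n", protfault
        }
    '
}
```

Usage: journalctl -k -b -1 | amdgpu_summary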
Relevant upgrades that might have introduced the issue:
- mesa-git (1:21.1.0_devel.137471.fbebe365476-1 → 1:21.2.0_devel.139210.922f71b819b-1)
- vulkan-icd-loader (1.2.172-1 → 1.2.176-1)
- vulkan-headers (1:1.2.173-1 → 1:1.2.177-1)
- xf86-video-amdgpu-git (538.6ed4863-1 → 539.aedbf47-1)
- amdvlk (2021.Q1.6-1 → 2021.Q2.2-1)
Cross-posting to drm/amd issue 934 on the freedesktop.org GitLab (sorry, I can’t post links yet).
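A list like the one above can be pulled from /var/log/pacman.log, which records each upgrade as "[ALPM] upgraded NAME (OLD -> NEW)". The sketch below filters that log (on stdin) for a given set of package names. It assumes that line format; recent_upgrades is a hypothetical helper, and only "upgraded" entries are kept (installs and downgrades are ignored).

```shell
#!/bin/sh
# Sketch: show pacman.log upgrade entries for the packages named as arguments.
# Assumes lines like: [2021-05-12T20:00:00+0200] [ALPM] upgraded NAME (OLD -> NEW)

recent_upgrades() {
    awk -v pkgs="$*" '
        BEGIN {
            # Build a lookup table of the requested package names.
            n = split(pkgs, want, " ")
            for (i = 1; i <= n; i++) keep[want[i]] = 1
        }
        /\[ALPM\] upgraded / {
            # The package name is the field right after "upgraded".
            for (i = 1; i < NF; i++)
                if ($i == "upgraded") { if ($(i + 1) in keep) print; break }
        }
    '
}
```

Usage: recent_upgrades mesa-git vulkan-icd-loader xf86-video-amdgpu-git amdvlk linux-firmware < /var/log/pacman.log | tail -n 20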
Downgrading linux-firmware seems to have solved the GPU freezes for me. The computer has now been running for almost two days without any problems.
CPU: Quad Core AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics (-MT MCP-) speed/min/max: 1399/1400/3700 MHz
Kernel: 5.10.34-1-MANJARO x86_64 Up: 1d 18h 09m Mem: 5348.1/30077.2 MiB (17.8%) Storage: 1.59 TiB (38.2% used) Procs: 380
Shell: fish inxi: 3.3.04
gnome-shell 1:3.38.4+13+gcf9d73ed5-1
lib32-mesa 21.0.3-3
lib32-mesa-vdpau 21.0.3-3
libva-mesa-driver 21.0.3-3
linux-firmware 20201124.r1786.b362fd4-1
mesa 21.0.3-3
mesa-demos 8.4.0-4
mesa-vdpau 21.0.3-3
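A downgraded package is upgraded again by the next full system upgrade unless it is held back, e.g. by listing it under IgnorePkg in /etc/pacman.conf. The sketch below checks whether a package name appears on an uncommented IgnorePkg line. It matches exact names only (IgnorePkg also accepts glob patterns, which this sketch does not handle), and is_ignored is a hypothetical name.

```shell
#!/bin/sh
# Sketch: succeed if package $1 is listed on an uncommented IgnorePkg line
# of pacman.conf ($2, default /etc/pacman.conf). Exact names only, no globs.

is_ignored() {
    conf=${2:-/etc/pacman.conf}
    # Keep only the value part of "IgnorePkg = a b c" lines ("#IgnorePkg" is
    # skipped), put one name per line, then look for an exact match.
    sed -n 's/^[[:space:]]*IgnorePkg[[:space:]]*=//p' "$conf" |
        tr -s ' \t' '\n\n' | grep -qxF "$1"
}
```

Usage: is_ignored linux-firmware && echo "linux-firmware is pinned". Remember to remove it from IgnorePkg again once a fixed firmware is available.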
Thanks for the hint about linux-firmware!
I used downgrade to upgrade linux-firmware to 20210511.7685cf4 and have not had any crashes since yesterday. The journalctl logs look fine. I will keep you updated if there are any new crashes.
Even after re-enabling XFCE compositing effects, no crashes for many hours.
Update: linux-firmware 20210511.7685cf4 solved the amdgpu-related crashes for me.
However, I am having problems with my Lenovo USB-C docking station: it crashes or hangs from time to time. But this is a separate issue and could be related to other system components.
I’m glad the upgrade solved it for both you and @AkhIL!! I’ll try it soon. Still, what evidence are you basing your claim on that it has solved the freezing problem? I don’t mean to sound pessimistic, but I’ve already had week-long spans without a crash, hoping it meant the problem was finally solved, only to get another random crash.
I’m asking because you may have noticed some microcode/code change in that update that could be addressing the (still unknown) culprit of these issues. If you’re basing it on the growing crash-free spans, then I really hope you have found a definitive solution.
Worked for me too, on a different OS though.
It was hard-crashing every ~3-6 hours; I rolled the firmware back to a version from November and am now at a day of uptime under fairly heavy load.
A bug should probably be filed with the firmware upstream.
I got a freeze in less than ten minutes with linux-firmware 20210512.r1926.55d9649-1. I had multiple days of uptime with 20201124.r1786.b362fd4-1. Trying 20210211.r1830.f7915a0-1 right now.
linux-firmware-20210315.r1846.3568f96-1 should work. I have this version in a fully functional snapshot.
@poynting_factor your guess seems correct. I agree with you and @AkhIL: the update only reduced the frequency of crashes for me, and only downgrading fixed the problem. I am currently using 20201218.646f159.
I will start testing linux-firmware-20210315.r1846.3568f96-1 as suggested by @AkhIL.
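With one version reported good (e.g. 20201218) and one reported bad (e.g. 20210512), narrowing down the culprit is a bisection: test the version halfway between them, then repeat on the half that still contains the change. The sketch below picks that midpoint from a hand-collected list of candidate versions on stdin (for example copied from the menu downgrade shows). All versions must be written in the same form; next_to_test is a hypothetical helper, and the good/bad labels in the usage are only the anecdotal reports from this thread.

```shell
#!/bin/sh
# Sketch: bisection step. Reads candidate versions (one per line, any order)
# on stdin and prints the one halfway between GOOD ($1) and BAD ($2).

next_to_test() {
    sort -V | awk -v good="$1" -v bad="$2" '
        $0 == good { inside = 1; next }   # start collecting after GOOD
        $0 == bad  { inside = 0 }         # stop at BAD
        inside     { cand[++n] = $0 }     # versions strictly in between
        END { if (n > 0) print cand[int((n + 1) / 2)] }
    '
}
```

Usage: printf '%s\n' 20201124 20201218 20210211 20210315 20210511 20210512 | next_to_test 20201218 20210512. If the printed version turns out good, it becomes the new GOOD; if it freezes, it becomes the new BAD. Since the freezes are sporadic, each "good" verdict needs a long test window.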
Slightly off-topic: I upgraded to kernel 5.13-rc1, and this is the first time I am actually getting 4K at 60 Hz (instead of only 30 Hz) with my Lenovo E585. So far, this seems stable with 20210315.3568f96.
Well, I think the problem is buried somewhere deep inside linux-firmware.
I have now been using 20201218.646f159 for about 4 days, with roughly 30 hours of uptime. NOT ONCE did my system freeze…
I’m seriously considering the upgrade to 5.13: did you let linux-firmware / amd-ucode update along with the kernel, or are you still on the version mentioned above?
Update on my side: I’ve been running smoothly for at least a week, with no crashes or unexpected freezes. I ran the latest stable upgrade (from May 19th) a couple of days ago, and so far nothing seems to have broken. Here are my current package versions:
- linux-firmware: 20210511.r1922.7685cf4-1
- Kernel: 5.12.2-1
- mesa and its dependencies: 21.0.3-3
I’m not using any special kernel parameters, either. Honestly, I don’t know what fixed it; my bet is that a combination of the firmware and kernel updates solved it (or at least reduced the crash frequency to the point where I can’t get too frustrated about it). My guess is still that line casting a uint to a larger type when handling paging on AMD devices; the commit was linked earlier in this thread: System frequently crashing after GPU drivers update - #30 by fkfd (I’m not marking it as the solution since I can’t think of any way to prove it was what really fixed the issue, but I hope I can do so in the future).