Since my last post, my system has been updated as indicated below, and so far I have had just one crash: I left the game open and the PC unattended for a long time. It produced a different log entry, as you can see below, but the effect was the same. I’m very happy that it has been fixed, or is very close to it.
Yesterday I updated my kernel to 5.15.rc2, while linux-firmware stayed at 20210818.c46b8c3-1. I got one crash, so the issue still persists.
Mesa was updated to 21.2.2 by the last Manjaro Stable update, but that had no effect on this issue.
Also, before updating the kernel I noticed that the crash logs were varying more. Below are 4 different log entries that all result in the same crash.
12/08/2021 19:28 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=3947807, emitted seq=3947809
12/08/2021 19:28 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1169 thread plasmashel:cs0 pid 1270
You could attach the info about your system and logs from one of the crashes (preferably the “flip_done” one) to the issue I created, maybe it will help better understand this particular problem.
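One possible way to collect such logs, as a rough sketch: the helper below keeps only the amdgpu *ERROR* lines from kernel log text. The function name amdgpu_errors is made up for illustration, and the journalctl call in the comment assumes a systemd journal that still holds the boot in which the crash happened.

```shell
# Keep only amdgpu *ERROR* lines from kernel log text read on stdin.
# (amdgpu_errors is a hypothetical helper name, not an existing tool.)
amdgpu_errors() {
    grep -F '*ERROR*' | grep -F 'amdgpu'
}

# Typical use after rebooting from a crash (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | amdgpu_errors > amdgpu-crash.log
```

The resulting file should contain lines like the "ring gfx timeout" entries quoted in this thread and can be attached to the issue as-is.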
The crash-and-reset episodes have made a comeback on my laptop after applying today’s testing branch update, and with a vengeance. I can reproduce this about 75% of the time when I plug in my laptop, and sometimes the system just goes blank and reboots on its own! I even tried downgrading the kernels (5.10 and 5.14), linux-firmware from 20210919 to 20210818, and mesa from 21.2.3 to 21.2.2, and it still freaks out almost every time I plug in the charger! I am also getting logs in the system journal similar to the ones @lordsansui posted here.
I had not seen anything like this for a little over a month, between installing linux-firmware 20210818 and today, and I’m really furious that this has struck back at me a lot harder, to the point where Manjaro is unusable on my laptop while it’s plugged in.
Edit: I’ve narrowed down the cause to TLP, which was recently upgraded to version 1.4.0 with this new testing update. Downgrading that package to 1.3.1 stopped those crash-and-reset episodes from happening for now.
Edit 2: This turned out to be a false alarm. I upgraded TLP back to 1.4.0, and after copying /etc/tlp.conf.pacnew to /etc/tlp.conf and reloading tlp.service, the crashes stopped when I plugged in my laptop. Upon further investigation, the line RADEON_DPM_PERF_LEVEL_ON_AC="high" from my old tlp.conf was the culprit.
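For anyone who wants to check their own config for the same setting, here is a minimal sketch. The function name radeon_dpm_settings is hypothetical; it simply lists the uncommented RADEON_DPM_* lines in whatever TLP config file it is given.

```shell
# Print active (uncommented) RADEON_DPM_* settings from a TLP config file.
# $1: path to the config file, e.g. /etc/tlp.conf
# (radeon_dpm_settings is a hypothetical helper name.)
radeon_dpm_settings() {
    grep -E '^[[:space:]]*RADEON_DPM_[A-Z_]+=' "$1"
}

# Example:
#   radeon_dpm_settings /etc/tlp.conf
#   radeon_dpm_settings /etc/tlp.conf.pacnew
```

If the first call prints RADEON_DPM_PERF_LEVEL_ON_AC="high" and the second prints nothing, the old config is forcing the high performance level on AC, which is the situation described above.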
I’m not planning to go back to Windows, and my apologies to anyone who might be offended, but Linux is still some way from Windows stability. These never-ending crashes are very annoying, and those who don’t have this issue have others. I just hope the Steam Deck and Valve’s investments help to increase Linux market share so we can get more quality of life. Feel free to think differently.
09/10/2021 11:08 kernel [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
09/10/2021 11:08 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=509775, emitted seq=509777
09/10/2021 11:08 kernel [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process SC2_x64.exe pid 8847 thread SC2_x64.exe pid 8927
I don’t understand why you’ve been getting those crashes for so long even though you have an RX 570 according to your profile popout. I’m running a Gigabyte RX 570 on my desktop and I have never encountered that bug on that PC like I did with my laptop. Is yours a desktop or laptop GPU, and if it’s a desktop GPU, what brand?
It’s a desktop, and the RX 570 was made by MSI. In general it crashes in Starcraft 2 while I’m in the lobby; if I remember correctly, it has never crashed while actually playing, I mean, controlling units. It’s always in the lobby. But it crashes with the Vivaldi browser too, just less often.
Another hope is that AMD is hiring more engineers to work on the GPU driver, so let’s see if they can fix the issue.
It’s interesting that you use a desktop; I was expecting your RX 570 to be a laptop GPU instead. I don’t play any games, especially GPU-intensive ones, and I don’t use Chromium-based browsers (even though I use some Electron-based apps like VSCode and Discord every day), so either that explains it or I might be just plain lucky with my specific setup.
For me the problem also still persists, with an MSI RX 480. The mobile Vega GPUs in my and my colleagues’ work notebooks are running fine so far.
@Zesko I’ve looked into the issue you posted, as my freezes only occur when I play fullscreen videos in Firefox. (I don’t know about games, as I no longer have time to play them.) But the error log looks quite different from mine. I think it’s plasmashell that’s causing it.
It’s really frustrating that the problem has persisted for so long.
Is your udev rule actually sticking? I created a help topic:
amdgpu-udev-rule-ignored/89725
Either of these commands will identify if the rule applied successfully:
udevadm info --attribute-walk /sys/class/drm/card0 | grep -Pi 'power_dpm'
udevadm info -a -n '/dev/dri/card0' | grep -Pi 'power_dpm'
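The same attribute can also be read straight from sysfs, which skips the udev database entirely. This is a small sketch under the assumption that the GPU is card0 and exposes power_dpm_force_performance_level (as amdgpu does); the function name dpm_level is made up for illustration.

```shell
# Print the forced DPM performance level of a DRM card.
# $1: sysfs card directory (defaults to /sys/class/drm/card0, as in the commands above).
# (dpm_level is a hypothetical helper name.)
dpm_level() {
    cat "${1:-/sys/class/drm/card0}/device/power_dpm_force_performance_level"
}

# Example:
#   dpm_level        # prints e.g. "auto" or "high"
# To force the level by hand for the current session:
#   echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
```

If this prints auto right after boot, the rule was not applied, regardless of what the rule file says.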
Curiously, my problems began in May. I have an r9 380x which suffers from a voltage draw issue. If I don’t force the performance level to high I get graphical corruption, crackling audio, and an eventual green screen of death (the system appears to be working, audio is playing, but no input is recognized).
The actual failure relates to a sudden surge of power when the fans go directly from low to high speed without transitioning through a middle range (usually during gaming).
I went so far as to build a custom BIOS with DPM state 0 set to 1000 mV and flashed it to the card. As many of these posts point out, a firmware blob is applied during startup which invalidates this. As such, in my case, every iteration of linux-firmware causes me issues.
The same issue affects Windows 10 and I have to use a custom .xml and load it as a performance profile in the Radeon Adrenalin drivers.
I always assumed my card was an anomaly related to power supply and motherboard. There was no reason to change the hardware once I’d figured out the solution.
I just can’t get a udev rule to apply automatically.
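For reference, rules of this kind are commonly written in the form printed by the sketch below. This is not the rule from the help topic, and it is not a confirmed fix for the rule being ignored. It assumes the card is card0 and is driven by amdgpu; the function name dpm_udev_rule and the file name in the comments are arbitrary choices.

```shell
# Print a udev rule that forces the DPM performance level on card0.
# $1: level to force (defaults to "high"); the amdgpu driver is assumed.
# (dpm_udev_rule is a hypothetical helper name.)
dpm_udev_rule() {
    printf 'KERNEL=="card0", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="%s"\n' "${1:-high}"
}

# Possible installation steps (the file name is an arbitrary choice):
#   dpm_udev_rule | sudo tee /etc/udev/rules.d/30-amdgpu-pm.rules
#   sudo udevadm control --reload
#   sudo udevadm trigger --subsystem-match=drm
```

After the trigger (or a reboot), the udevadm info commands above should show whether the attribute actually changed.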