I'm having an issue with Proton games locking up the whole system almost instantly on an old computer (Xeon X3440, 12 GB DDR3, HD 7870 2 GB, 240 GB SSD) running Manjaro KDE. It only happens in GPU-intensive games, within seconds or minutes of launch. I monitored the hardware with MangoHud, but it doesn't show anything interesting at the time of the crash. I also enabled Proton logs and checked journalctl, but it seems that nothing relevant gets recorded because I have to shut down with the power button (the system becomes completely unresponsive; neither SysRq nor switching to a TTY works).
What I tried so far with no success:
Enabling/disabling swap file and OOM killer
Switching to X11 and Xfce
Downgrading the kernel (6.6 LTS, 5.15 LTS)
Disabling mangohud
Gamemode
Setting CPU governor to performance (CoreCtrl)
Allowing GPU to govern itself (CoreCtrl)
Disabling C-states, virtualization and other unrelated options in BIOS
Trying a different Proton version
It’s worth mentioning that this system behaves fine in Windows 10 and passes any CPU/memory stress test I throw at it. No OC is applied. I initially blamed the experimental amdgpu mode for GCN1, however I tried the HD 7870 in a second Linux PC (AM4-based) and it ran the same games completely fine. Games that are not as intensive also run fine in Linux on the older one. So, where should I look?
Are your keyboard LEDs blinking while your system is frozen?
If you can confirm that your GPU/CPU temps are fine, and since your GPU works flawlessly on the AM4 system, it looks to me like reducing the clock speed (or raising Vcore) on your Xeon CPU could solve this problem.
By the way, it's better to play with the clock speed before you adjust Vcore. Adjusting Vcore in particular can damage your hardware, so it's important to know what you're doing.
No, keyboard stays on as usual (options like caps lock are unresponsive though).
GPU/CPU temps are fine and below 70 °C, minor ones like VRM and chipset included. It had a stable (or so I thought) overclock to 4 GHz before, so when I got the first crashes in Linux it was the first thing that I reverted to factory settings (I reset the CMOS as well to be sure). I also booted a Windows live CD and ran OCCT for an hour, and couldn't get it to crash. The only remotely relevant thing I've seen in journalctl in all of my attempts was:
amdgpu: Disabling VM faults because of PRT request!
I’ve also attempted decreasing GTT from 6GB to 3GB, no success again. Same with Vcore, I even ran it at 1.4V (which is fine for these 45nm chips).
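For anyone digging through the journal after a hard reset like this, here is a minimal Python sketch of how the kernel messages from the previous boot could be filtered. It assumes the journal is persistent (otherwise `journalctl -b -1` has nothing to show), and the keyword list is only my guess at what might be interesting, not a list of messages the kernel is guaranteed to print. The function names are made up for the example.

```python
import subprocess

# Assumed keywords; adjust them to whatever your kernel actually prints.
KEYWORDS = ("amdgpu", "mce", "hardware error")


def filter_kernel_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in lowered)]


def previous_boot_kernel_log():
    """Kernel messages from the previous boot, or [] if journalctl is missing."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in filter_kernel_lines(previous_boot_kernel_log()):
        print(line)
```

If the last lines before the freeze never reached the disk, this will of course show nothing new, which matches what I saw.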
It's also maybe worth checking your GPU hotspot temp. I had a problem in the past where demanding games were crashing while my GPU core had okayish temps of around 71-74°, but the GPU hotspot was at 104-110°.
I could also play less demanding games without crashes (though my GPU fans went crazy from time to time, like an emergency state). Anyway, long story short, I fixed it by replacing the thermal paste on my GPU core.
After I was done, my GPU core showed identical temps of ~73°, but my hotspot went down from ~108° to ~84° and my 2080 Ti was no longer crashing.
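Since an overlay can't show you the last reading before a hard freeze, one option is to log the sensors to a file and force every line to disk. Below is a rough Python sketch, assuming the GPU exposes its sensors through the standard hwmon sysfs files (`temp*_input` in millidegrees Celsius, optional `temp*_label`). The hwmon directory is something you have to look up yourself (on amdgpu it is usually somewhere under `/sys/class/drm/card0/device/hwmon/`, but that path is an assumption and may differ), and older cards may only report a single "edge" temperature with no hotspot at all. The function names are made up for the example.

```python
import glob
import os
import sys
import time


def read_hwmon_temps(hwmon_dir):
    """Map sensor label -> temperature in degrees C for one hwmon directory."""
    temps = {}
    for input_path in sorted(glob.glob(os.path.join(hwmon_dir, "temp*_input"))):
        base = input_path[: -len("_input")]
        name = os.path.basename(base)  # fallback name, e.g. "temp1"
        try:
            with open(base + "_label") as f:
                name = f.read().strip() or name
        except OSError:
            pass  # no label file, keep the fallback name
        try:
            with open(input_path) as f:
                temps[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable right now, skip it
    return temps


def log_forever(hwmon_dir, log_path, interval=0.5):
    """Append one reading per interval and fsync so it survives a hard reset."""
    with open(log_path, "a") as log:
        while True:
            log.write(f"{time.time():.1f} {read_hwmon_temps(hwmon_dir)}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    # Usage (hypothetical script name): python templog.py <hwmon_dir> <log_file>
    if len(sys.argv) == 3:
        log_forever(sys.argv[1], sys.argv[2])
```

After the next freeze, the last line of the log file should be close to what the sensors reported right before the lockup.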
CPU was found to be the culprit, although it’s a very unusual case because pretty much the only scenario where it consistently causes issues is Linux gaming with Wine and DXVK. Replacing it with a different one solved the problem entirely.
I attempted that as well, and even went as far as running it at 2.1 GHz with 1.35 V Vcore and using a single-rank RAM kit to reduce the stress on the IMC. No luck. Whatever the problem is, it seems unusually hard to trace. Debugging with DXVK might shed some light on it, but I never really got anything meaningful out of the logs.