Hey, I’ve got an issue with my GPU, a Radeon RX560: running basically anything compute-heavy on that card makes its temperature skyrocket, and the system shuts itself off due to overheating. After the last failure, dmesg from the previous boot shows the following events:
paź 23 09:52:57.870416 pc_name kernel: amdgpu 0000:91:00.0: amdgpu: Disabling VM faults because of PRT request!
paź 23 09:53:43.030412 pc_name kernel: amdgpu 0000:91:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!
paź 23 09:53:43.030761 pc_name kernel: amdgpu 0000:91:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!
This is fairly consistent with what happens: when I launch World of Warcraft, for example, the temperature spikes from 50 degrees Celsius to 89 in a second, and the emergency shutdown follows. I tried reapplying thermal paste and giving the card a thorough clean, but it didn’t seem to help all that much.
The interesting thing is that under Windows, even more demanding games keep my GPU fairly cool, at around 50-70 degrees. This leads me to believe that there might be some driver shenanigans, or perhaps some mishandling of power in the kernel.
Can someone help me debug this issue further and perhaps somehow fix it?
I really don’t want to switch back to Windows after years of harmonious coexistence with my Linux rig.
I experienced something similar to what you describe when using an RX580 (especially with games, or transcoding operations); albeit not to the point of overheating and shutting down. This sometimes occurred in both Linux and Windows, depending on the load.
There is no issue with the card; nor is there likely any issue with Manjaro or any other OS, in relation to this.
The solution is simple:
Invest in a more robust cooling system for your machine; multiple fans, if possible; or perhaps a water-cooling solution if that’s an option.
Yes, expensive, but it’s your machine; choose whatever fits your situation.
Thanks for taking a look at my problem, but I’m sadly not sure this is a thing to be solved with a cooling system - especially since there’s nothing particularly wrong with the current setup when doing much more GPU-intensive things on Windows.
I could dump more money into my PC, but I think it might be an expensive overkill in this situation.
Potentially expensive, yes. However, I can only share my experience with the same family of graphics. Upgrading the cooling system solved it for me, on every OS that experienced the same symptoms. In my case, it fell short of overheating; but not by much.
I have absolutely no experience with AMD GPUs, especially not on Linux… but I’m a tech nerd and I have a lot of experience with hardware.
89° sounds pretty ■■■■■■ for a weak GPU like a Radeon 560…
Here are a few things that could be happening, i.e. special settings you might have in Windows that you don’t have in Linux. Even if none of them applies, they can still help reduce the temperature issue you are experiencing:
1. Do you possibly have a custom fan curve, or fans always at max RPM, in Windows?
2. Did you undervolt/underclock your GPU in Windows?
3. Are you using an FPS limiter in Windows?
4. Did you enable vsync together with a 60 Hz display in Windows?
Edit:
Investigate your problem: try to run the same application with the same detail settings from the same viewpoint (with the same draw calls/polygons) on both systems and check your RPM/FPS/temps.
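One way to make that comparison on the Linux side is to log temperature and fan speed while the game runs. This is only a minimal sketch, assuming the amdgpu driver exposes its sensors as hwmon files named temp1_input (millidegrees Celsius) and fan1_input (RPM) under /sys/class/drm/card0/device/hwmon/. The card0 index and all function names here are assumptions and may need adjusting on your machine:

```python
import glob
import time
from pathlib import Path

# Assumed location of the amdgpu hwmon directory; the card index may differ.
HWMON_GLOB = "/sys/class/drm/card0/device/hwmon/hwmon*"


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_sensors(hwmon_dir):
    """Read GPU temperature (degrees C) and fan speed (RPM) from one hwmon directory."""
    temp_milli = read_int(Path(hwmon_dir) / "temp1_input")
    fan_rpm = read_int(Path(hwmon_dir) / "fan1_input")
    temp_c = temp_milli / 1000 if temp_milli is not None else None
    return {"temp_c": temp_c, "fan_rpm": fan_rpm}


def monitor(interval_s=1.0, samples=60):
    """Print one reading per interval so spikes can be lined up with game events."""
    dirs = glob.glob(HWMON_GLOB)
    if not dirs:
        print("no hwmon directory found; adjust HWMON_GLOB")
        return
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_sensors(dirs[0]))
        time.sleep(interval_s)
```

Calling monitor() in a terminal while the game starts should show whether the fan RPM actually rises when the temperature jumps, which is the number worth comparing against Windows.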
I don’t know for sure, but one thing comes to mind: your GPU might be hitting some kind of bottleneck in Windows, and that’s why you only get heat issues in Linux.
Maybe Bill Gates put a little bug in your PC Case and when you boot in Linux, the little bug runs to your GPU and stops your airflow.
You are ignoring the fact that his card runs flawlessly in Windows… still, a good PC case with nice airflow (I can recommend Fractal silent cases, btw) is always welcome.
This is possible, since Windows offers some AMD software that seems to handle optimization. It does not seem that the RPMs go to max on Windows - everything seems far quieter under load (Battlefield V), versus being fairly loud at around 2500 RPM when just browsing the web on Linux. Could you suggest some resources on setting the fan curve on Linux? It seems like a process that requires some prior knowledge.
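For what it’s worth, a fan curve is just a mapping from temperature to fan duty cycle. Below is a minimal sketch, assuming the amdgpu hwmon directory exposes pwm1 (fan speed on a 0-255 scale) and pwm1_enable (1 = manual, 2 = automatic). The curve points are made-up illustrative values, not recommendations for an RX560, and the function names are hypothetical. Writing to these files needs root, and a bad curve can make overheating worse, so treat it as a way to understand the idea rather than something to run blindly:

```python
from pathlib import Path

# Illustrative (temperature in degrees C, fan duty in %) points; tune for your card.
CURVE = [(40, 20), (60, 40), (75, 70), (85, 100)]


def duty_for_temp(temp_c, curve=CURVE):
    """Linearly interpolate the fan duty (percent) for a given temperature."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


def duty_to_pwm(duty_percent):
    """Convert a duty percentage to the 0-255 scale used by the pwm1 file."""
    return round(max(0.0, min(100.0, duty_percent)) * 255 / 100)


def apply_pwm(hwmon_dir, pwm):
    """Switch the fan to manual mode and set its speed (requires root)."""
    hwmon = Path(hwmon_dir)
    (hwmon / "pwm1_enable").write_text("1")  # 1 = manual control
    (hwmon / "pwm1").write_text(str(pwm))
```

A loop that reads the temperature every second or two and calls apply_pwm(hwmon_dir, duty_to_pwm(duty_for_temp(temp_c))) would complete the picture. Writing 2 to pwm1_enable should hand control back to the driver’s automatic mode.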
On undervolting/underclocking: nope, unless the software did that for me on Windows - I don’t really think it did.
On an FPS limiter: nope.
On vsync: turned off in the AMD app.
The app I’m referring to, for your reference, is AMD Software Pro.
Unless you have some strange Bitcoin trojan on Linux, I’m pretty sure there is something weird going on; my Nvidia GPU (2080 Ti) runs just as hot in Windows, idle or gaming, as it does under Linux…
Is the RX560 driving those two screens? (the RX560 has two HDMI outputs; unless memory fails me). If so, the RX560 might be somewhat under-powered for the purpose; and this could be contributing.
It had been working fine until recently. Unfortunately I can’t really pinpoint the moment when it started getting worse, especially since my first thought was either a failing PSU or dirt buildup. I switched the PSU - same thing; I cleaned it up - ditto.