Overheat or CPU Frequency Management Issues

I don’t think they conflict, but do your own reading; I can’t guarantee it. I have since uninstalled it in favor of CoreCtrl (both worked together; I only needed the governor, not the frequency scaling), and now I see the tlp service. I guess that if they conflict, one of them is automatically disabled or enabled on install.

I tried disabling and completely removing TLP. After that, cpupower doesn’t retain its settings after reboot, which is strange but kind of okay. Then I set a limit, but still saw impressive numbers at 4300 MHz in htop.
But, okay, thanks for your help!
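One way to double-check whether a frequency cap is really being honored is to compare each core's current frequency against the cap in sysfs instead of reading it off htop. This is only a minimal sketch: the helper name cores_over_cap and the 3.7 GHz figure in the usage note are illustrative, and the sysfs layout is the standard cpufreq one.

```shell
#!/bin/sh
# Sketch: list cores whose current frequency exceeds a cap (both in kHz).
# cores_over_cap is a hypothetical helper name. The second argument
# overrides the sysfs base directory (handy for testing).
cores_over_cap() {
    cap_khz=$1
    base=${2:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        cur=$(cat "$f")
        if [ "$cur" -gt "$cap_khz" ]; then
            cpu=${f%/cpufreq/scaling_cur_freq}
            echo "${cpu##*/} $cur"       # e.g. "cpu3 4300000"
        fi
    done
}
```

For example, after `sudo cpupower frequency-set -u 3.7GHz`, running `cores_over_cap 3700000` should print nothing if the cap holds. Keep in mind that cpupower only changes the running system, so the setting is gone after a reboot unless something reapplies it.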

Okay… well. Now you have 2 modes: automatic and full speed. You can write a simple script that monitors the temperature and, when it reaches a threshold, switches to full speed until the temperature is below the threshold again. That would need a little work, but it is better than nothing when you do intensive tasks.

Many years ago, on Xubuntu 14.04, I used this.
It is very old and probably needs heavy adapting to work on Manjaro today, but it could serve as a starting point for a project.
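Independent of that old script, the threshold idea could look roughly like the sketch below. It rests on assumptions that are not confirmed anywhere in this thread: that the two modes map to the generic hwmon pwm1_enable values (0 = full speed, 2 = automatic), that the laptop exposes such a file at all, and that 85 °C and 75 °C are sensible thresholds. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a temperature-triggered fan mode switch with hysteresis.
# Assumptions (not from the thread): the thresholds below, and a fan that
# exposes the generic hwmon pwm1_enable file (0 = full speed, 2 = automatic).
HOT_MC=85000     # millidegrees C: go to full speed at or above this
COOL_MC=75000    # millidegrees C: return to automatic at or below this

# choose_fan_mode CURRENT_MODE TEMP_MILLIDEGREES -> prints the mode to use
choose_fan_mode() {
    if [ "$2" -ge "$HOT_MC" ]; then
        echo 0
    elif [ "$2" -le "$COOL_MC" ]; then
        echo 2
    else
        echo "$1"    # between the thresholds: keep the current mode
    fi
}

# fan_watch TEMP_FILE PWM_ENABLE_FILE [INTERVAL_SECONDS]; needs root to write.
fan_watch() {
    mode=$(cat "$2")
    while :; do
        new=$(choose_fan_mode "$mode" "$(cat "$1")")
        if [ "$new" != "$mode" ]; then
            echo "$new" > "$2"
            mode=$new
        fi
        sleep "${3:-5}"
    done
}
```

The two thresholds are deliberately different so the fan does not flip back and forth around a single value. To try it, call fan_watch with a temp*_input file and a pwm1_enable file from /sys/class/hwmon; which hwmon directories those are differs per machine, so check each directory's name file first.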

@megavolt
@Teo

Thank you, I’ll try to write a script.
I’m curious, what could be causing this?
Maybe it’s worth opening the laptop and checking the thermal paste? I have some experience with laptop disassembly. But I’m not sure if it’s the right solution.

I can offer a quick rundown of what I’ve done with a zen3 laptop:

  • TLP ( not power-profiles-daemon )
  • zenpower ( zenpower3-dkms )
  • amd-pstate-epp ( using boot option amd_pstate=active )

That’s about it, really.
I believe yours is a zen2, but all of the above should still be compatible.

I will also point out the pstate scenario is a bit confusing … and would take a lot of space here … but this reddit post seems roughly accurate:
https://www.reddit.com/r/linux/comments/15p4bfs/amd_pstate_and_amd_pstate_epp_scaling_driver/

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
amd-pstate-epp

I can also report in my case the epp scaling driver automatically switches between balance_performance when plugged and balance_power when unplugged.

$ cat /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance
balance_performance

# unplug #

$ cat /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power
balance_power

While I haven’t done exhaustive testing, the above configuration seems to work for me: some fan spin-up under load, otherwise quiet, and never hitting a major heat ceiling.
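Since the per-core output above is just the same value repeated, a small helper can condense it. This is a sketch only; summarize_epp is a made-up name, and it reads the same sysfs files as the cat command above.

```shell
#!/bin/sh
# Sketch: count how many cores use each energy_performance_preference value.
# summarize_epp is a hypothetical helper; the optional argument overrides
# the sysfs base directory.
summarize_epp() {
    base=${1:-/sys/devices/system/cpu}
    cat "$base"/cpu[0-9]*/cpufreq/energy_performance_preference 2>/dev/null |
        awk '{ n[$1]++ } END { for (v in n) print v, n[v] }' |
        sort
}
```

With twelve identical entries as in the plugged-in listing above, this collapses to a single line with the value and the count, which makes it easier to spot a core that did not follow the switch.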


Thank you for the advice, I’ll give this option a try. A couple of questions:
Did your laptop also overheat?
What does “not power-profiles-daemon” mean in this context with TLP (not power-profiles-daemon)?

No overheating that I have observed.
Though it was louder while also being less energy efficient before I started tweaking.
tlp conflicts with power-profiles-daemon, so one cannot have both running. power-profiles-daemon is a newer package that is supposed to integrate with desktop power management (being able to choose ‘performance’ in the KDE power widget, etc.).
Well … you can have both installed, but then you would need to mask power-profiles-daemon’s service.
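If both packages stay installed, the masking step could be sketched like this. The service names are the usual ones for these packages, but the helper names run and prefer_tlp are made up, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: make TLP the only active power manager.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0 to
# actually execute them. run and prefer_tlp are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prefer_tlp() {
    run sudo systemctl mask power-profiles-daemon.service
    run sudo systemctl enable --now tlp.service
}
```

Calling prefer_tlp prints the two systemctl commands for review; `DRY_RUN=0 prefer_tlp` runs them.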

More tlp info:
https://wiki.archlinux.org/title/TLP

Ah, I see, I’m using GNOME, and it seems there’s no power management, but just in case, I’ll check out that daemon.

@cscs

In general, yes, it seems to really help. At least my laptop shut down a bit later. Here is what I did:

I switched to amd-pstate-epp,
checked that I don’t have a power management daemon.

The only thing is, I couldn’t compile zenpower, here’s the log:

(1/2) Arming ConditionNeedsUpdate...
(2/2) Install DKMS modules
==> ERROR: Missing var kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing mnt kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing root kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing home kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing lost+found kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing usr kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing opt kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing lib64 kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing proc kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing sbin kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing dev kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing rootfs-pkgs.txt kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing srv kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing etc kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing sys kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing lib kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing run kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing bin kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing tmp kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing boot kernel headers for module zenpower3/0.2.0.
==> ERROR: Missing desktopfs-pkgs.txt kernel headers for module zenpower3/0.2.0.

Here’s the tlp-stat -p:

+++ Processor
CPU model      = AMD Ryzen 7 4800HS with Radeon Graphics

/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver    = amd-pstate-epp
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  = powersave
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors = performance powersave
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq  =   400000 [kHz]
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq  =  4300000 [kHz]
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq  =   400000 [kHz]
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq  =  4300000 [kHz]
/sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference = balance_performance [EPP]
/sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences = default performance balance_performance balance_power power 

/sys/devices/system/cpu/cpu1..cpu15: omitted for clarity, use -v to show all

/sys/devices/system/cpu/amd_pstate/status              = active
/sys/devices/system/cpu/amd_pstate/cppc_dynamic_boost  = (not available)
/sys/module/workqueue/parameters/power_efficient       = Y
/proc/sys/kernel/nmi_watchdog                          = 0

+++ Platform Profile
/sys/firmware/acpi/platform_profile                    = (not available)
/sys/firmware/acpi/platform_profile_choices            = (not available)

You are missing the headers for your kernel:

sudo pacman -Syu linux65-headers
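The package name follows the running kernel, so it can be derived rather than guessed. The sketch below assumes Manjaro's linuxXY-headers naming (major and minor version glued together) and will not fit custom or differently named kernels; manjaro_headers_pkg is a made-up helper name.

```shell
#!/bin/sh
# Sketch: derive the Manjaro headers package name from a kernel release.
# Assumes the linux<major><minor>-headers naming scheme; the function name
# is hypothetical. Without an argument it uses the running kernel.
manjaro_headers_pkg() {
    release=${1:-$(uname -r)}
    major=${release%%.*}      # "6" from "6.5.3-1-MANJARO"
    rest=${release#*.}        # "5.3-1-MANJARO"
    minor=${rest%%.*}         # "5"
    echo "linux${major}${minor}-headers"
}
```

For example, `sudo pacman -S "$(manjaro_headers_pkg)"` installs the headers for whatever kernel is currently booted, after which DKMS can rebuild the module.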

As to zenpower in general please see the project page:

(there are further steps such as unloading k10temp)
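To see which of the two temperature drivers is currently bound, the hwmon name files can be checked. This is a rough sketch with a made-up function name; it only reports what it finds and changes nothing.

```shell
#!/bin/sh
# Sketch: report which AMD CPU temperature driver(s) registered with hwmon.
# cpu_temp_drivers is a hypothetical helper; the optional argument overrides
# the hwmon base directory.
cpu_temp_drivers() {
    base=${1:-/sys/class/hwmon}
    for f in "$base"/hwmon*/name; do
        [ -r "$f" ] || continue
        name=$(cat "$f")
        case $name in
            k10temp|zenpower) echo "$name" ;;
        esac
    done
}
```

If this prints k10temp, zenpower has not taken over yet. As far as I recall the project README, the remaining steps are unloading k10temp (`sudo modprobe -r k10temp`), blacklisting it, and then `sudo modprobe zenpower`, but check the project page for the exact instructions.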


I’ve installed Zenpower3. Now it’s recognized in the sensors, but the laptop still shuts down. I decided to attach a video showing how it happens. If it’s not too much trouble, please take a look at it.

VIDEO HERE

Here’s the output of lsmod | grep zenpower:

zenpower               20480  0

Here’s the grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash udev.log_priority=3 amd_pstate=active"

Here are the sensors:

amdgpu-pci-0400
Adapter: PCI adapter
vddgfx:      974.00 mV 
vddnb:       874.00 mV 
edge:         +37.0°C  
PPT:          14.00 W  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.06 V  
SVI2_SoC:    875.00 mV 
Tdie:         +38.9°C  (high = +95.0°C)
Tctl:         +38.9°C  
SVI2_P_Core:  15.91 W  
SVI2_P_SoC:    4.38 W  
SVI2_C_Core:  12.52 A  
SVI2_C_SoC:    5.00 A  

BAT0-acpi-0
Adapter: ACPI interface
in0:          11.85 V  

asus-isa-0000
Adapter: ISA adapter
cpu_fan:     1900 RPM

nvme-pci-0300
Adapter: PCI adapter
Composite:    +29.9°C  (low  =  -0.1°C, high = +76.8°C)
                       (crit = +79.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +38.0°C  (crit = +103.0°C)

That is horrible. Just don’t do it too often. It will damage your CPU over time.

When I compare:

Then it looks like the fan stays at 1900 RPM all the time. It also looks like there is only one fan for everything.

You can install the zenpower driver, but there is still nothing that controls the fan, as it seems to stay at 1900 RPM.

Maybe try this: GitHub - marazmista/radeon-profile: Application to read current clocks of ATi Radeon cards (xf86-video-ati, xf86-video-amdgpu)

pamac build radeon-profile

It has a GUI.

Now that is a really unusable machine. I actually do not see the CPU cores in the sensors output, but htop sees them. I wonder if that is a sign of some issue and is related. Here is how it looks on a budget Intel.

[teo@teo-lenovo-v15 ~]$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +39.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +25.9°C  (low  = -273.1°C, high = +75.8°C)
                       (crit = +84.8°C)
Sensor 1:     +25.9°C  (low  = -273.1°C, high = +65261.8°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:           7.97 V 

If something is this broken, I would not completely rule out a hardware problem either. It is worth testing with some other live USBs, like an Ubuntu one and Hiren’s BootCD PE. If it happens in Windows too, then it is probably not software…

I tried running tests on Windows 10 with AIDA64. I ran them for about two minutes and came to the conclusion that Windows seems to detect the fans incorrectly: they seemed to be spinning fast, but AIDA64 showed only 500-700 RPM (I’m not sure about the accuracy). However, the laptop did not shut down, and the processor was running at around 3300-3500 MHz and roughly 90 degrees Celsius. I’ll attach a screenshot below.
@megavolt Thanks a lot for your help, but I don’t think it will help much.
@megavolt @Teo @cscs Thanks to all of you who are trying to help me with this issue! Honestly, it’s the only community that is making any effort to help, and I’m surprised. I even reached out to the Ubuntu forum, and no one replied to me.

What do you think, how much performance will I lose if I disable turbo boost? I didn’t want to do this until the last moment, but it seems there are no other options.

If that saves it, it is worth it. Turbo is not meant to run constantly anyway. There are a lot of slim ultrabooks with almost no thermal management, where the turbo is throttled after only a minute.

If that is an old laptop, I would also service (clean) the fan and renew the thermal compound. That can make a huge difference.
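On the turbo boost question: before changing anything, it helps to see whether the kernel exposes a boost switch at all. The sketch below assumes the global cpufreq boost file, which is only present with some scaling drivers and kernel versions, so "unavailable" is a normal result. boost_state is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report the state of the global cpufreq boost switch.
# Assumption: the driver exposes /sys/devices/system/cpu/cpufreq/boost;
# not every scaling driver or kernel version does. boost_state is a
# hypothetical helper name; the optional argument overrides the path.
boost_state() {
    file=${1:-/sys/devices/system/cpu/cpufreq/boost}
    if [ ! -r "$file" ]; then
        echo unavailable
        return 0
    fi
    case $(cat "$file") in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}
```

Where the file exists, `echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost` turns boost off until the next reboot, which is a cheap way to measure how much performance is actually lost before making it permanent.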

After long days of suffering and searching for a solution, I was able to find a configuration where my laptop wouldn’t overheat. Thanks to everyone who participated and helped in this thread, I’m truly grateful to you!

If you’ve encountered this issue, here’s what was done:

  1. Install tlp (below I will attach the configuration).
  2. If you have k10temp disabled (lsmod | grep k10temp), then enable it (sudo modprobe k10temp).
  3. Add the following line to the grub configuration: amd_pstate=passive. Update grub.
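For step 3, the line to edit is GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. To confirm what the kernel was actually booted with, something like the sketch below can help; amd_pstate_mode is a made-up helper name, and it only parses the command line.

```shell
#!/bin/sh
# Sketch: extract the amd_pstate= mode from a kernel command line.
# amd_pstate_mode is a hypothetical helper; without an argument it reads
# /proc/cmdline. If the option appears more than once, the last one is shown.
amd_pstate_mode() {
    cmdline=${1:-$(cat /proc/cmdline)}
    mode=""
    for word in $cmdline; do
        case $word in
            amd_pstate=*) mode=${word#amd_pstate=} ;;
        esac
    done
    echo "${mode:-unset}"
}
```

After the reboot, `amd_pstate_mode` should print passive, and /sys/devices/system/cpu/amd_pstate/status (the same file tlp-stat reports) should agree.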

After this configuration, my processor frequencies stayed around 3700-3900 MHz. The temperature peaked at 93 degrees, and I saw that number only once; mostly it ranged between 86 and 90.

works on: 6.5.3-1-MANJARO

CPU_DRIVER_OPMODE_ON_AC=passive
CPU_DRIVER_OPMODE_ON_BAT=passive
CPU_SCALING_GOVERNOR_ON_AC=ondemand
CPU_SCALING_GOVERNOR_ON_BAT=ondemand
CPU_BOOST_ON_AC=1
CPU_BOOST_ON_BAT=0
CPU_SCALING_MIN_FREQ_ON_AC=400000
CPU_SCALING_MAX_FREQ_ON_AC=3700000
CPU_SCALING_MIN_FREQ_ON_BAT=400000
CPU_SCALING_MAX_FREQ_ON_BAT=2900000
CPU_ENERGY_PERF_POLICY_ON_AC=balance_performance
CPU_ENERGY_PERF_POLICY_ON_BAT=balance_power
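TLP takes these frequencies in kHz, the same unit tlp-stat prints, so a value typed in MHz would be far too small. A quick sanity check could look like the sketch below; the 100000 kHz cut-off is an arbitrary assumption and suspect_tlp_freqs is a made-up name.

```shell
#!/bin/sh
# Sketch: flag CPU_SCALING_*_FREQ values that look like MHz instead of kHz.
# Assumption: any non-zero value below 100000 (100 MHz) is a unit mistake.
# suspect_tlp_freqs is a hypothetical helper; pass it the config file path.
suspect_tlp_freqs() {
    awk -F= '/^CPU_SCALING_(MIN|MAX)_FREQ_ON_(AC|BAT)=/ {
        val = $2
        gsub(/"/, "", val)                 # tolerate quoted values
        if (val + 0 > 0 && val + 0 < 100000) print $1 "=" val
    }' "$1"
}
```

Run it as `suspect_tlp_freqs /etc/tlp.conf`; no output means nothing looks off. After editing the file, `sudo tlp start` applies the settings and `tlp-stat -p` shows what is in effect.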

Just to be clear this is the opposite of what is needed for zenpower … but if it works for you then great.
k10temp should also be running by default.

So are you now dictating which frequencies to use and ignoring the fan speed? Then something is missing in your configuration.

You also need:

CPU_SCALING_MIN_FREQ_ON_AC=0
CPU_SCALING_MAX_FREQ_ON_AC=3700000
CPU_SCALING_MIN_FREQ_ON_BAT=0
CPU_SCALING_MAX_FREQ_ON_BAT=3700000
CPU_ENERGY_PERF_POLICY_ON_AC=balance_performance
CPU_ENERGY_PERF_POLICY_ON_BAT=balance_power

Sorry, but this is still too hot in my opinion. A maximum of 70 degrees under full load is okay.

That’s only true for older CPUs. Ryzens are designed to boost up to 90-95 °C depending on the exact model. The latest Intel CPUs can hit 100 °C before thermal throttling.

So those Windows temps are normal for a Ryzen laptop under heavy load. The issue is why the same thermal throttling isn’t kicking in on Linux to prevent the temp going even higher until it hits the critical shutdown temp.

Alright. Modern laptops can officially be used for frying eggs or as a replacement for a kettle. ✅

I think it’s because the predefined values for “automatic” are made for desktop CPUs. I mean, maybe the maximum temperature is not set? Note that usually the UEFI firmware controls thermal throttling; here it must be controlled by the OS, but Linux has no service or values for it. Windows has it. ASUS has drivers for it.

You still didn’t say anything about the fans. Do they stay at the same speed on Linux, or do they ramp up?
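One way to answer that is to log the fan readings while a load is running. The sketch below reads whatever fan*_input files hwmon exposes; fan_readings is a made-up name, and which drivers show up depends on the machine.

```shell
#!/bin/sh
# Sketch: print every fan reading hwmon exposes as "driver file rpm".
# fan_readings is a hypothetical helper; the optional argument overrides
# the hwmon base directory.
fan_readings() {
    base=${1:-/sys/class/hwmon}
    for f in "$base"/hwmon*/fan*_input; do
        [ -r "$f" ] || continue
        dir=${f%/*}
        echo "$(cat "$dir/name") ${f##*/} $(cat "$f")"
    done
}
```

Running `while :; do date +%T; fan_readings; sleep 2; done` in one terminal and a stress test in another shows whether the RPM ever leaves its idle value before the shutdown.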