Overheat or CPU Frequency Management Issues

You are missing the headers for your kernel:

sudo pacman -Syu linux65-headers
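If you are on a different kernel, the package name changes with it. A minimal sketch for deriving it from the running kernel, assuming Manjaro's usual linux<major><minor>-headers naming (the helper name headers_pkg is made up for illustration):

```shell
#!/usr/bin/env bash
# Derive the Manjaro headers package name from a kernel release string.
# Assumption: kernel packages are named linux<major><minor>, e.g. linux65.
headers_pkg() {
    local release="${1:-$(uname -r)}"
    local major minor
    major="${release%%.*}"        # "6" from "6.5.3-1-MANJARO"
    minor="${release#*.}"         # "5.3-1-MANJARO"
    minor="${minor%%[!0-9]*}"     # "5"
    printf 'linux%s%s-headers\n' "$major" "$minor"
}

# Usage (not run here): sudo pacman -Syu "$(headers_pkg)"
```

Check the printed name against pacman -Ss before installing, since the naming scheme is an assumption here.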

As to zenpower in general please see the project page:

(there are further steps such as unloading k10temp)
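A rough sketch of what that swap usually looks like, assuming the module is called zenpower and that k10temp is kept out of the way with a modprobe blacklist. The file name blacklist-k10temp.conf and the helper name are illustrative choices; the project page is the authority on the exact steps.

```shell
#!/usr/bin/env bash
# Write a modprobe blacklist entry so k10temp does not load at boot.
# The target directory is a parameter so the function can be tried on a
# scratch directory first; the file name is a hypothetical choice.
blacklist_k10temp() {
    local dir="${1:-/etc/modprobe.d}"
    printf '# replaced by zenpower\nblacklist k10temp\n' \
        > "$dir/blacklist-k10temp.conf"
}

# Usage (not run here, needs root):
#   sudo modprobe -r k10temp     # unload the stock driver
#   blacklist_k10temp            # keep it from loading on the next boot
#   sudo modprobe zenpower       # load the replacement
```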

I’ve installed Zenpower3. Now it’s recognized in the sensors, but the laptop still shuts down. I decided to attach a video showing how it happens. If it’s not too much trouble, please take a look at it.

VIDEO HERE

Here’s the output of lsmod | grep zenpower:

zenpower               20480  0

Here’s the grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash udev.log_priority=3 amd_pstate=active"

Here are the sensors:

amdgpu-pci-0400
Adapter: PCI adapter
vddgfx:      974.00 mV 
vddnb:       874.00 mV 
edge:         +37.0°C  
PPT:          14.00 W  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.06 V  
SVI2_SoC:    875.00 mV 
Tdie:         +38.9°C  (high = +95.0°C)
Tctl:         +38.9°C  
SVI2_P_Core:  15.91 W  
SVI2_P_SoC:    4.38 W  
SVI2_C_Core:  12.52 A  
SVI2_C_SoC:    5.00 A  

BAT0-acpi-0
Adapter: ACPI interface
in0:          11.85 V  

asus-isa-0000
Adapter: ISA adapter
cpu_fan:     1900 RPM

nvme-pci-0300
Adapter: PCI adapter
Composite:    +29.9°C  (low  =  -0.1°C, high = +76.8°C)
                       (crit = +79.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +38.0°C  (crit = +103.0°C)

That is horrible. Just don’t let it happen too often. It damages your CPU over time.

When I compare:

Then it looks like the fan stays at 1900 RPM all the time. There seems to be only one fan for everything.

You can install the zenpower driver, but there is still nothing that controls the fan, since it appears to stay at 1900 RPM.

Maybe try this: GitHub - marazmista/radeon-profile: Application to read current clocks of ATi Radeon cards (xf86-video-ati, xf86-video-amdgpu)

pamac build radeon-profile

It has a GUI.

Now that is a really unusable machine. I actually do not see the CPU cores in the sensors output, but htop sees them. I wonder if that is a sign of some related issue. Here is how it looks on a budget Intel.

[teo@teo-lenovo-v15 ~]$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +39.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +25.9°C  (low  = -273.1°C, high = +75.8°C)
                       (crit = +84.8°C)
Sensor 1:     +25.9°C  (low  = -273.1°C, high = +65261.8°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:           7.97 V 

If something is this broken, I would not completely rule out a hardware problem either. It is worth testing with some other live USBs, like Ubuntu and Hiren’s BootCD PE. If it happens in Windows too, then it is probably not software…

I ran tests on Windows 10 with AIDA64 for about two minutes and came to the conclusion that Windows seems to detect the fans incorrectly: they seemed to be spinning fast, but AIDA was showing only 500-700 RPM (I’m not sure about the accuracy). However, the laptop did not shut down, and the processor was running at around 3300-3500 MHz and roughly 90 degrees Celsius. I’ll attach a screenshot below.
@megavolt Thanks a lot for your help, but I don’t think it will help much.
@megavolt @Teo @cscs Thanks to all of you who are trying to help me with this issue! Honestly, it’s the only community that is making any effort to help, and I’m surprised. I even reached out to the Ubuntu forum, and no one replied to me.

What do you think, how much performance will I lose if I disable turbo boost? I didn’t want to do this until the last moment, but it seems there are no other options.

If that saves it, it is worth it. Turbo is not meant to run constantly anyway. There are a lot of slim ultrabooks with almost no thermal headroom, where turbo is throttled after only a minute.
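For a quick test before changing anything permanently, boost can usually be toggled at runtime through sysfs. A minimal sketch, assuming the active cpufreq driver exposes a cpufreq/boost file (acpi-cpufreq does; with amd-pstate it depends on the kernel version and mode). The helper name set_boost is made up for illustration.

```shell
#!/usr/bin/env bash
# Toggle CPU boost through the cpufreq sysfs knob (needs root on a real system).
# Assumption: <root>/cpufreq/boost exists for the driver in use.
set_boost() {
    local value="$1" root="${2:-/sys/devices/system/cpu}"
    local knob="$root/cpufreq/boost"
    case "$value" in
        0|1) ;;
        *) echo "usage: set_boost 0|1 [sysfs-root]" >&2; return 2 ;;
    esac
    [ -w "$knob" ] || { echo "no writable $knob" >&2; return 1; }
    printf '%s\n' "$value" > "$knob"
}

# Usage (not run here, as root): set_boost 0   # disable, set_boost 1 to re-enable
```

The setting does not survive a reboot, so it is a cheap way to measure how much performance you actually lose.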

If that is an old laptop, I would also service (clean) the fan and renew the thermal compound. That can make a huge difference.

After long days of suffering and searching for a solution, I was able to find a configuration where my laptop wouldn’t overheat. Thanks to everyone who participated and helped in this thread, I’m truly grateful to you!

If you’ve encountered this issue, here’s what was done:

  1. Install tlp (below I will attach the configuration).
  2. If you have k10temp disabled (lsmod | grep k10temp), then enable it (sudo modprobe k10temp).
  3. Add the following line to the grub configuration: amd_pstate=passive. Update grub.
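Step 3 can be sketched as a small sed edit, assuming the kernel command line lives in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and ends with a closing double quote. The function name set_amd_pstate is an illustrative choice, not something from this thread.

```shell
#!/usr/bin/env bash
# Set amd_pstate=<mode> inside GRUB_CMDLINE_LINUX_DEFAULT.
# A backup of the original file is kept next to it with a .bak suffix.
set_amd_pstate() {
    local mode="$1" file="${2:-/etc/default/grub}"
    # First drop any existing amd_pstate=... token, then append the new
    # one just before the closing quote of the variable.
    sed -i.bak -E \
        -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/ ?amd_pstate=[a-z]+//' \
        -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\$/ amd_pstate=${mode}\"/" \
        "$file"
}

# Usage (not run here, as root):
#   set_amd_pstate passive
#   update-grub
```

After a reboot, cat /proc/cmdline should show the new parameter.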

With this configuration, my processor frequencies stayed at roughly 3700-3900 MHz. The peak temperature was 93 degrees, and I only saw that once. Mostly, it ranged between 86 and 90.

works on: 6.5.3-1-MANJARO

CPU_DRIVER_OPMODE_ON_AC=passive
CPU_DRIVER_OPMODE_ON_BAT=passive
CPU_SCALING_GOVERNOR_ON_AC=ondemand
CPU_SCALING_GOVERNOR_ON_BAT=ondemand
CPU_BOOST_ON_AC=1
CPU_BOOST_ON_BAT=0
CPU_SCALING_MIN_FREQ_ON_AC=400000
CPU_SCALING_MAX_FREQ_ON_AC=3700000
CPU_SCALING_MIN_FREQ_ON_BAT=400000
CPU_SCALING_MAX_FREQ_ON_BAT=2900000
CPU_ENERGY_PERF_POLICY_ON_AC=balance_performance
CPU_ENERGY_PERF_POLICY_ON_BAT=balance_power
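Note that the frequency limits above are in kHz, the same unit the cpufreq sysfs files use, so 3700000 means 3.7 GHz. A small sketch for generating those two lines from MHz values; the helper name tlp_freq_lines is made up for illustration.

```shell
#!/usr/bin/env bash
# Print TLP scaling limits for one power source from values given in MHz.
# Arguments: power source (AC or BAT), minimum MHz, maximum MHz.
tlp_freq_lines() {
    local power="$1" min_mhz="$2" max_mhz="$3"
    printf 'CPU_SCALING_MIN_FREQ_ON_%s=%d\n' "$power" "$((min_mhz * 1000))"
    printf 'CPU_SCALING_MAX_FREQ_ON_%s=%d\n' "$power" "$((max_mhz * 1000))"
}

# Usage: tlp_freq_lines AC 400 3700
```

After restarting TLP, tlp-stat -p should show which limits were actually applied.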

Just to be clear, this is the opposite of what is needed for zenpower … but if it works for you then great.
k10temp should also be running by default.

So are you now dictating which frequencies to use and ignoring the fan speed? Then something is missing in your configuration.

You also need:

CPU_SCALING_MIN_FREQ_ON_AC=0
CPU_SCALING_MAX_FREQ_ON_AC=3700
CPU_SCALING_MIN_FREQ_ON_BAT=0
CPU_SCALING_MAX_FREQ_ON_BAT=3700
CPU_ENERGY_PERF_POLICY_ON_AC=balance_performance
CPU_ENERGY_PERF_POLICY_ON_BAT=balance_power

Sorry, but this is still too hot in my opinion. A maximum of 70 degrees under full load is okay.

That’s only true for older CPUs. Ryzens are designed to boost up to 90-95 degrees depending on the exact model. The latest Intel CPUs can hit 100 before thermal throttling.

So those Windows temps are normal for a Ryzen laptop under heavy load. The issue is why the same thermal throttling isn’t kicking in on Linux to prevent the temp going even higher until it hits the critical shutdown temp.

Alright. Modern laptops can officially be used for frying eggs or as a replacement for a kettle. :white_check_mark:

I think it’s because the predefined values for “automatic” are made for desktop CPUs. I mean, maybe the maximum temperature is not set? Note that usually the UEFI firmware controls thermal throttling; here it must be controlled by the OS, but Linux has no service or default values for it. Windows has it. ASUS has drivers for it.

You still didn’t say anything about the fans. Do they stay at the same speed on Linux, or do they ramp up?
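One way to answer that is to watch the fan readings during a stress test. A minimal sketch that reads them straight from hwmon, assuming the fan is exposed as fanN_input under /sys/class/hwmon (as the asus-isa-0000 entry in the sensors output suggests). The helper name fan_speeds is made up for illustration.

```shell
#!/usr/bin/env bash
# Print every fan speed that hwmon exposes, one line per fan.
fan_speeds() {
    local root="${1:-/sys/class/hwmon}" f
    for f in "$root"/hwmon*/fan*_input; do
        [ -r "$f" ] || continue    # also skips the unexpanded glob
        printf '%s %s: %s RPM\n' \
            "$(cat "${f%/*}/name" 2>/dev/null)" \
            "$(basename "$f" _input)" \
            "$(cat "$f")"
    done
}

# Usage: while true; do fan_speeds; sleep 1; done
```

Running watch -n 1 sensors in a second terminal gives the same information with the temperatures next to it.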

That’s what I thought. On my desktop PC I can control all of that from the UEFI BIOS. I can even set power limits (e.g. I can run my 5800X in 65W eco-mode instead of the default 105W TDP, which reduces temps a lot at the cost of performance).

There is something strange going on here. The first thing I’d try is turning off amd-pstate and see if the same thing happens with acpi_cpufreq.
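To confirm which driver is actually in use after such a change, the cpufreq sysfs entry can be read directly. A minimal sketch, assuming cpu0 is representative of all cores; the helper name scaling_driver is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the cpufreq scaling driver currently bound to cpu0.
scaling_driver() {
    local root="${1:-/sys/devices/system/cpu}"
    cat "$root/cpu0/cpufreq/scaling_driver"
}

# Usage: scaling_driver
# Typical values are acpi-cpufreq, amd-pstate (passive) and amd-pstate-epp (active).
```

Booting with amd_pstate=disable on the kernel command line should make this report acpi-cpufreq, provided the firmware exposes the ACPI P-state tables.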

With Zenpower, my fans didn’t want to increase their speed, even though the sensors detected the temperature (as seen in my post above). That’s why I switched back to k10temp: with it the fan speed does increase at least a bit, so I chose it.

You’re right, I’ve set up this configuration, and the temperature stays around 75, with occasional spikes up to 90, but still, it’s better than disabling turbo boost (in my opinion).

Lmao :smiley:

Hello! Thank you! I tried it on Ubuntu 22.04 with two kernels. The first one was 6.1.something, specifically with acpi_cpufreq, and the laptop still kept shutting down. Then I switched to 6.5.1 with amd_pstate, and it was the same issue. I went back to Manjaro, which I prefer, and managed to configure it into a working state like this.

In my opinion it looks like a general thermal bug if the kernel is really able to adjust the fan speed, at least slightly.

It is probably worth informing the kernel developers: https://bugzilla.kernel.org/describecomponents.cgi?product=Power%20Management

Currently, I guess, the default throttling values are meant for use in the server space, where cooling is not a problem.
