AMD P-State EPP Scaling Driver on AMD Ryzen 7 5800H laptop

Nyakov13 · 26 September 2023 21:12

OK, I think I have some important findings, crucial ones for me, and I am not sure where to put them.
Also, if someone has an idea where else I can report this, please let me know.

This is probably related to abrupt random shutdowns (power off) with no load and no meaningful logs anywhere.

Kernel: 6.5.3-1-MANJARO
Hardware:
Model: HP Victus Laptop 16-e0xxx
CPU: AMD Ryzen 7 5800H
GPU: Radeon Vega Mobile and NVIDIA GeForce RTX 3060 6GB

The findings:

The kernel automatically loads the AMD P-State EPP scaling driver (amd_pstate_epp).
By default, the driver uses the “powersave” cpufreq governor. This is the right thing to do: it lets the OS pass performance hints, whereas the “performance” governor is useless in most cases.
By default, the driver uses the “performance” hint (energy_performance_preference). This leads to an extremely high voltage on the CPU (integrated GPU?) rail (vddgfx) at all times, and to high power consumption: 1.46 V and 16 W when completely idle.
Setting energy_performance_preference to “balance_performance” immediately drops the voltage to 1 V and the power draw to 5 W.
Setting energy_performance_preference to “power” drops the voltage to about 0.8 V and the power draw to 3-4 W.

My assumptions are these. First, it is extremely unhealthy to keep the chip (presumably the integrated GPU?) at a constantly high voltage. Second, when load appears, this causes an immediate high power draw and rapid heating of the chip before the cooling system is even active, and probably a thermal shutdown of the system.

sensors output for default “performance” mode:

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.46 V  
vddnb:       949.00 mV 
edge:         +43.0°C  
PPT:          15.00 W

sensors output for “balance_performance” mode:

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:      999.00 mV 
vddnb:       949.00 mV 
edge:         +43.0°C  
PPT:           4.00 W
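To compare the voltage between modes without scanning the whole sensors output, the same value can be read straight from the hwmon sysfs tree. This is only a sketch: the function name read_hwmon_mv and the overridable root argument are my own, and it assumes the chip exposes the standard in*_label / in*_input files (values in millivolts).

```sh
#!/bin/sh
# Print one labelled voltage in mV from a hwmon chip, e.g. chip "amdgpu", label "vddgfx".
# The optional third argument (hypothetical) lets you point at a different hwmon root.
read_hwmon_mv() {
    chip=$1
    label=$2
    root=${3:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = "$chip" ] || continue
        for lf in "$dir"/in*_label; do
            [ -r "$lf" ] || continue
            if [ "$(cat "$lf")" = "$label" ]; then
                # in0_label -> in0_input holds the reading for that label
                cat "${lf%_label}_input"
                return 0
            fi
        done
    done
    return 1
}
```

Usage would be something like `read_hwmon_mv amdgpu vddgfx`, run once per EPP setting.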

Important!
At the time of writing, power-profiles-daemon cannot use the amd_pstate_epp and platform_profile drivers at the same time. Because platform_profile is available on this system, it ignores amd_pstate_epp completely and leaves it at its defaults (“powersave” governor and “performance” energy preference).

To see which mode power-profiles-daemon is currently in, use the powerprofilesctl command.
Related power-profiles-daemon issue.
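To check whether platform_profile is present on your own machine (and so whether you are likely in the same situation), something like the following might help. It is a sketch, assuming the usual ACPI sysfs location; the function name platform_profile_info and its optional base-directory argument are made up for illustration.

```sh
#!/bin/sh
# Report whether the ACPI platform_profile interface exists and what it is set to.
# The optional argument (hypothetical) overrides the sysfs base directory.
platform_profile_info() {
    base=${1:-/sys/firmware/acpi}
    if [ -r "$base/platform_profile" ]; then
        echo "platform_profile: $(cat "$base/platform_profile")"
        echo "choices: $(cat "$base/platform_profile_choices" 2>/dev/null)"
    else
        echo "platform_profile: not available"
    fi
}
```

Calling `platform_profile_info` with no arguments reads the real sysfs files.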

There is also auto-epp, a Python script/systemd service that automatically manages the energy performance preference depending on the laptop's power source (AC or battery).
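The idea of choosing the preference by power source can be sketched in a few lines of shell. This is not how auto-epp itself is implemented, just an illustration: the function names on_ac_power and pick_epp are mine, the two EPP values are an arbitrary choice, and it assumes the AC adapter shows up under /sys/class/power_supply with type "Mains" (some USB-C chargers report a different type).

```sh
#!/bin/sh
# Succeeds if any mains-type power supply reports online=1.
# The optional argument (hypothetical) overrides the power_supply directory.
on_ac_power() {
    root=${1:-/sys/class/power_supply}
    for ps in "$root"/*; do
        [ -r "$ps/type" ] && [ -r "$ps/online" ] || continue
        if [ "$(cat "$ps/type")" = "Mains" ] && [ "$(cat "$ps/online")" = "1" ]; then
            return 0
        fi
    done
    return 1
}

# Print the EPP value this sketch would pick for the current power source.
pick_epp() {
    if on_ac_power "$1"; then
        echo balance_performance
    else
        echo power
    fi
}
```

The printed value could then be written to the energy_performance_preference files as root, for example `pick_epp | tee /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference`.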

You can set energy_performance_preference by doing:

# echo "balance_performance" | tee /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference

You can query available profiles by doing:

cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
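To see the driver, governor and current preference for every policy at once, a small loop over the same sysfs directory works. A minimal sketch: the function name show_epp and its optional root argument are my own. As far as I know, the driver shows up in scaling_driver as amd-pstate-epp when the EPP mode is active, but check on your own system.

```sh
#!/bin/sh
# Print scaling driver, governor and EPP for each cpufreq policy.
# The optional argument (hypothetical) overrides the cpufreq sysfs directory.
show_epp() {
    root=${1:-/sys/devices/system/cpu/cpufreq}
    for p in "$root"/policy*; do
        [ -d "$p" ] || continue
        printf '%s: driver=%s governor=%s epp=%s\n' \
            "${p##*/}" \
            "$(cat "$p/scaling_driver")" \
            "$(cat "$p/scaling_governor")" \
            "$(cat "$p/energy_performance_preference" 2>/dev/null || echo n/a)"
    done
}
```

Running `show_epp` before and after changing the preference makes it easy to confirm that every policy was updated.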

To see current cpufreq state and temps:

# sensors
# cpupower frequency-info

Addition:
OK, I installed zenpower3, and this is what I found.
With the default “performance” preference I get a constant 1.46 V on both the CPU and the integrated GPU.
This is BAD.

$ sensors
hp-isa-0000
Adapter: ISA adapter
fan1:           0 RPM
fan2:           0 RPM

nvme-pci-0500
Adapter: PCI adapter
Composite:    +37.9°C  (low  = -273.1°C, high = +80.8°C)
                       (crit = +81.8°C)
Sensor 1:     +37.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.81 V  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.46 V  
SVI2_SoC:    950.00 mV 
Tdie:         +41.6°C  (high = +95.0°C)
Tctl:         +41.6°C  
SVI2_P_Core:   9.64 W  
SVI2_P_SoC:    3.63 W  
SVI2_C_Core:   7.25 A  
SVI2_C_SoC:    3.83 A  

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.46 V  
vddnb:       949.00 mV 
edge:         +41.0°C  
PPT:           8.00 W  

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +41.0°C  (crit = +255.0°C)

With “balance_performance” I get dynamic voltage scaling and around 1 V on the CPU and GPU when idle.

$ sensors
hp-isa-0000
Adapter: ISA adapter
fan1:           0 RPM
fan2:           0 RPM

nvme-pci-0500
Adapter: PCI adapter
Composite:    +36.9°C  (low  = -273.1°C, high = +80.8°C)
                       (crit = +81.8°C)
Sensor 1:     +36.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.81 V  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   1000.00 mV 
SVI2_SoC:    950.00 mV 
Tdie:         +43.1°C  (high = +95.0°C)
Tctl:         +43.1°C  
SVI2_P_Core:   3.27 W  
SVI2_P_SoC:    1.96 W  
SVI2_C_Core:   3.29 A  
SVI2_C_SoC:    2.06 A  

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:      999.00 mV 
vddnb:       949.00 mV 
edge:         +40.0°C  
PPT:           4.00 W  

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +43.0°C  (crit = +255.0°C)

cscs · 26 September 2023 21:23

You might be interested in this thread:

Nyakov13 · 27 September 2023 13:07

Thank you for the info.
I do not want to use TLP for now, though, because it is not the default and I am using KDE, which has settings for power-profiles-daemon.
I probably need to look at zenpower3… I do not really want to install dkms…

stephane · 27 September 2023 18:21

So you can test each mode:

amd-pstate=passive
amd-pstate=active
amd-pstate=guided

Check with:
cpupower frequency-info and
sudo turbostat
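To confirm which of the three modes the running kernel actually ended up in, recent kernels expose a status file for the driver. A small sketch, assuming that file is present on your kernel; the function name pstate_mode and the optional path argument are only for illustration.

```sh
#!/bin/sh
# Print the amd_pstate operating mode (e.g. active, passive, guided),
# or "unknown" if the status file is not there.
# The optional argument (hypothetical) overrides the status file path.
pstate_mode() {
    f=${1:-/sys/devices/system/cpu/amd_pstate/status}
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo unknown
    fi
}
```

Running `pstate_mode` after each reboot shows whether the boot parameter was picked up.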

In my case (desktop + 5600X) I use
“iommu=pt amd-pstate=passive nowatchdog processor.max_cstate=5 systemd.unified_cgroup_hierarchy=true scsi_mod.use_blk_mq=1”

Which video drivers are you using?

Nyakov13 · 28 September 2023 10:55

OK, I installed zenpower3, and this is what I found.
With the default “performance” preference I get a constant 1.46 V on both the CPU and the integrated GPU.
This is BAD.

With “balance_performance” I get dynamic voltage scaling and around 1 V on the CPU and GPU when idle.

Also, no random shutdowns so far!

ling · 13 March 2024 03:42

After reading these posts, I’m a bit confused. Is your solution simply to adjust the system’s power mode to balanced?

I can describe in detail the issue I encountered:
Initially, when I first installed Manjaro, my computer would freeze under low load and required a long press of the power button to restart (similar to your situation). Later, I tried limiting cstate to 5 (because someone mentioned that AMD c6 could cause this issue), as well as trying other Linux boot parameters, but none of them worked. My final solution was to open a virtual machine in VirtualBox to prevent the computer from entering a low-power state.

This solution has been effective all along, and this month I noticed that even if I don’t open the virtual machine, the freezing issue has disappeared. This is quite surprising because I didn’t spend any more time on this issue. Based on your post, I believe that after a system update, my power management was able to adjust to balanced mode normally, as I always set it to default to balanced upon startup.

That held until last week, when I upgraded to Linux kernel 6.6 (linux66, because 6.5 is no longer maintained) and found that the computer would randomly restart without any warning. Please note that this is not a freeze but an automatic restart.

After this issue occurred, I tried to find a solution but found no useful information. Therefore, I had to downgrade the kernel to 6.1 (I also tried booting into 6.5, but it still caused restarts, possibly because some drivers were also upgraded along with the kernel).

Kernel 6.1 worked fine without any problems.

Just now, following your post, I set the power mode to power-saver, and shortly afterward, the issue reappeared: automatic restarts.

So, I’m glad to have found the cause of the random restarts last week, and I appreciate your work here!

However, I hope to find a perfect solution: the computer should work normally in any mode. What should I do? Thanks everyone for your answers!

cscs · 13 March 2024 04:51

We were discussing the AMD P-State driver … which can control your available ‘power modes’ and when they are switched.

This can work in conjunction with power management software like TLP or power-profiles-daemon.

As far as I can tell from the thread … OP did not mark any solution.

That isn't suggested here.

Sounds like you should get a handle on your system's configuration.

It also sounds like maybe you should not be using power-profiles-daemon and/or whatever scaling driver you are using.

This is all highly dependent on your actual hardware (and use case).

ling · 13 March 2024 06:35

Oh, perhaps I expressed myself badly. What I meant was that switching to power-saver caused the issue to reoccur for me. Afterward, I set the power management back to balanced so that the system runs normally.

Thank you for your suggestion.

Tanker1 · 10 April 2024 05:56

I found a simple workaround that keeps Manjaro stable and avoids the impact of the C6 state on my computer. The problem has not recurred since I started running the command ‘/usr/bin/stress --cpu 1’ to keep CPU usage from dropping too low.

cscs · 10 April 2024 06:08

Wait, so you ran stress on ‘cpu 1’ and then everything is fine?
Or you are constantly stressing ‘cpu 1’?

Is this really preferable to the outlined c6 workarounds?
https://wiki.archlinux.org/title/Ryzen#Soft_lock_freezing

And is this really the same issue as OP here?

Tanker1 · 10 April 2024 12:49

Constantly stressing the CPU. If I stop stressing it, it freezes again. I disabled the C6 state and set processor.max_cstate=1 according to the link you posted, but it didn't work. By the way, my UEFI doesn't have a “Power idle control” option.

cscs · 10 April 2024 23:28

Link does not suggest that.

And was it properly applied? Like you updated grub afterwards?
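One way to check, as a rough sketch: look for the parameter in /proc/cmdline after rebooting. The function name has_kernel_param and the optional file argument are just for illustration.

```sh
#!/bin/sh
# Succeeds if the exact parameter appears on the running kernel's command line.
# The optional second argument (hypothetical) overrides the file to read.
has_kernel_param() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_kernel_param processor.max_cstate=1 && echo applied` prints "applied" only if the parameter made it into the booted kernel's command line.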

The link also offers secondary advice in the case it somehow does not work.

Finally in the worst case scenarios…

But also note none of that is explicitly linked to the scaling driver.

I still fail to see how your system freezing (possibly by c state) is related to this thread.

Tanker1 · 11 April 2024 11:34

Sorry for mixing up the original topic of this thread with my own situation. Here is the GRUB configuration I tried: 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash udev.log_priority=3 idle=nomwait processor.max_cstate=1 intel_idle.max_cstate=0"'. It differs from yours, so I'll apply yours properly next time. My disable-C6 configuration is consistent with the link.
Unfortunately, after I implemented what you said, the solutions from the link did not work at all. The only things that help are replacing the AMD CPU with an Intel one or stressing the CPU constantly.

ling · 12 April 2024 02:05

I have a better method to prevent the CPU from entering low-power mode: Install VirtualBox and run a virtual machine in the background, such as Windows 11, since there are still some applications that require Windows. This way, the CPU will remain active without wasting resources like the ‘stress’ utility would. I have done this, and it’s useful.

Tanker1 · 15 April 2024 00:50

Thanks for your advice. I'll try it.