As soon as any setting is changed in /sys/class/drm/card0/device/pp_od_clk_voltage,
the problem occurs after a suspend and resume from S3:
$ cat /sys/module/amdgpu/parameters/ppfeaturemask
0xffffffff
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2564Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK: 500Mhz 2800Mhz
MCLK: 674Mhz 1075Mhz
Setting the Engine clock limit from 2564MHz to 2400MHz:
$ echo s 1 2400 > /sys/class/drm/card0/device/pp_od_clk_voltage
$ echo c > /sys/class/drm/card0/device/pp_od_clk_voltage
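For anyone reproducing this, the two writes above can be wrapped in a small helper that checks the requested clock against the OD_RANGE first. This is only a sketch: the function name od_sclk_commands is made up for illustration, the 500/2800 MHz bounds are copied from the OD_RANGE output of this particular card, and the helper only prints the commands instead of writing to sysfs.

```shell
# Hypothetical helper (not part of any tool): print the commands that would
# set one OD_SCLK level, after checking the value against OD_RANGE.
od_sclk_commands() {
    level=$1
    mhz=$2
    # Sysfs node from this report; can be overridden with a third argument.
    node=${3:-/sys/class/drm/card0/device/pp_od_clk_voltage}
    min=500    # SCLK lower bound, taken from the OD_RANGE output above
    max=2800   # SCLK upper bound, taken from the OD_RANGE output above
    if [ "$mhz" -lt "$min" ] || [ "$mhz" -gt "$max" ]; then
        echo "error: $mhz MHz is outside ${min}-${max} MHz" >&2
        return 1
    fi
    echo "echo s $level $mhz > $node"   # stage the new clock for this level
    echo "echo c > $node"               # commit the staged values
}

od_sclk_commands 1 2400
```

The printed lines can be run as root once they look right. Other cards will report a different OD_RANGE, so the bounds need adjusting.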
The same happens with kernel 5.15, and resetting pp_od_clk_voltage to defaults
before suspend/resume does not help. As soon as OD_SCLK, OD_MCLK or OD_VDDGFX_OFFSET
has been set, starting Unigine Heaven Benchmark after a resume triggers the crash.
Setting (only) a power limit works fine and does not trigger a freeze after suspend and resume:
$ echo $((170*1000**2)) > /sys/class/drm/card0/device/hwmon/hwmon?/power1_cap
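The arithmetic in the command above converts watts to microwatts, which is the unit hwmon uses for power1_cap. A minimal sketch of the same conversion, with a made-up function name (watts_to_microwatts) and without the bash-only ** operator:

```shell
# Hypothetical helper: convert a power limit in watts to the microwatt
# value expected by hwmon's power1_cap.
watts_to_microwatts() {
    echo $(( $1 * 1000000 ))
}

watts_to_microwatts 170   # prints 170000000
```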
Kernel messages of a resume from S3 with crashing Unigine Heaven Benchmark:
10:56:00 kernel: [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw version = 0x00412e00 (65.46.0)
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU driver if version not matched
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resumed successfully!
10:56:00 kernel: [drm] DMUB hardware initialized: version=0x02020007
10:56:00 kernel: [drm] kiq ring mec 2 pipe 1 q 0
10:56:00 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
10:56:00 kernel: [drm] JPEG decode initialized successfully.
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
10:56:00 kernel: amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
...
Starting Unigine Heaven Benchmark (Preset: Extreme) ...
...
10:57:21 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
10:57:21 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1576992, emitted seq=1576994
10:57:21 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process heaven_x64 pid 10123 thread heaven_x64:cs0 pid 10152
10:57:21 kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
10:57:25 kernel: amdgpu 0000:2f:00.0: amdgpu: failed to suspend display audio
10:57:25 kernel: amdgpu 0000:2f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
10:57:25 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
10:57:25 kernel: amdgpu 0000:2f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
10:57:25 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
10:57:26 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
10:57:26 kernel: [drm] free PSP TMR buffer
10:57:26 kernel: amdgpu 0000:2f:00.0: amdgpu: MODE1 reset
10:57:26 kernel: amdgpu 0000:2f:00.0: amdgpu: GPU mode1 reset
10:57:26 kernel: amdgpu 0000:2f:00.0: amdgpu: GPU smu mode1 reset
...
10:57:26 kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset succeeded, trying to resume
10:57:26 kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
10:57:26 kernel: [drm] VRAM is lost due to GPU reset!
10:57:26 kernel: [drm] PSP is resuming...
10:57:27 kernel: [drm] reserve 0xa00000 from 0x8267400000 for PSP TMR
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw version = 0x00412e00 (65.46.0)
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU driver if version not matched
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resumed successfully!
10:57:27 kernel: [drm] DMUB hardware initialized: version=0x02020007
10:57:27 kernel: [drm] kiq ring mec 2 pipe 1 q 0
10:57:27 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
10:57:27 kernel: [drm] JPEG decode initialized successfully.
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow start
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow done
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: [drm] Skip scheduling IBs!
10:57:27 kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset(2) succeeded!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:27 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:32 kernel: amdgpu_cs_ioctl: 7288 callbacks suppressed
10:57:32 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:32 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
10:57:32 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...
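Since the log above is dominated by repeated "Skip scheduling IBs!" lines, a small filter helps to pull out the relevant entries. The following is only a sketch: the function name amdgpu_errors is made up, and the three match strings are simply the ones that stand out in the log above, not an exhaustive list.

```shell
# Hypothetical filter: keep only the lines that mark the hang and the reset.
# Reads kernel log lines on stdin, e.g.:
#   journalctl -k -b | amdgpu_errors
amdgpu_errors() {
    # -F treats the patterns as fixed strings, so '*' is matched literally.
    grep -F -e '*ERROR*' -e 'GPU reset' -e 'VRAM is lost'
}

# Small demonstration with three lines taken from the log above.
printf '%s\n' \
    'kernel: [drm] Skip scheduling IBs!' \
    'kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!' \
    'kernel: [drm] VRAM is lost due to GPU reset!' | amdgpu_errors
```

The demonstration drops the "Skip scheduling IBs!" line and prints the other two.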
Kernel messages of a successful resume from S3 without any overclocking settings applied:
Feb 05 14:29:38 gamerig kernel: [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw version = 0x00412e00 (65.46.0)
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: SMU driver if version not matched
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: SMU is resumed successfully!
Feb 05 14:29:38 gamerig kernel: [drm] DMUB hardware initialized: version=0x02020007
Feb 05 14:29:38 gamerig kernel: [drm] kiq ring mec 2 pipe 1 q 0
Feb 05 14:29:38 gamerig kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Feb 05 14:29:38 gamerig kernel: [drm] JPEG decode initialized successfully.
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Feb 05 14:29:38 gamerig kernel: amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
...
Starting Unigine Heaven Benchmark (Preset: Extreme) ... no errors/amdgpu messages logged ...
Hardware/OS:
System:
Host: gamerig Kernel: 5.16.5-1-MANJARO x86_64 bits: 64 compiler: gcc
v: 11.1.0 Console: pty pts/2 Distro: Manjaro Linux base: Arch Linux
Machine:
Type: Desktop System: Micro-Star product: MS-7D54 v: 1.0 serial: N/A
Mobo: Micro-Star model: MAG X570S TOMAHAWK MAX WIFI (MS-7D54) v: 1.0
serial: 07D5411_L61E799187 UEFI: American Megatrends LLC. v: 1.10
date: 12/17/2021
Memory:
RAM: total: 31.33 GiB used: 2.98 GiB (9.5%)
Array-1: capacity: 128 GiB slots: 4 EC: None max-module-size: 32 GiB
note: est.
Device-1: DIMM 0 size: No Module Installed
Device-2: DIMM 1 size: 16 GiB speed: 3200 MT/s type: DDR4
Device-3: DIMM 0 size: No Module Installed
Device-4: DIMM 1 size: 16 GiB speed: 3200 MT/s type: DDR4
CPU:
Info: 6-core model: AMD Ryzen 5 5600X bits: 64 type: MT MCP arch: Zen 3
rev: 0 cache: L1: 384 KiB L2: 3 MiB L3: 32 MiB
Speed (MHz): avg: 3700 min/max: 2200/4650 boost: enabled cores: 1: 3700
2: 3700 3: 3700 4: 3700 5: 3700 6: 3700 7: 3700 8: 3700 9: 3700 10: 3700
11: 3700 12: 3700 bogomips: 88842
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT / 6800M]
vendor: Sapphire Limited driver: amdgpu v: kernel bus-ID: 2f:00.0
Display: server: X.Org 1.21.1.3 driver: loaded: amdgpu,ati
unloaded: modesetting,radeon resolution: 3840x1600~75Hz
Message: Unable to show advanced data. Required tool glxinfo missing.