After each boot, the system will unexpectedly restart after a period of time

I upgraded to Linux kernel 6.6 (linux66), and now my computer restarts on its own every 10 minutes. I couldn’t find any errors in the logs, and I suspect that nothing is recorded when the crash happens (similar to an unexplained freezing issue I had before, which also left no logs). Is there a way to increase the system’s logging level, or another way to track this problem down? I’ve switched back to my old kernel, 6.5 (linux65), and it has now run for more than 10 minutes without problems. I can provide any necessary information immediately. Thank you for your help!

# last -x | head | tac
ling     pts/1        :0               Thu Mar  7 19:47 - 19:53  (00:05)
reboot   system boot  6.6.19-1-MANJARO Thu Mar  7 20:07   still running
ling     tty2         :0               Thu Mar  7 20:07 - crash  (00:10)
ling     pts/0        :0               Thu Mar  7 20:07 - crash  (00:10)
ling     pts/1        :0               Thu Mar  7 20:07 - crash  (00:10)
reboot   system boot  6.5.13-7-MANJARO Thu Mar  7 20:18   still running
ling     tty2         :0               Thu Mar  7 20:18   still logged in
ling     pts/0        :0               Thu Mar  7 20:18   still logged in
ling     pts/1        :0               Thu Mar  7 20:18 - 20:19  (00:00)
ling     pts/1        :0               Thu Mar  7 20:19   still logged in

The current time is Thu Mar 7 20:37.

Oh, this issue may not be related to the kernel upgrade after all: I discovered that the problem already existed before I upgraded the kernel this morning.

shutdown system down  6.5.13-7-MANJARO Wed Mar  6 07:38 - 08:23  (00:44)
reboot   system boot  6.5.13-7-MANJARO Wed Mar  6 08:23 - 11:09 (1+02:46)
ling     tty2         :0               Wed Mar  6 08:23 - crash (1+00:28)
ling     pts/0        :0               Wed Mar  6 08:24 - crash (1+00:27)
ling     pts/1        :0               Wed Mar  6 08:43 - crash (1+00:08)
ling     pts/2        :0               Wed Mar  6 10:25 - crash  (22:26)
ling     pts/3        :0               Wed Mar  6 10:47 - crash  (22:04)
ling     pts/5        :0               Wed Mar  6 10:53 - crash  (21:58)
reboot   system boot  6.5.13-7-MANJARO Thu Mar  7 08:51 - 11:09  (02:17)
ling     tty2         :0               Thu Mar  7 08:52 - crash  (01:53)
ling     pts/0        :0               Thu Mar  7 08:52 - crash  (01:53)
ling     pts/1        :0               Thu Mar  7 09:00 - 09:17  (00:16)
ling     pts/1        :0               Thu Mar  7 09:20 - 09:24  (00:03)
ling     pts/1        :0               Thu Mar  7 10:07 - crash  (00:38)
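A listing like the ones above can be summarised per kernel. This is only a minimal sketch and not something from the thread: the function name crashed_boots_per_kernel is made up, and it assumes chronological input (as produced by `last -x | tac`), where each "reboot" line comes before the sessions that belong to that boot.

```shell
# Hypothetical helper: count boots that ended with at least one
# session marked "crash", grouped by kernel version.
# Assumes chronological input, e.g. `last -x | tac`.
crashed_boots_per_kernel() {
    awk '
        # A "reboot" pseudo-user line starts a new boot; field 4 is the kernel.
        $1 == "reboot" { kernel = $4; seen = 0; next }
        # Count each boot once, however many sessions it took down.
        / - crash/ && kernel != "" && !seen { count[kernel]++; seen = 1 }
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

It would be used as `last -x | tac | crashed_boots_per_kernel`. Sessions are grouped by the preceding boot line, so several crashed terminals from one boot count as a single crash.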

I enabled debug logging by following the post “Enabling debug logging on Linux” (linux开启debug日志). A reboot just happened. This time the system froze first, though the mouse could still move for a few seconds, and then it rebooted. I checked the logs and found WiFi driver errors before and after the reboot.

Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x000152DA | data1
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x004EEC2A | interruptlink2
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x004EEC2A | interruptlink1
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x004F8B7A | branchlink2
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x00000000 | trm_hw_status1
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x0000A200 | trm_hw_status0
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: Loaded firmware version: 77.ad46c98b.0 cc-a0-77.ucode
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: Transport status: 0x0000004A, valid: 6
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: Current CMD queue read_ptr 175 write_ptr 176
Mar 07 21:01:31 ling-20ym kernel: iwlwifi 0000:02:00.0: Error sending SCAN_CFG_CMD: time out after 2000ms.

After the most recent restart I found no WiFi errors, and the logs from the last minute before it were not saved. Now I don’t know how to track down the problem.
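One way to look at whatever the journal did keep from the boot that crashed is to query the previous boot directly. The wrappers below are a minimal sketch with made-up names (previous_boot_tail, previous_boot_kernel_warnings). They assume systemd's journalctl and a journal that is stored persistently. After an abrupt reset the final seconds may still be missing, because those messages may never have reached the disk.

```shell
# Hypothetical wrappers around journalctl (names are illustrative).

# Show the last lines recorded for an earlier boot.
# $1 = boot offset (default -1, the boot before the current one)
# $2 = number of lines to show (default 200)
previous_boot_tail() {
    journalctl -b "${1:--1}" -n "${2:-200}" --no-pager
}

# Same boot, but only kernel messages at priority "warning" or worse.
previous_boot_kernel_warnings() {
    journalctl -k -b "${1:--1}" -p warning --no-pager
}
```

For example, `previous_boot_tail -1 500` prints the last 500 journal lines of the previous boot, and `journalctl --list-boots` shows which boots the journal still holds.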

Can you post the output of:

inxi -Fza
sudo mhwd-kernel -li

Yes. In the meantime I have downgraded the kernel from 6.6 to 6.1, and this version has no issues: it has been running stably for about a week. Here is the current report:

$ inxi -Fza
  Kernel: 6.1.80-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: hpet avail: acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.1-x86_64
    root=UUID=c88f5257-3476-44bc-8d68-fde2b5aceff8 rw idle=nomwait
    processor.max_cstate=1 intel_idle.max_cstate=0 quiet splash
  Desktop: KDE Plasma v: 5.27.10 tk: Qt v: 5.15.12 info: frameworks
    v: 5.115.0 wm: kwin_x11 vt: 2 dm: SDDM Distro: Manjaro base: Arch Linux
  Type: Laptop System: LENOVO product: 20YM v: Lenovo ThinkBook 16p Gen 2
    serial: <superuser required> Chassis: type: 10 v: Lenovo ThinkBook 16p Gen 2
    serial: <superuser required>
  Mobo: LENOVO model: LNVNB161216 v: SDK0L77769 WIN
    serial: <superuser required> part-nu: LENOVO_MT_20YM_BU_idea_FM_ThinkBook
    16p G2 ACH uuid: <superuser required> UEFI: LENOVO v: GXCN48WW
    date: 08/28/2023
  ID-1: BAT0 charge: 60.7 Wh (96.0%) condition: 63.2/71.0 Wh (89.0%)
    volts: 15.4 min: N/A model: SMP L20M4PD3 type: Li-poly serial: <filter>
    status: not charging cycles: 79
  Info: model: AMD Ryzen 7 5800H with Radeon Graphics bits: 64 type: MT MCP
    arch: Zen 3 gen: 4 level: v3 note: check built: 2021-22
    process: TSMC n7 (7nm) family: 0x19 (25) model-id: 0x50 (80) stepping: 0
    microcode: 0xA50000C
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
    L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
    L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 3242 high: 4048 min/max: 1200/4462 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 4048 2: 4046
    3: 3987 4: 4038 5: 4046 6: 4048 7: 4048 8: 4046 9: 1200 10: 1200 11: 3817
    12: 1200 13: 3847 14: 3876 15: 3237 16: 1200 bogomips: 102242
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Type: gather_data_sampling status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow mitigation: safe RET, no microcode
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
    STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
  Device-1: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] vendor: Lenovo
    driver: nvidia v: 470.239.06 alternate: nouveau,nvidia_drm non-free: 545.xx+
    status: current (as of 2024-02; EOL~2026-12-xx) arch: Ampere code: GAxxx
    process: TSMC n7 (7nm) built: 2020-2023 pcie: gen: 2 speed: 5 GT/s
    lanes: 8 link-max: gen: 4 speed: 16 GT/s lanes: 16 ports: active: none
    off: DP-4 empty: DP-3,eDP-2 bus-ID: 01:00.0 chip-ID: 10de:2520
    class-ID: 0300
  Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
    vendor: Lenovo driver: amdgpu v: kernel arch: GCN-5 code: Vega
    process: GF 14nm built: 2017-20 pcie: gen: 3 speed: 8 GT/s lanes: 16
    link-max: gen: 4 speed: 16 GT/s ports: active: eDP-1 empty: DP-1,DP-2
    bus-ID: 06:00.0 chip-ID: 1002:1638 class-ID: 0300 temp: 55.0 C
  Device-3: Chicony Integrated Camera driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-3:2 chip-ID: 04f2:b71f
    class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 21.1.11 compositor: kwin_x11 driver: X:
    loaded: modesetting,nvidia dri: radeonsi gpu: amdgpu,nvidia,nvidia-nvswitch
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 4480x1600 s-dpi: 96 s-size: 1184x423mm (46.61x16.65")
    s-diag: 1257mm (49.5")
  Monitor-1: DP-4 mapped: DP-1-2 note: disabled pos: primary,right
    model: Philips PHL 243S7 serial: <filter> built: 2021 res: 1920x1080 hz: 60
    dpi: 93 gamma: 1.2 size: 527x296mm (20.75x11.65") diag: 604mm (23.8")
    ratio: 16:9 modes: max: 1920x1080 min: 640x480
  Monitor-2: eDP-1 pos: left model-id: CSO 0x1603 built: 2020 res: 2560x1600
    hz: 60 dpi: 189 gamma: 1.2 size: 344x215mm (13.54x8.46") diag: 406mm (16")
    ratio: 16:10 modes: max: 2560x1600 min: 640x480
  API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
    drv: nvidia device: 2 drv: radeonsi device: 3 drv: swrast gbm:
    drv: kms_swrast surfaceless: drv: swrast x11: drv: radeonsi
    inactive: wayland,device-1
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.0.2-manjaro1.1
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon Graphics (radeonsi
    renoir LLVM 16.0.6 DRM 3.49 6.1.80-1-MANJARO) device-ID: 1002:1638
    memory: 500 MiB unified: no
  API: Vulkan v: 1.3.276 layers: 5 device: 0 type: discrete-gpu name: NVIDIA
    GeForce RTX 3060 Laptop GPU driver: nvidia v: 470.239.06
    device-ID: 10de:2520 surfaces: xcb,xlib device: 1 type: integrated-gpu
    name: AMD Radeon Graphics (RADV RENOIR) driver: mesa radv
    v: 24.0.2-manjaro1.1 device-ID: 1002:1638 surfaces: xcb,xlib
  Device-1: NVIDIA GA106 High Definition Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 8
    link-max: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 01:00.1
    chip-ID: 10de:228e class-ID: 0403
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor vendor: Lenovo driver: N/A
    alternate: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x, snd_pci_acp6x,
    snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps, snd_sof_amd_renoir,
    snd_sof_amd_rembrandt pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max:
    gen: 4 speed: 16 GT/s bus-ID: 06:00.5 chip-ID: 1022:15e2 class-ID: 0480
  Device-3: AMD Family 17h/19h HD Audio vendor: Lenovo driver: snd_hda_intel
    v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4
    speed: 16 GT/s bus-ID: 06:00.6 chip-ID: 1022:15e3 class-ID: 0403
  API: ALSA v: k6.1.80-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: JACK v: 1.9.22 status: off tools: N/A
  Server-2: PipeWire v: 1.0.3 status: off with: pipewire-media-session
    status: active tools: pw-cli
  Server-3: PulseAudio v: 17.0 status: active with: pulseaudio-alsa
    type: plugin tools: pacat,pactl
  Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie: gen: 2
    speed: 5 GT/s lanes: 1 bus-ID: 02:00.0 chip-ID: 8086:2723 class-ID: 0280
  IF: wlp2s0 state: up mac: <filter>
  IF-ID-1: docker0 state: down mac: <filter>
  IF-ID-2: vboxnet0 state: up speed: 10 Mbps duplex: full mac: <filter>
  Info: services: NetworkManager, sshd, systemd-timesyncd, wpa_supplicant
  Device-1: Intel AX200 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-3:3 chip-ID: 8087:0029
    class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 3 state: up address: see --recommends
  Local Storage: total: 1.38 TiB used: 931.56 GiB (66.1%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital model: PC SN730
    SDBPNTY-512G-1101 size: 476.94 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 11190001 temp: 27.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:6 vendor: Western Digital
    model: WD Blue SN570 1TB SSD size: 931.51 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 234100WD temp: 29.9 C scheme: GPT
  ID-1: / raw-size: 95.37 GiB size: 93.31 GiB (97.84%) used: 56.17 GiB (60.2%)
    fs: ext4 dev: /dev/nvme1n1p5 maj-min: 259:11
  ID-2: /boot/efi raw-size: 977 MiB size: 975.1 MiB (99.80%)
    used: 14.3 MiB (1.5%) fs: vfat dev: /dev/nvme1n1p4 maj-min: 259:10
  ID-3: /home raw-size: 392.79 GiB size: 385.55 GiB (98.16%)
    used: 306.26 GiB (79.4%) fs: ext4 dev: /dev/nvme1n1p6 maj-min: 259:12
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: partition size: 22.89 GiB used: 39.5 MiB (0.2%)
    priority: -2 dev: /dev/nvme1n1p3 maj-min: 259:9
  System Temperatures: cpu: 65.1 C mobo: N/A
  Fan Speeds (rpm): N/A
  Memory: total: 24 GiB note: est. available: 22.84 GiB
    used: 13.92 GiB (61.0%)
  Processes: 431 Power: uptime: 6m states: freeze,mem,disk suspend: s2idle
    wakeups: 0 hibernate: platform avail: shutdown, reboot, suspend, test_resume
    image: 9.12 GiB services: org_kde_powerdevil, power-profiles-daemon,
    upowerd Init: systemd v: 255 default: graphical tool: systemctl
  Packages: pm: dpkg pkgs: 0 pm: pacman pkgs: 1809 libs: 453
    tools: pamac,yay pm: flatpak pkgs: 0 Compilers: clang: 16.0.6 gcc: 13.2.1
    Shell: Zsh v: 5.9 default: Bash v: 5.2.26 running-in: konsole inxi: 3.3.33


$ sudo mhwd-kernel -li
Currently running: 6.1.80-1-MANJARO (linux61)
The following kernels are installed in your system:
   * linux61
   * linux65

6.5 is EOL as has been repeated many, many, many times. Remove it and install linux66 if you want a newer kernel.

You can choose the kernel in the GRUB menu.

You can remove linux65 and add newer kernels:

sudo mhwd-kernel -r linux65
sudo mhwd-kernel -i linux66
sudo mhwd-kernel -i linux67

As for these boot options, you can remove them:
idle=nomwait processor.max_cstate=1 intel_idle.max_cstate=0

Be careful: linux67 may require nosplash if you have Plymouth, since you currently boot with:
quiet splash

Also check, for each kernel:

cpupower frequency-info

It should show amd-pstate.
See this post.

Yes. After I found the system unstable on 6.6, I downgraded to 6.1. Kernel 6.5 is still installed only because I want at least one known-good fallback kernel available.

Thank you very much for this information, it’s very helpful! I have read through the post you mentioned and made an attempt:

When I switched the power profile to power-saver through KDE’s graphical interface, the freeze, shutdown, and restart problem reappeared shortly afterwards.

However, I still have some questions about this issue:

  1. How can I determine whether my computer is using amd_pstate to regulate CPU performance? Do I need to pass the amd_pstate=... parameter through the boot options?

  2. What is the difference between TLP and power-profiles-daemon? My computer currently uses power-profiles-daemon, and since a certain version it no longer freezes under low load (before that it did, and my workaround was to keep a VirtualBox virtual machine running so that the load never got particularly low).

  3. Does the 6.6 kernel make the power profile unadjustable? I ask because when the random restarts occurred two weeks ago, I had not changed the power profile (it has always been balanced after startup).

1 - To test, you can add or modify boot parameters at launch: in the GRUB menu press e to enter edit mode and move around with the arrow keys.
Be careful, the keyboard layout there is QWERTY.

Then check with:

cpupower frequency-info
sudo journalctl -b0 
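The active driver can also be read straight from sysfs, which is handy when comparing kernels (the inxi report above lists "scaling: driver: acpi-cpufreq" on 6.1). What follows is a minimal sketch: the function names and the SYSFS_CPU override are made up for illustration, and the amd_pstate/status file is assumed to exist only on newer kernels that expose it.

```shell
# Assumption: the standard cpufreq sysfs layout. SYSFS_CPU is a
# hypothetical override, only there to point the helpers elsewhere.

# Print the cpufreq driver of CPU 0, e.g. acpi-cpufreq or amd-pstate.
cpufreq_driver() {
    cat "${SYSFS_CPU:-/sys/devices/system/cpu}/cpu0/cpufreq/scaling_driver" 2>/dev/null \
        || echo "unknown"
}

# Print the amd_pstate mode if the kernel exposes it.
amd_pstate_status() {
    cat "${SYSFS_CPU:-/sys/devices/system/cpu}/amd_pstate/status" 2>/dev/null \
        || echo "not exposed"
}
```

Running `cpufreq_driver; amd_pstate_status` after booting each kernel gives a quick side-by-side of what actually got loaded.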

To make a boot command line change permanent with GRUB:

sudo nano /etc/default/grub (Ctrl+O to save, Ctrl+X to exit)
sudo update-grub
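Before editing /etc/default/grub, the cleaned-up parameter list can be worked out in a scratch shell. This is a minimal sketch: strip_kernel_params is a hypothetical helper, not a Manjaro or GRUB tool, and it only manipulates a string without touching any configuration file.

```shell
# Hypothetical helper: drop named parameters from a kernel command line.
# $1 = the command line; remaining arguments = parameter names to remove.
# A name matches both the bare form (quiet) and name=value (idle=nomwait).
# Words are split on whitespace, so quoted values with spaces are not handled.
strip_kernel_params() {
    cmdline=$1
    shift
    result=""
    for word in $cmdline; do
        keep=1
        for drop in "$@"; do
            case "$word" in
                "$drop"|"$drop"=*) keep=0 ;;
            esac
        done
        if [ "$keep" -eq 1 ]; then
            result="${result:+$result }$word"
        fi
    done
    printf '%s\n' "$result"
}
```

For example, `strip_kernel_params "idle=nomwait processor.max_cstate=1 intel_idle.max_cstate=0 quiet splash" idle processor.max_cstate intel_idle.max_cstate` prints `quiet splash`. That value would then go into GRUB_CMDLINE_LINUX_DEFAULT.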

2 - TLP is older and is configured through a file; power-profiles-daemon switches the CPU mode, but there is little information on how to use it.

3 - Maybe the change came with dbus-broker-unit, or with TLP or power-profiles-daemon.

See: CPU frequency scaling - ArchWiki

NB: if you see amd-pstate-epp, be careful, there are 2 modes:
one concerns its power, the second is for the CPU scheduler.

Even just having it installed may cause problems whenever you update because linux65-nvidia is no longer available. I suggest installing the previous LTS kernel instead as a fallback (5.15 LTS).

Thanks for the suggestion, I have removed the 6.5 kernel.

It does seem to be a problem with the 6.6 kernel. I reinstalled it today and passed the amd_pstate parameter at boot. Both passive and active lead to a restart. I have not tried guided yet, but I expect it to have the same problem. I’m still on 6.1 for now, hoping 6.6 gets fixed.
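When testing several values like this, it can help to confirm which one the running kernel actually received. The helper below is a minimal sketch with a made-up name (kernel_param). It reads /proc/cmdline by default, and the parameter name is used as a plain sed pattern, so it is only meant for simple names such as amd_pstate.

```shell
# Hypothetical helper: print the value of name=value from the kernel
# command line. $1 = parameter name, $2 = file to read (default /proc/cmdline).
# If the parameter appears more than once, the last occurrence is printed.
kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | sed -n "s/^$1=//p" | tail -n 1
}
```

`kernel_param amd_pstate` prints passive, active, or guided if the parameter was passed, and nothing if it was not.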