Laptop reboots with no logs

razv · 5 September 2024 10:25

Hey guys! I have an ASUS TUF A15 laptop (FA506NC, the 2021 model).
Everything works, except that once in a while (I’m starting to think it happens after a fixed number of uptime hours, maybe 6-7) the screen goes black and the laptop restarts by itself with no logs.
I’ve checked journalctl -b -1 and journalctl -b 0 and there is nothing that points to a problem.
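
For anyone following along, the check of the previous boot could look roughly like the sketch below. The function names are made up for illustration, and it assumes systemd's journalctl with the journal stored under /var/log/journal (with journald's default Storage=auto, logs only survive a reboot if that directory exists).

```shell
# Sketch only: these helper names are hypothetical, not part of systemd.

# Return 0 if a persistent journal directory exists.
# The path is a parameter so the check can be pointed elsewhere.
journal_is_persistent() {
    local dir="${1:-/var/log/journal}"
    [ -d "$dir" ]
}

# Print the last N lines (default 50) of the previous boot's journal.
previous_boot_tail() {
    journalctl -b -1 -n "${1:-50}" --no-pager
}

# Print only messages of priority "err" or worse from the previous boot.
previous_boot_errors() {
    journalctl -b -1 -p err --no-pager
}
```

If previous_boot_tail simply stops in the middle of normal activity, with no shutdown messages at the end, that would be consistent with a hard reset the kernel never had a chance to log.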

I have an AMD Ryzen 5 and an NVIDIA RTX 3050 with the proprietary drivers.
I’ve also previously added the acpi_osi=! idle=nomwait acpi_backlight=native kernel parameters in GRUB in order to get my brightness control to work.

Do you have any idea?

razv · 5 September 2024 10:26

inxi -Fza, if it helps:

System:
  Kernel: 6.6.47-1-MANJARO arch: x86_64 bits: 64 compiler: gcc
    v: 14.2.1 clocksource: tsc avail: acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
    root=UUID=9021574d-f992-4472-86ce-1b2e62d1bd4e rw
    acpi_osi=! idle=nomwait acpi_backlight=native apparmor=1
    security=apparmor udev.log_priority=3
    sysrq_always_enabled=1
  Desktop: i3 v: 4.23 with: polybar tools: xss-lock
    avail: i3lock,xautolock vt: 2 dm: SDDM Distro: Manjaro
    base: Arch Linux
Machine:
  Type: Laptop System: ASUSTeK product: ASUS TUF Gaming A15
    FA506NC_FA506NC v: 1.0 serial: <superuser required>
  Mobo: ASUSTeK model: FA506NC v: 1.0
    serial: <superuser required> uuid: <superuser required>
    UEFI: American Megatrends LLC. v: FA506NC.305
    date: 02/22/2024
Battery:
  ID-1: BAT1 charge: 29.8 Wh (60.9%)
    condition: 48.9/48.1 Wh (101.7%) volts: 11.9 min: 11.7
    model: ASUS A32-K55 type: Li-ion serial: N/A
    status: not charging
CPU:
  Info: model: AMD Ryzen 5 7535HS with Radeon Graphics
    bits: 64 type: MT MCP arch: Zen 3+ gen: 4 level: v3
    note: check built: 2022 process: TSMC n6 (7nm)
    family: 0x19 (25) model-id: 0x44 (68) stepping: 1
    microcode: 0xA404102
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12
    smt: enabled cache: L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB
    L2: 3 MiB desc: 6x512 KiB L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 1274 high: 2923 min/max: 400/4603
    scaling: driver: amd-pstate-epp governor: powersave cores:
    1: 1397 2: 1398 3: 1397 4: 400 5: 1396 6: 400 7: 1397
    8: 2923 9: 1397 10: 400 11: 1397 12: 1397 bogomips: 79081
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2
    sse4a ssse3 svm
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow status: Vulnerable: Safe RET,
    no microcode
  Type: spec_store_bypass mitigation: Speculative Store
    Bypass disabled via prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and
    __user pointer sanitization
  Type: spectre_v2 mitigation: Retpolines; IBPB:
    conditional; IBRS_FW; STIBP: always-on; RSB filling;
    PBRSB-eIBRS: Not affected; BHI: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GA107M [GeForce RTX 3050 Mobile]
    vendor: ASUSTeK driver: nvidia v: 550.107.02
    alternate: nouveau,nvidia_drm non-free: 550.xx+
    status: current (as of 2024-06; EOL~2026-12-xx)
    arch: Ampere code: GAxxx process: TSMC n7 (7nm)
    built: 2020-2023 pcie: gen: 4 speed: 16 GT/s lanes: 8
    link-max: lanes: 16 ports: active: none
    empty: DP-9,HDMI-A-1 bus-ID: 01:00.0 chip-ID: 10de:25a2
    class-ID: 0300
  Device-2: AMD Rembrandt [Radeon 680M] vendor: ASUSTeK
    driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
    process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: eDP-1 empty: DP-1,
    DP-2, DP-3, DP-4, DP-5, DP-6, DP-7, DP-8 bus-ID: 05:00.0
    chip-ID: 1002:1681 class-ID: 0300 temp: 45.0 C
  Device-3: Shine-optics USB2.0 HD UVC WebCam
    driver: uvcvideo type: USB rev: 2.0 speed: 480 Mb/s lanes: 1
    mode: 2.0 bus-ID: 1-4:2 chip-ID: 3277:0029 class-ID: 0e02
    serial: <filter>
  Display: x11 server: X.org v: 1.21.1.13 compositor: Picom
    v: 11 driver: X: loaded: amdgpu,nvidia
    unloaded: modesetting,nouveau alternate: fbdev,nv,vesa
    dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-size: <missing: xdpyinfo>
  Monitor-1: eDP-1 mapped: eDP
    model: Najing CEC Panda 0x004d built: 2019 res: 1920x1080
    hz: 144 dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64")
    diag: 395mm (15.5") ratio: 16:9 modes: max: 1920x1080
    min: 640x480
  API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi
    platforms: device: 0 drv: nvidia device: 1 drv: radeonsi
    device: 3 drv: swrast gbm: drv: kms_swrast surfaceless:
    drv: nvidia x11: drv: radeonsi inactive: wayland,device-2
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa
    v: 24.1.6-arch1.1 glx-v: 1.4 direct-render: yes renderer: AMD
    Radeon 660M (radeonsi rembrandt LLVM 18.1.8 DRM 3.54
    6.6.47-1-MANJARO) device-ID: 1002:1681 memory: 500 MiB
    unified: no
Audio:
  Device-1: NVIDIA vendor: ASUSTeK driver: snd_hda_intel
    v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 8 link-max:
    lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:2291
    class-ID: 0403
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor
    vendor: ASUSTeK driver: snd_pci_acp6x v: kernel
    alternate: snd_pci_acp3x, snd_rn_pci_acp3x,
    snd_pci_acp5x, snd_acp_pci, snd_rpl_pci_acp6x,
    snd_pci_ps, snd_sof_amd_renoir, snd_sof_amd_rembrandt,
    snd_sof_amd_vangogh pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 05:00.5 chip-ID: 1022:15e2 class-ID: 0480
  Device-3: AMD Family 17h/19h HD Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s
    lanes: 16 bus-ID: 05:00.6 chip-ID: 1022:15e3
    class-ID: 0403
  API: ALSA v: k6.6.47-1-MANJARO status: kernel-api
    with: aoss type: oss-emulator
    tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off
    tools: aucat,midicat,sndioctl
  Server-2: JACK v: 1.9.22 status: off tools: N/A
  Server-3: PipeWire v: 1.2.3 status: off tools: pw-cli
  Server-4: PulseAudio v: 17.0 status: active with:
    1: pulseaudio-alsa type: plugin 2: pulseaudio-jack
    type: module tools: pacat,pactl,pavucontrol
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express
    Gigabit Ethernet vendor: ASUSTeK driver: r8168
    v: 8.053.00-NAPI modules: r8169 pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 02:00.0
    chip-ID: 10ec:8168 class-ID: 0200
  IF: eno1 state: down mac: <filter>
  Device-2: Realtek RTL8852BE PCIe 802.11ax Wireless Network
    vendor: AzureWave driver: rtw89_8852be v: kernel pcie:
    gen: 1 speed: 2.5 GT/s lanes: 1 port: d000 bus-ID: 03:00.0
    chip-ID: 10ec:b852 class-ID: 0280
  IF: wlp3s0 state: up mac: <filter>
  Info: services: NetworkManager, systemd-timesyncd,
    wpa_supplicant
Bluetooth:
  Device-1: IMC Networks Bluetooth Radio driver: btusb v: 0.8
    type: USB rev: 1.0 speed: 12 Mb/s lanes: 1 mode: 1.1
    bus-ID: 3-3:2 chip-ID: 13d3:3571 class-ID: e001
    serial: <filter>
  Report: btmgmt ID: hci0 rfk-id: 0 state: up
    address: <filter> bt-v: 5.2 lmp-v: 11 status:
    discoverable: no pairing: no class-ID: 6c010c
Drives:
  Local Storage: total: 953.87 GiB used: 24.35 GiB (2.6%)
  SMART Message: Required tool smartctl not installed.
    Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Micron
    model: MTFDKBA1T0QFM-1BD1AABGB size: 953.87 GiB block-size:
    physical: 512 B logical: 512 B speed: 63.2 Gb/s lanes: 4
    tech: SSD serial: <filter> fw-rev: V3MA101 temp: 34.9 C
    scheme: GPT
Partition:
  ID-1: / raw-size: 953.57 GiB size: 937.53 GiB (98.32%)
    used: 24.35 GiB (2.6%) fs: ext4 dev: /dev/nvme0n1p2
    maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 296 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1
    maj-min: 259:1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: 55.8 C mobo: N/A gpu: amdgpu
    temp: 46.0 C
  Fan Speeds (rpm): cpu: 1900
Info:
  Memory: total: 16 GiB note: est. available: 14.95 GiB
    used: 2.4 GiB (16.0%)
  Processes: 304 Power: uptime: 10m states: freeze,mem,disk
    suspend: s2idle wakeups: 0 hibernate: platform
    avail: shutdown, reboot, suspend, test_resume
    image: 5.96 GiB services: upowerd Init: systemd v: 256
    default: graphical tool: systemctl
  Packages: pm: pacman pkgs: 1604 libs: 361 tools: pamac,yay
    Compilers: gcc: 14.2.1 Shell: Zsh v: 5.9 running-in: kitty
    inxi: 3.3.35

linux-aarhus · 5 September 2024 10:59

Random rebooting with no logs is a fairly new issue, or perhaps not so new.

There is no established explanation, other than that some users had an undervolting issue with their CPU.

It is suggested that undervolting may cause the issue when the workload increases and the power to the processor is not increased to match it. The affected systems are usually equipped with an AMD APU.

It is suggested that Windows has a way of dealing with this - one comment suggested that Windows ignores a limit set in the firmware - and the Linux kernel does not.

This results in reboots - seemingly random and with no explicit cause.

You will have to talk to the vendor - they will be able to advise what to do - as this is not specifically a kernel issue or a software issue, but a matter of how the firmware is configured to react to changes in workload.

Please search the forum for similar issues - I am sure you will find what I am referring to - I don’t remember exactly which thread - there have only been perhaps a handful.

Search results for 'random reboot order:latest_topic' - Manjaro Linux Forum

razv · 5 September 2024 11:24

Could overvolting the CPU from the BIOS make any difference?
I don’t even know if this option is unlocked, but still…

razv · 5 September 2024 11:28

I also saw this: Gpu crashes when I play games or when I use the system - #15 by Kobold
Could it be of any help?

linux-aarhus · 5 September 2024 11:31

I have no idea … I have no hands-on experience with such systems - so any advice from me on the matter would be bad - I can only point you in a direction - what to do is up to you.

Carefully examine the topics - think, think again, and more importantly - understand what you are doing before you apply something you later regret.

The safest path is to use your vendor’s support channels.

I have a ThinkPad X13 AMD Gen 4 - with a 7840U APU - but a Lenovo is not the same as an Asus - the firmware is different.

razv · 5 September 2024 11:36

I’ve started running a stress test (stress-ng), and it’s been running at 100% and about 70 degrees for 10 minutes…
So could it still be an undervolt problem?
Btw, when it crashed (today and 2 days ago) I was just writing some code with only nvim and a browser open, and the CPU was at maybe 3-4 percent, so nothing computationally expensive…
And I’ve checked the ram, it seems to be fine.

linux-aarhus · 5 September 2024 11:38

As I said - I don’t know - no ideas - nothing comes up - sorry.

There is really only one thing - but that is simply a standard assumption - always keep your system up-to-date.

razv · 5 September 2024 11:44

It’s up to date. Kernel 6.6 and stable branch.

linux-aarhus · 5 September 2024 11:51

You could test with 6.10 - it won’t hurt

sudo mhwd-kernel -i linux610

razv · 5 September 2024 12:45

I tried, but wifi and bluetooth on my device don’t work with 6.1, so that’s not an option…

linux-aarhus · 5 September 2024 12:51

Uhh - I think you misread 6.10 as 6.1 - kernel 6.10 is newer than 6.6 - we even have linux 6.11rc4 on stable and 6.11rc6 on the edge branch
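
The mix-up is easy to make because kernel versions compare by number per component, not as text or as a decimal. A minimal sketch, assuming GNU sort with its -V (version sort) option; newest_version is a made-up helper name:

```shell
# Sketch only: newest_version is a hypothetical helper.
# Print the highest of the given version strings, comparing each
# dot-separated component as a number (GNU sort -V).
newest_version() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

newest_version 6.1 6.6 6.10   # prints 6.10
```

The kernel that is actually running can be confirmed with uname -r.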

razv · 5 September 2024 14:34

Oh, sorry. Will try.

Kobold · 5 September 2024 15:13

It could help, yeah (but hardware is complex). Exactly 20 years ago I had a problem with my single-core CPU, which was 6 years old at the time: it rebooted randomly during CPU-heavy tasks, even though the CPU was still well cooled… and that was after a painful year of bug hunting and endless system reinstalls!

I ended up experimenting with the vcore setting and increased it a little; the system was stable after that.

The same could apply to your RAM. Because of the silicon lottery, the default BIOS settings do not always suit every piece of hardware… especially as the hardware gets older, it can (but need not) require (only a little) more voltage.

That said, if you have the option to improve cooling (like replacing the thermal paste: I can recommend Grizzly), the voltage increase is not always needed.

The better the cooling, the lower the voltage the hardware requires to run stably.

You can see this with water-cooled systems, where some users can undervolt their CPU/GPU quite aggressively.

razv · 10 September 2024 17:40

Sadly, my BIOS doesn’t have that option. It happened again, after 8.5 hours of uptime. Still no logs, and dmesg shows nothing that would indicate CPU voltage problems (like the ones the Arch Wiki says Ryzen 5 reports)…

Under a stress test about half an hour before the crash, it performed totally fine…
And I only had a browser open…
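
Since the journal stays empty, one way to pin down whether the resets really happen at a fixed uptime is to write a small heartbeat to disk and read the last line after the next crash. This is only a sketch: the function names and the log path are made up, and it assumes a Linux system with /proc/uptime.

```shell
# Sketch only: function names and the default log path are hypothetical.

# Convert seconds (optionally with a fraction, as in /proc/uptime)
# into a short "XhYYm" string.
format_uptime() {
    local secs="${1%%.*}"   # drop the fractional part
    printf '%dh%02dm\n' "$((secs / 3600))" "$(((secs % 3600) / 60))"
}

# Append one line with the wall-clock time and the current uptime,
# then flush to disk so the line survives a hard reset.
log_heartbeat() {
    local logfile="${1:-$HOME/uptime-heartbeat.log}"
    local up
    read -r up _ < /proc/uptime
    printf '%s uptime=%s\n' "$(date '+%F %T')" "$(format_uptime "$up")" >> "$logfile"
    sync
}

# Example: record a heartbeat once a minute in the background.
#   while true; do log_heartbeat; sleep 60; done &
```

After a reset, the last line of the log file gives the uptime reached, to within a minute. A few crashes would be enough to see whether they cluster around the same value.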

Any more ideas?

Kobold · 11 September 2024 04:19

With a swap partition or swap file, your logs would probably show the error messages.

Sadly, this is normal for most laptop BIOS settings. Desktop PCs allow a lot more freedom for adjustments by default.