Random system crashes on new hardware

On my brand new system I am experiencing seemingly random crashes. Out of the blue, the mouse pointer will begin stuttering and become very sluggish for about 10 seconds, then the machine just reboots.

At first I ran Gnome, where it happened more or less daily. Then I reinstalled with the XFCE edition, and now it happens about once a week. However, I have found a way to reproduce it: if I have Deluge running with a bunch of active torrents and play a video in VLC at the same time, it will happen within 10-15 minutes. When running Deluge only, it will crash within a couple of hours.

I also ran Deluge and VLC simultaneously on Windows for 2-3 hours without any symptoms, so I doubt this is a hardware issue.

The log from journalctl does not yield anything interesting as far as I can tell, but really my knowledge of these things is limited. I notice that there is nothing at the exact times when the crashes occur.

In one case while I was running Gnome, I came back from a walk and all the icons on the Gnome desktop had been replaced with dummy icons. When I rebooted the machine GRUB was gone and it booted straight into Windows. The data on the Manjaro-partition was all good, though. Don’t know if these issues are related.

What I have tried:

Disable global c-states in BIOS

This did not help. I also doubt that this is the problem, since grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name outputs (with c-states not disabled in the BIOS):

/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3

This means that the max c-state is 3, right? On some systems c-state 6 can cause problems, but that does not seem to be in play here.
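
A minimal sketch of the same check, which prints each idle state of one CPU together with its sysfs disable flag. The function name list_cstates and its optional directory argument are made up for this illustration; the name and disable files are the standard cpuidle sysfs entries. Note that the disable flag only reflects the per-state sysfs switch, not the BIOS setting.

```shell
# Sketch: print every cpuidle state of one CPU with its name and disable flag.
# The optional argument (a cpuidle directory) is a hypothetical hook so the
# function can be pointed at another CPU; it defaults to cpu0.
list_cstates() (
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        # Skip anything that is not a readable state directory.
        [ -r "$state/name" ] || continue
        name=$(cat "$state/name")
        # "disable" reads 1 when the state has been switched off via sysfs.
        if [ -r "$state/disable" ]; then
            disabled=$(cat "$state/disable")
        else
            disabled="?"
        fi
        printf '%s %s disable=%s\n' "${state##*/}" "$name" "$disabled"
    done
)

# Usage: list_cstates
```

The deepest state listed is the last line of the output, so on the system above that would be the C3 entry.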

Disable NVMe power states

Did this by adding nvme_core.default_ps_max_latency_us=0 to the kernel command line in GRUB. It did not help. I assume it is switched off, since nvme get-feature /dev/nvme0n1 -f 0x0c -H outputs Autonomous Power State Transition Enable (APSTE): Disabled
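
One way to double-check that the parameter actually reached the running kernel is to read it back from /proc/cmdline. The sketch below assumes nothing beyond that file; the function name kernel_param_value and its optional second argument are hypothetical and only exist for this example.

```shell
# Sketch: print the value of a "name=value" parameter from the kernel command line.
# The second argument (file to read) is a hypothetical hook; it defaults to /proc/cmdline.
kernel_param_value() (
    param="$1"
    file="${2:-/proc/cmdline}"
    # Put each word on its own line, then look for "param=value".
    tr ' ' '\n' < "$file" | while IFS= read -r word; do
        case "$word" in
            "$param"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)

# Usage:
#   kernel_param_value nvme_core.default_ps_max_latency_us
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

Empty output from the function would suggest the GRUB change was not applied to the boot entry in use. The second usage line reads the value the nvme_core module itself reports.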

Lowering speed of RAM

Tried lowering the RAM speed from 3200 MHz (the advertised speed) to 2400 MHz. It had no effect.

Kernels 5.10.x and 5.11.x

Both of these have the same issue.

CPU stress tests

Running s-tui and Prime95 on Windows for at least 45 minutes made the fan speed up like I have never witnessed before, but did not trigger the issue.

Windows memory check (mdsched.exe)

Reports no errors.

Using another wall socket

The extension cord I was using was one of those with an on/off switch. The switch seemed unstable, so I switched to another wall socket. Unfortunately, it did not help.

Output from journalctl around the time of one of the instances:

april 26 19:46:54 johs-pc kernel: audit: type=1130 audit(1619459214.492:105): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd>
april 26 19:47:24 johs-pc systemd[1]: systemd-hostnamed.service: Succeeded.
april 26 19:47:24 johs-pc audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=>
april 26 19:47:24 johs-pc kernel: audit: type=1131 audit(1619459244.528:106): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd>
april 26 19:47:24 johs-pc audit: BPF prog-id=18 op=UNLOAD
april 26 19:47:24 johs-pc audit: BPF prog-id=17 op=UNLOAD
april 26 19:47:24 johs-pc kernel: audit: type=1334 audit(1619459244.652:107): prog-id=18 op=UNLOAD
april 26 19:47:24 johs-pc kernel: audit: type=1334 audit(1619459244.652:108): prog-id=17 op=UNLOAD
-- Boot fc730bf8e8b34d86999ea1d45598d4af --
april 26 19:50:42 johs-pc kernel: Linux version 5.11.14-1-MANJARO (builduser@LEGION) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Wed Apr 14 08:25:29 UTC 2021
april 26 19:50:42 johs-pc kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 root=UUID=c398a086-6f8a-42e7-8d87-54c4578033a3 rw quiet apparmor=1 security=apparmor udev.log_priority=3 nvme_core.default_ps>
april 26 19:50:42 johs-pc kernel: KERNEL supported cpus:
april 26 19:50:42 johs-pc kernel:   Intel GenuineIntel
april 26 19:50:42 johs-pc kernel:   AMD AuthenticAMD
april 26 19:50:42 johs-pc kernel:   Hygon HygonGenuine
april 26 19:50:42 johs-pc kernel:   Centaur CentaurHauls
april 26 19:50:42 johs-pc kernel:   zhaoxin   Shanghai  
april 26 19:50:42 johs-pc kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
april 26 19:50:42 johs-pc kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
april 26 19:50:42 johs-pc kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
april 26 19:50:42 johs-pc kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
april 26 19:50:42 johs-pc kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
april 26 19:50:42 johs-pc kernel: BIOS-provided physical RAM map:
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000009c7efff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000009c7f000-0x0000000009ffffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000000a000000-0x000000000a1fffff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000000a200000-0x000000000a20efff] ACPI NVS
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000000a20f000-0x000000000affffff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000000b000000-0x000000000b01ffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000000b020000-0x0000000098801fff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000098802000-0x0000000099df1fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000099df2000-0x000000009a029fff] ACPI data
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000009a02a000-0x000000009bab4fff] ACPI NVS
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000009bab5000-0x000000009c961fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000009c962000-0x000000009c9fefff] type 20
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000009c9ff000-0x000000009dffffff] usable
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x000000009e000000-0x00000000bfffffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fd200000-0x00000000fd2fffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fd600000-0x00000000fd6fffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fea00000-0x00000000fea0ffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000feb80000-0x00000000fec01fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fec30000-0x00000000fec30fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fedc2000-0x00000000fedcffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000fedd4000-0x00000000fedd5fff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
april 26 19:50:42 johs-pc kernel: BIOS-e820: [mem 0x0000000100000000-0x000000043e2fffff] usable
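
The excerpt above shows the pattern of interest: the last messages of one boot, followed directly by a boot marker. A rough sketch for pulling out just those final lines from a longer journal dump is given below. The function name last_lines_before_boot is made up for this example, and it assumes the marker lines start with "-- Boot" (as above) or "-- Reboot" (as some systemd versions print).

```shell
# Sketch: from a journal dump on stdin, print the last N lines logged
# before every boot marker, i.e. the final messages of each earlier boot.
last_lines_before_boot() {
    awk -v n="${1:-5}" '
        /^-- (Boot|Reboot)/ {
            # Replay the ring buffer in chronological order, then the marker.
            start = (count > n) ? count - n : 0
            for (i = start; i < count; i++) print buf[i % n]
            print
            count = 0
            next
        }
        # Keep only the most recent n lines in a ring buffer.
        { buf[count % n] = $0; count++ }
    '
}

# Usage: journalctl --no-pager | last_lines_before_boot 5
```

This only works if the journal is stored persistently across reboots. For a single crash, journalctl -b -1 -e jumps to the end of the previous boot's log without any filtering.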

Output of inxi --admin --verbosity=7 --filter --no-host --width:

System:
  Kernel: 5.11.14-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 
  root=UUID=c398a086-6f8a-42e7-8d87-54c4578033a3 rw quiet apparmor=1 
  security=apparmor udev.log_priority=3 nvme_core.default_ps_max_latency_us=0 
  Desktop: Xfce 4.16.0 tk: Gtk 3.24.24 info: xfce4-panel wm: xfwm4 
  dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux 
Machine:
  Type: Desktop System: Komplett product: Komplett PC v: N/A serial: <filter> 
  Mobo: ASUSTeK model: ROG STRIX B450-F GAMING v: Rev 1.xx serial: <filter> 
  UEFI: American Megatrends v: 4301 date: 03/04/2021 
Battery:
  Message: No system battery data found. Is one present? 
Memory:
  RAM: total: 15.07 GiB used: 3.25 GiB (21.6%) 
  Array-1: capacity: 128 GiB slots: 4 EC: None max-module-size: 32 GiB 
  note: est. 
  Device-1: DIMM_A1 size: No Module Installed 
  Device-2: DIMM_A2 size: 8 GiB speed: 3200 MT/s type: DDR4 
  detail: synchronous unbuffered (unregistered) bus-width: 64 bits 
  total: 64 bits manufacturer: Kingston part-no: KHX3200C18D4/8G 
  serial: <filter> 
  Device-3: DIMM_B1 size: No Module Installed 
  Device-4: DIMM_B2 size: 8 GiB speed: 3200 MT/s type: DDR4 
  detail: synchronous unbuffered (unregistered) bus-width: 64 bits 
  total: 64 bits manufacturer: Kingston part-no: KHX3200C18D4/8G 
  serial: <filter> 
CPU:
  Info: 6-Core model: AMD Ryzen 5 PRO 4650G with Radeon Graphics socket: AM4 
  bits: 64 type: MT MCP arch: Zen 2 family: 17 (23) model-id: 60 (96) 
  stepping: 1 microcode: 8600106 cache: L1: 384 KiB L2: 3 MiB L3: 8 MiB 
  bogomips: 88668 
  Speed: 1397 MHz min/max: 1400/3700 MHz base/boost: 3700/4300 boost: enabled 
  volts: 1.2 V ext-clock: 100 MHz Core speeds (MHz): 1: 1397 2: 1830 3: 1397 
  4: 1397 5: 1709 6: 1397 7: 1397 8: 1397 9: 2863 10: 1518 11: 1386 12: 1397 
  Flags: 3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1 
  bmi2 bpext cat_l3 cdp_l3 clflush clflushopt clwb clzero cmov cmp_legacy 
  constant_tsc cpb cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc 
  cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid f16c flushbyasid 
  fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb ibrs ibs irperf lahf_lm 
  lbrv lm mba mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx 
  nonstop_tsc nopl npt nrip_save nx osvw overflow_recov pae pat pausefilter 
  pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb pfthreshold pge pni 
  popcnt pse pse36 rdpid rdpru rdrand rdseed rdt_a rdtscp rep_good sep sha_ni 
  skinit smap smca smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 stibp succor 
  svm svm_lock syscall tce topoext tsc tsc_scale umip v_vmsave_vmload vgif 
  vmcb_clean vme vmmcall wbnoinvd wdt xgetbv1 xsave xsavec xsaveerptr xsaveopt 
  xsaves 
  Vulnerabilities: Type: itlb_multihit status: Not affected 
  Type: l1tf status: Not affected 
  Type: mds status: Not affected 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, IBRS_FW, 
  STIBP: conditional, RSB filling 
  Type: srbds status: Not affected 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: AMD Renoir vendor: ASUSTeK driver: amdgpu v: kernel 
  bus-ID: 09:00.0 chip-ID: 1002:1636 class-ID: 0300 
  Display: server: X.Org 1.20.11 driver: loaded: amdgpu,ati 
  unloaded: modesetting alternate: fbdev,vesa display-ID: :0.0 screens: 1 
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2") 
  s-diag: 582mm (22.9") 
  Monitor-1: HDMI-A-0 res: 1920x1080 hz: 60 dpi: 94 
  size: 521x293mm (20.5x11.5") diag: 598mm (23.5") 
  OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.14-1-MANJARO LLVM 11.1.0) 
  v: 4.6 Mesa 21.0.2 direct render: Yes 
Audio:
  Device-1: AMD vendor: ASUSTeK driver: snd_hda_intel v: kernel 
  bus-ID: 09:00.1 chip-ID: 1002:1637 class-ID: 0403 
  Device-2: AMD Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel 
  v: kernel bus-ID: 09:00.6 chip-ID: 1022:15e3 class-ID: 0403 
  Device-3: Mackie Designs Onyx Blackjack type: USB driver: snd-usb-audio 
  bus-ID: 3-1:2 chip-ID: 0a73:0010 class-ID: 0102 
  Sound Server-1: ALSA v: k5.11.14-1-MANJARO running: yes 
  Sound Server-2: JACK v: 1.9.17 running: no 
  Sound Server-3: PulseAudio v: 14.2 running: yes 
  Sound Server-4: PipeWire v: 0.3.25 running: no 
Network:
  Device-1: Intel I211 Gigabit Network vendor: ASUSTeK driver: igb v: kernel 
  port: d000 bus-ID: 03:00.0 chip-ID: 8086:1539 class-ID: 0200 
  IF: enp3s0 state: down mac: <filter> 
  Device-2: Realtek RTL8192CE PCIe Wireless Network Adapter vendor: ASUSTeK 
  driver: rtl8192ce v: kernel port: c000 bus-ID: 04:00.0 chip-ID: 10ec:8178 
  class-ID: 0280 
  IF: wlp4s0 state: up mac: <filter> 
  IP v4: <filter> type: dynamic noprefixroute scope: global 
  broadcast: <filter> 
  IP v6: <filter> type: noprefixroute scope: link 
  WAN IP: <filter> 
Bluetooth:
  Message: No bluetooth data found. 
Logical:
  Message: No logical block device data found. 
RAID:
  Message: No RAID data found. 
Drives:
  Local Storage: total: 5.44 TiB used: 118.8 GiB (2.1%) 
  SMART Message: Required tool smartctl not installed. Check --recommends 
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital 
  model: WDS100T2B0C-00PXH0 size: 931.51 GiB block-size: physical: 512 B 
  logical: 512 B speed: 31.6 Gb/s lanes: 4 rotation: SSD serial: <filter> 
  rev: 211210WD temp: 27.9 C scheme: GPT 
  ID-2: /dev/sda maj-min: 8:0 vendor: Kingston model: SA400S37960G 
  size: 894.25 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: K1B3 scheme: MBR 
  ID-3: /dev/sdb maj-min: 8:16 type: USB vendor: Western Digital 
  model: WD Elements 25A3 size: 3.64 TiB block-size: physical: 4096 B 
  logical: 512 B serial: <filter> rev: 1019 scheme: GPT 
  ID-4: /dev/sdc maj-min: 8:32 type: USB vendor: Verbatim model: STORE N GO 
  size: 14.75 GiB block-size: physical: 512 B logical: 512 B serial: <filter> 
  rev: 1.00 scheme: MBR 
  Message: No optical or floppy data found. 
Partition:
  ID-1: / raw-size: 734.07 GiB size: 721.48 GiB (98.28%) 
  used: 118.77 GiB (16.5%) fs: ext4 block-size: 4096 B dev: /dev/nvme0n1p5 
  maj-min: 259:5 label: N/A uuid: c398a086-6f8a-42e7-8d87-54c4578033a3 
  ID-2: /boot/efi raw-size: 260 MiB size: 256 MiB (98.46%) 
  used: 25.6 MiB (10.0%) fs: vfat block-size: 512 B dev: /dev/nvme0n1p1 
  maj-min: 259:1 label: SYSTEM uuid: 6E20-5105 
Swap:
  Alert: No swap data was found. 
Unmounted:
  ID-1: /dev/nvme0n1p2 maj-min: 259:2 size: 16 MiB fs: N/A label: N/A 
  uuid: N/A 
  ID-2: /dev/nvme0n1p3 maj-min: 259:3 size: 196.69 GiB fs: ntfs label: Windows 
  uuid: 5C0A20F70A20D036 
  ID-3: /dev/nvme0n1p4 maj-min: 259:4 size: 500 MiB fs: ntfs 
  label: Recovery tools uuid: 8018214D1821438E 
  ID-4: /dev/sda1 maj-min: 8:1 size: 100 MiB fs: ntfs label: System Reserved 
  uuid: 083643023642EFEC 
  ID-5: /dev/sda2 maj-min: 8:2 size: 98.46 GiB fs: ntfs label: N/A 
  uuid: 7CCC44A3CC445992 
  ID-6: /dev/sda3 maj-min: 8:3 size: 795.69 GiB fs: ext4 label: N/A 
  uuid: 43b36098-6834-4fc2-80c6-3ff8a4910117 
  ID-7: /dev/sdb1 maj-min: 8:17 size: 3.64 TiB fs: ext4 label: ElementsJohs 
  uuid: 35d2289c-e2aa-4951-a2a1-657936e58196 
  ID-8: /dev/sdc1 maj-min: 8:33 size: 14.75 GiB fs: ntfs label: N/A 
  uuid: 03C0A47A7F1FA59C 
USB:
  Hub-1: 1-0:1 info: Full speed (or root) Hub ports: 10 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 1-5:2 info: Primax USB Optical Mouse type: Mouse 
  driver: hid-generic,usbhid interfaces: 1 rev: 2.0 speed: 1.5 Mb/s 
  power: 100mA chip-ID: 0461:4d22 class-ID: 0301 
  Hub-2: 2-0:1 info: Full speed (or root) Hub ports: 4 rev: 3.1 speed: 10 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
  Device-1: 2-1:2 info: Western Digital Elements Desktop (WDBWLG) 
  type: Mass Storage driver: usb-storage interfaces: 1 rev: 3.0 speed: 5 Gb/s 
  power: 8mA chip-ID: 1058:25a3 class-ID: 0806 serial: <filter> 
  Device-2: 2-3:3 info: Verbatim Flash Drive (StorenGo) type: Mass Storage 
  driver: usb-storage interfaces: 1 rev: 3.0 speed: 5 Gb/s power: 304mA 
  chip-ID: 18a5:0243 class-ID: 0806 serial: <filter> 
  Hub-3: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 3-1:2 info: Mackie Designs Onyx Blackjack type: Audio 
  driver: snd-usb-audio interfaces: 3 rev: 1.1 speed: 12 Mb/s power: 500mA 
  chip-ID: 0a73:0010 class-ID: 0102 
  Hub-4: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 speed: 10 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
  Hub-5: 5-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 5-1:2 info: Logitech Illuminated Keyboard type: Keyboard,Mouse 
  driver: hid-generic,usbhid interfaces: 2 rev: 2.0 speed: 12 Mb/s 
  power: 300mA chip-ID: 046d:c318 class-ID: 0300 
  Hub-6: 6-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 speed: 10 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
Sensors:
  System Temperatures: cpu: 28.2 C mobo: 0 C gpu: amdgpu temp: 25.0 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 333 Uptime: 4h 05m wakeups: 0 Init: systemd v: 247 
  tool: systemctl Compilers: gcc: 10.2.0 Packages: pacman: 1196 lib: 333 
  flatpak: 0 Shell: Bash (sudo) v: 5.1.0 running-in: xfce4-terminal 
  inxi: 3.3.04 

:+1: Welcome to Manjaro! :+1:

  1. This is by far the very best first post I’ve seen. Veteran Linux user? :grin:
  2. Two things you haven’t tried AFAIU:
    • Kernel 5.4 LTS results?

    • output of:

      smartctl --all /dev/nvme0
      smartctl --all /dev/sda
      smartctl --all /dev/sdb
      smartctl --all /dev/sdc
      mount | grep "^/dev/.d"
      

      Please?

:crossed_fingers:

Maybe try the linux512 kernel series, which has some patches for newer AMD hardware.

Thanks! Yes, have been running some form of Linux, more or less exclusively for about 15 years I think. Not much of an expert when it comes to things like this, though, it has mostly just worked and I haven’t had to mess around much.

As I am writing this, Deluge and VLC have been running together for close to an hour, after today’s update. This is longer without crashing than when I have tested this before, but we’ll see.

I would like to keep the test running, so I can’t switch to the 5.4 kernel just now, but below are the outputs of the commands you requested. Not really sure what to look for here. Please do tell if you find something interesting.

smartctl --all /dev/nvme0

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.16-2-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDC WDS100T2B0C-00PXH0
Serial Number:                      210242488110
Firmware Version:                   211210WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1 000 204 886 016 [1,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1 000 204 886 016 [1,00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a46c9977d
Local Time is:                      Wed Apr 28 23:16:41 2021 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W    2.90W       -    0  0  0  0        0       0
 1 +     2.70W    1.80W       -    0  0  0  0        0       0
 2 +     1.90W    1.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   39000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        28 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1 442 956 [738 GB]
Data Units Written:                 2 119 076 [1,08 TB]
Host Read Commands:                 15 918 792
Host Write Commands:                16 732 894
Controller Busy Time:               66
Power Cycles:                       95
Power On Hours:                     678
Unsafe Shutdowns:                   4
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

smartctl --all /dev/sda

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.16-2-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37960G
Serial Number:    50026B76833F4E33
LU WWN Device Id: 5 0026b7 6833f4e33
Firmware Version: SBFKK1B3
User Capacity:    960 197 124 096 bytes [960 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 28 23:26:03 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection: 		(65535) seconds.
Offline data collection
capabilities: 			 (0x00) 	Offline data collection not supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3812
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       655
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       8
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/18
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       16 (Average 8)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   024   042   000    Old_age   Always       -       24 (Min/Max 17/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   099   099   000    Old_age   Offline      -       99
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       2345
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       2010
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       957
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       8
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       16
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       190656

SMART Error Log not supported

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

smartctl --all /dev/sdb

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.16-2-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD40EZRZ-00GXCB0
Serial Number:    WD-WCC7K5JK2Z25
LU WWN Device Id: 5 0014ee 20ffab153
Firmware Version: 80.00A80
User Capacity:    4 000 787 030 016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 28 23:27:12 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(45840) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 486) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   183   164   021    Pre-fail  Always       -       5808
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       217
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       235
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       217
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4628
194 Temperature_Celsius     0x0022   114   108   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

sudo smartctl --all /dev/sdc

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.16-2-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sdc failed: No such device

sdc is apparently not plugged in, but the crash happened once earlier today (before the update), with the same disks connected.

mount | grep "^/dev/.d"

/dev/sdb1 on /run/media/johs/ElementsJohs type ext4 (rw,nosuid,nodev,relatime,uhelper=udisks2)

Thanks! Will do, but it might be that it got fixed with today’s update while still on 5.11. Deluge and VLC, which together have triggered it multiple times before in under 20 minutes, have now been running for over an hour. Fingers crossed! If not, I’ll try your suggestion.

I was just looking for failing disks (none) and Deluge running from an NTFS volume (not the case).
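
Those two checks can be roughly sketched as filters over the command output. The function names smart_verdict and ntfs_mounts are invented for this example. Treating fuseblk as NTFS is an assumption, since ntfs-3g mounts show up under that type but so can other FUSE filesystems.

```shell
# Sketch: pull the overall health verdict out of "smartctl --all" output on stdin.
smart_verdict() {
    # Prints only what follows the fixed prefix, e.g. PASSED.
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Sketch: list mounted volumes whose filesystem type looks like NTFS.
ntfs_mounts() {
    # mount prints "<device> on <mountpoint> type <fstype> (<options>)",
    # so the type is the second-to-last field. Mount points containing
    # spaces will be cut short in the printed output.
    awk '$(NF-1) ~ /^(ntfs|ntfs3|fuseblk)$/ { print $1, $3 }'
}

# Usage:
#   for dev in /dev/nvme0 /dev/sda /dev/sdb; do
#       printf '%s: ' "$dev"; sudo smartctl --all "$dev" | smart_verdict
#   done
#   mount | ntfs_mounts
```

An empty result from ntfs_mounts matches the mount output posted above, where only an ext4 volume is mounted from the external disk.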

Also, when anyone says something and @philm says something else, you should always follow his advice, so try 5.12 before trying 5.4. (Yeah, we’ll wait, as I’m going to :bed: :zzz: now.)

:grin:

Ok, so it seems like I have narrowed it down. I tried running Deluge + VLC on a wired network connection. On WiFi this would crash and reboot the machine in 2-4 hours. On the wired connection it ran for 11 hours before I stopped the test and concluded that the WiFi card is the issue.

I also found three other cases describing more or less the exact same symptoms:

https://forums.linuxmint.com/viewtopic.php?t=260603
https://forums.gentoo.org/viewtopic-t-1103468-start-0.html

All in all it seems quite certain that it is my Realtek RTL8192CE PCIe Wireless Network Adapter which was causing the issue. I have since got myself a powerline adapter and the system has been running 100% rock solid since then.
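
For anyone comparing their own setup against this one, the kernel driver behind a network interface can be read from sysfs. This is a small sketch; the function name iface_driver and its optional second argument are hypothetical, while the device/driver symlink is the usual sysfs layout for PCI and USB network devices.

```shell
# Sketch: print the kernel driver bound to a network interface.
# The second argument (sysfs net directory) is a hypothetical hook;
# it defaults to /sys/class/net.
iface_driver() (
    iface="$1"
    root="${2:-/sys/class/net}"
    link="$root/$iface/device/driver"
    [ -L "$link" ] || { echo "no driver link for $iface" >&2; exit 1; }
    # The symlink target ends in the driver name.
    target=$(readlink "$link")
    printf '%s\n' "${target##*/}"
)

# Usage: iface_driver wlp4s0
```

On the machine described in the inxi output above, the interface wlp4s0 is listed with the rtl8192ce driver, so that is the name to look for.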

I am, however, a bit worried about what happened when the Gnome icons/settings were lost and GRUB disappeared. Could this have something to do with NVMe power states?
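
This does not answer the NVMe question, but one thing that could be inspected if it happens again is which UEFI boot entry comes first. The sketch below assumes the efibootmgr tool, which is not mentioned elsewhere in this thread, and its usual output layout of a BootOrder line followed by BootXXXX entries. The function name first_boot_entry is made up for this example.

```shell
# Sketch: from "efibootmgr" output on stdin, print the ID and label of the
# entry that comes first in BootOrder.
first_boot_entry() {
    awk '
        /^BootOrder:/ { split($2, order, ","); first = order[1] }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ]/ {
            id = substr($0, 5, 4)
            label = substr($0, 11)
            # Some versions append a tab and the device path after the label.
            sub(/\t.*/, "", label)
            labels[id] = label
        }
        END { if (first in labels) print first, labels[first] }
    '
}

# Usage: efibootmgr | first_boot_entry
```

If the first entry turns out to be the Windows boot manager rather than the Manjaro one, that would at least be consistent with the machine booting straight into Windows while the Manjaro partition itself was intact.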
