Hello, my system has randomly crashed 3 times in the past 24 hours. Each time I was away from the machine (once for less than 5 minutes). By “crash” I mean the mouse stops responding, I can’t switch to a TTY, and the system is completely locked up.
I’m looking for advice on how to diagnose this issue. If it crashes again, my next step is to fall back to the LTS kernel.
System: Kernel: 5.10.7-3-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.1 Desktop: KDE Plasma 5.20.5 Distro: Manjaro Linux
Machine: Type: Desktop System: Dell product: Inspiron 5675 v: 1.3.7 serial: <filter>
Mobo: Dell model: 07PR60 v: A00 serial: <filter> UEFI: Dell v: 1.3.7 date: 03/14/2018
CPU: Info: 8-Core model: AMD Ryzen 7 1700X bits: 64 type: MT MCP arch: Zen rev: 1 L2 cache: 4 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 108639
Speed: 3140 MHz min/max: 2200/3400 MHz boost: enabled Core speeds (MHz): 1: 3140 2: 2918 3: 1927 4: 1866 5: 1922
6: 1868 7: 2801 8: 2696 9: 2029 10: 2048 11: 1864 12: 1804 13: 2181 14: 2141 15: 2481 16: 2057
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] vendor: Dell
driver: amdgpu v: kernel bus ID: 09:00.0
Display: x11 server: X.Org 1.20.10 driver: loaded: amdgpu,ati unloaded: modesetting resolution: 1: 1680x1050~60Hz
2: 2560x1440~60Hz
OpenGL: renderer: AMD Radeon RX 480 Graphics (POLARIS10 DRM 3.40.0 5.10.7-3-MANJARO LLVM 11.0.1) v: 4.6 Mesa 20.3.3
direct render: Yes
Audio: Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Dell driver: snd_hda_intel v: kernel
bus ID: 09:00.1
Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: Dell driver: snd_hda_intel v: kernel
bus ID: 0b:00.3
Device-3: Sunplus Innovation FHD Capture type: USB driver: snd-usb-audio,uvcvideo bus ID: 4-3:2
Device-4: HTC (High Tech ) Vive type: USB driver: snd-usb-audio,uvcvideo bus ID: 1-4.1.5:10
Device-5: HTC (High Tech ) Vive type: USB driver: hid-generic,usbhid
Sound Server: ALSA v: k5.10.7-3-MANJARO
Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Dell driver: r8169 v: kernel port: 3000
bus ID: 01:00.0
IF: enp1s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter vendor: Dell driver: ath10k_pci v: kernel
port: 3000 bus ID: 05:00.0
IF: wlp5s0 state: down mac: <filter>
Device-3: Qualcomm Atheros type: USB driver: btusb bus ID: 1-12:6
Drives: Local Storage: total: 6.14 TiB used: 349.68 GiB (5.6%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS250G2X0C-00L350 size: 232.89 GiB temp: 55.9 C
ID-2: /dev/sda vendor: Toshiba model: DT01ACA100 size: 931.51 GiB
ID-3: /dev/sdb vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
ID-4: /dev/sdc type: USB vendor: Seagate model: BUP Portable size: 4.55 TiB
Partition: ID-1: / size: 117.62 GiB used: 17.76 GiB (15.1%) fs: ext4 dev: /dev/sdb2
ID-2: /boot/efi size: 299.4 MiB used: 312 KiB (0.1%) fs: vfat dev: /dev/sdb1
ID-3: /home size: 330.15 GiB used: 134.2 GiB (40.6%) fs: ext4 dev: /dev/sdb3
Swap: ID-1: swap-1 type: partition size: 9 GiB used: 0 KiB (0.0%) dev: /dev/sdb4
Sensors: System Temperatures: cpu: 46.1 C mobo: N/A gpu: amdgpu temp: 54.0 C
Fan Speeds (RPM): fan-1: 858 fan-2: 1194 gpu: amdgpu fan: 800
Info: Processes: 387 Uptime: 11m Memory: 15.59 GiB used: 3.1 GiB (19.9%) Init: systemd Compilers: gcc: 10.2.0
Packages: 1480 Shell: Bash v: 5.1.0 inxi: 3.2.02
I’m running KDE. I usually have Firefox, Kate, Dolphin, and Element running, and all of them were running during every crash. I also run fcitx as my IME.
I try to keep a close eye on temps and memory usage at all times. I know the GPU temps are fine (I run my own fan software, my PC turns into a jet engine when the temperature spikes, and the card also has hardware throttling). CPU temps should be fine (but again, I’m away from the PC when it dies), and RAM usage during at least the past 2 crashes should have been very low.
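Since the crashes happen while nobody is watching, one way to back up the "temps should be fine" assumption would be to log sensor readings to disk so the last values before a lockup survive the reboot. The following is only a rough sketch: the function name log_temps, the log path, and the 10-second interval are made up for illustration, and it assumes the sensors are exposed through the standard hwmon sysfs files (name and temp1_input, the latter in millidegrees Celsius).

```shell
#!/bin/sh
# Print one line per hwmon sensor: "<timestamp> <sensor name> <degrees C>".
# The first argument is the hwmon root; it defaults to the usual sysfs path.
# log_temps is a hypothetical helper name, not part of any package.
log_temps() {
    hwmon_root="${1:-/sys/class/hwmon}"
    now="$(date '+%Y-%m-%dT%H:%M:%S')"
    for dir in "$hwmon_root"/hwmon*; do
        # Skip entries that do not expose a first temperature input.
        [ -r "$dir/temp1_input" ] || continue
        name="$(cat "$dir/name" 2>/dev/null || echo unknown)"
        milli="$(cat "$dir/temp1_input")"
        # temp1_input is in millidegrees Celsius; integer division is enough here.
        echo "$now $name $((milli / 1000))"
    done
}

# Example use (left commented out because it loops forever):
# while true; do log_temps >> "$HOME/temps.log"; sync; sleep 10; done
```

The sync after each write is there so that the most recent readings are more likely to be on disk when the machine locks up. After the next crash, the tail of the log would show whether the CPU or GPU temperature was climbing beforehand.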
I’ve had a similar issue in the past: if my NVMe drive wasn’t loaded correctly (nvme_core.default_ps_max_latency_us=5500) and something tried to scan it (lspci), I’d get similar lockups. I found that incredibly tricky to diagnose.
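To rule out the old NVMe problem, it may help to confirm that the parameter is actually present on the running kernel's command line. This is a minimal sketch that reads /proc/cmdline; the function name nvme_latency_param is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Print the value of nvme_core.default_ps_max_latency_us found in a kernel
# command line file, or print nothing if the parameter is absent.
# The first argument is the file to read; it defaults to /proc/cmdline.
# nvme_latency_param is a hypothetical helper name.
nvme_latency_param() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 0
    # Put each parameter on its own line, then keep only the value we want.
    tr ' ' '\n' < "$cmdline_file" \
        | sed -n 's/^nvme_core\.default_ps_max_latency_us=//p'
}

# Example use:
# value="$(nvme_latency_param)"
# if [ -n "$value" ]; then
#     echo "parameter is set to $value"
# else
#     echo "parameter is not set on this boot"
# fi
```

If the function prints nothing, the parameter was not passed at boot (for example after a bootloader config was regenerated). When the nvme_core module is loaded, the value in effect should also be readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us.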