My system rebooted while I was mousing around in Firefox (reading a forum thread), and after the reboot I found these errors, which I am hoping someone can help me understand and/or dig deeper into…
$ sudo dmesg | grep Error
[ 0.342642] mce: [Hardware Error]: Machine check events logged
[ 0.342643] mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000001000108
[ 0.342650] mce: [Hardware Error]: TSC 0 ADDR ffffffc0ecab9c MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
[ 0.342654] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1636419924 SOCKET 0 APIC 5 microcode a201009
[ 0.346709] RAS: Correctable Errors collector initialized.
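In case it helps anyone reading the status word on the "Bank 5" line, here is a rough sketch of how I tried to unpack it. It only decodes the top flag bits (63 down to 57) that, as far as I understand, are laid out the same way on x86 CPUs generally, plus the low 16-bit error code. It does not attempt any AMD-specific fields, and `decode_status` is just a name I made up for illustration.

```python
# Rough sketch: decode the generic flag bits of a machine-check status word.
# Assumption: bits 63..57 carry the vendor-neutral flags listed below.
# AMD-specific fields are deliberately left undecoded.
STATUS = 0xBEA0000001000108  # value from the "Bank 5" dmesg line

FLAGS = {
    63: ("VAL", "status register contents are valid"),
    62: ("OVER", "an earlier error was overwritten"),
    61: ("UC", "error was not corrected by hardware"),
    60: ("EN", "error reporting was enabled"),
    59: ("MISCV", "MISC register holds valid data"),
    58: ("ADDRV", "ADDR register holds valid data"),
    57: ("PCC", "processor context may be corrupt"),
}


def decode_status(status):
    """Return (names of the set flag bits, low 16-bit error code)."""
    set_flags = [name for bit, (name, _) in FLAGS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF
    return set_flags, error_code


flags, code = decode_status(STATUS)
for bit, (name, meaning) in FLAGS.items():
    state = "set" if name in flags else "clear"
    print(f"bit {bit} {name:<5} {state:<5} {meaning}")
print(f"error code: {code:#06x}")
```

If I have the bit layout right, UC and PCC are both set here, which would fit an uncorrected error followed by a reset rather than something the hardware fixed up quietly.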
This is the first time I’ve experienced a reboot while using my system. I migrated to Manjaro KDE Plasma back in July and use the stable branch. Actually, “first time” isn’t 100% true… I triggered a reboot once when I was building/testing my first conky script, and learned I wasn’t alone in having that happen while working through the conky learning curve.
I sat on 5.13.x kernels ever since they were available, and just transitioned to 5.14.10 with the last Stable branch update (2021-10-16). I thought I’d mention this since swapping kernel branches is a “recent” change… but I’m not sure if that’s significant, since the 3-ish weeks since then have gone smoothly. Then again, I’m not sure how a kernel issue manifests once triggered, so I can’t rule it out.
The only other “recent change” I can think of in the last week or so is that I have been letting a Steam “Idle game” run minimized in the background. Conky typically lists it as a process using ~7% CPU… but that load is split across the cores, as no thread was pinned at/near 100%. For comparison, with SMT on a 6 core CPU, one thread (of 12) pinned at 100% would show up as about 8.3% CPU utilization. So it’s not like I was running with one thread pinned (or very nearly pinned) 24/7 for about a week or more.
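The arithmetic behind that comparison, as a quick sketch. The 12 threads and the ~7% reading come from my system; the variable names are just for illustration.

```python
# Share of total CPU that one fully pinned thread represents
# on a 6-core CPU with SMT (12 hardware threads).
CORES = 6
THREADS_PER_CORE = 2
threads = CORES * THREADS_PER_CORE

one_thread_share = 100 / threads  # percent of total CPU
observed_share = 7.0              # rough figure conky reports for the idle game

print(f"one pinned thread of {threads}: {one_thread_share:.1f}%")
print(f"observed: {observed_share:.1f}%")
print("below one pinned thread:", observed_share < one_thread_share)
```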
I learned a bit @ What are Machine Check Exceptions (or MCE)? - Advanced Clustering Technologies, and thought I could learn more from looking at the mcelog file it mentioned… but then learned at Machine-check exception - ArchWiki that the feature has been deprecated… so I’m not sure where to go from here.
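Without an mcelog file to look at, the best I have come up with so far is pulling the fields out of the dmesg lines themselves. Below is a rough sketch that does this for the three record lines quoted above. `parse_mce` is a name I made up, and I am assuming the TIME field is a Unix timestamp in seconds.

```python
import re
from datetime import datetime, timezone

# The three MCE record lines copied from my dmesg output.
LOG = """\
[ 0.342643] mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000001000108
[ 0.342650] mce: [Hardware Error]: TSC 0 ADDR ffffffc0ecab9c MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
[ 0.342654] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1636419924 SOCKET 0 APIC 5 microcode a201009
"""

MARKER = "[Hardware Error]:"
HEADER = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")


def parse_mce(text):
    """Collect the fields of one MCE record into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if MARKER not in line:
            continue
        body = line.split(MARKER, 1)[1].strip()
        header = HEADER.match(body)
        if header:
            fields["CPU"], fields["BANK"], fields["STATUS"] = header.groups()
            continue
        # The other lines are plain "NAME value" pairs separated by spaces.
        tokens = body.split()
        for name, value in zip(tokens[::2], tokens[1::2]):
            fields[name.upper()] = value
    return fields


fields = parse_mce(LOG)
# Assumption: TIME is seconds since the Unix epoch.
when = datetime.fromtimestamp(int(fields["TIME"]), tz=timezone.utc)

for name, value in fields.items():
    print(f"{name:<10} {value}")
print("logged at (UTC):", when.isoformat())
```

If the timestamp assumption holds, TIME works out to 2021-11-09 01:05:24 UTC, which I can compare against when the reboot actually happened.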
inxi -F details
$ inxi -Fx
System: Host: AM4-5600X-Linux Kernel: 5.14.10-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: KDE Plasma 5.22.5
Distro: Manjaro Linux base: Arch Linux
Machine: Type: Desktop System: Micro-Star product: MS-7C35 v: 2.0 serial: <superuser required>
Mobo: Micro-Star model: MEG X570 UNIFY (MS-7C35) v: 2.0 serial: <superuser required> UEFI: American Megatrends LLC.
v: A.80 date: 01/22/2021
CPU: Info: 6-Core model: AMD Ryzen 5 5600X bits: 64 type: MT MCP arch: Zen 3 rev: 0 cache: L2: 3 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 88825
Speed: 3452 MHz min/max: 2200/3700 MHz boost: enabled Core speeds (MHz): 1: 3452 2: 2729 3: 2815 4: 3030 5: 4246
6: 4582 7: 3717 8: 3719 9: 3636 10: 2895 11: 4291 12: 4198
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT]
vendor: XFX Limited XFX Speedster MERC 319 driver: amdgpu v: kernel bus-ID: 2f:00.0
Display: x11 server: X.Org 1.20.13 driver: loaded: amdgpu,ati unloaded: modesetting,radeon resolution:
1: 2560x1440~144Hz 2: 2560x1440~144Hz
OpenGL: renderer: AMD Radeon RX 6800 XT (SIENNA_CICHLID DRM 3.42.0 5.14.10-1-MANJARO LLVM 12.0.1)
v: 4.6 Mesa 21.2.3 direct render: Yes
Audio: Device-1: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT] driver: snd_hda_intel v: kernel bus-ID: 2f:00.1
Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
v: kernel bus-ID: 31:00.4
Device-3: Corsair CORSAIR VIRTUOSO SE USB Gaming Headset type: USB driver: hid-generic,snd-usb-audio,usbhid
bus-ID: 3-4:3
Sound Server-1: ALSA v: k5.14.10-1-MANJARO running: yes
Sound Server-2: sndio v: N/A running: no
Sound Server-3: JACK v: 1.9.19 running: no
Sound Server-4: PulseAudio v: 15.0 running: yes
Sound Server-5: PipeWire v: 0.3.38 running: yes
Network: Device-1: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169 v: kernel port: f000 bus-ID: 27:00.0
IF: enp39s0 state: up speed: 1000 Mbps duplex: full mac: 2c:f0:5d:ae:5e:89
Bluetooth: Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-4:2
Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
RAID: Device-1: md127 type: mdraid level: mirror status: active size: 7.28 TiB
Info: report: 2/2 UU blocks: 7813893120 chunk-size: N/A
Components: Online: 0: sdb1 1: sdc1
Drives: Local Storage: total: 19.33 TiB used: 8.15 TiB (42.1%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS100T1X0E-00AFY0 size: 931.51 GiB temp: 42.9 C
ID-2: /dev/nvme1n1 vendor: Western Digital model: WDS100T3X0C-00SJG0 size: 931.51 GiB temp: 38.9 C
ID-3: /dev/nvme2n1 vendor: Western Digital model: WDS100T1X0E-00AFY0 size: 931.51 GiB temp: 43.9 C
ID-4: /dev/nvme3n1 vendor: Western Digital model: WDS200T2B0C-00PXH0 size: 1.82 TiB temp: 32.9 C
ID-5: /dev/sda vendor: Samsung model: SSD 840 EVO 250GB size: 232.89 GiB
ID-6: /dev/sdb vendor: Western Digital model: WD80EFAX-68KNBN0 size: 7.28 TiB
ID-7: /dev/sdc vendor: Western Digital model: WD80EFAX-68KNBN0 size: 7.28 TiB
Partition: ID-1: / size: 915.53 GiB used: 459.93 GiB (50.2%) fs: ext4 dev: /dev/nvme2n1p2
ID-2: /boot/efi size: 299.4 MiB used: 288 KiB (0.1%) fs: vfat dev: /dev/nvme2n1p1
Swap: ID-1: swap-1 type: file size: 38 GiB used: 0 KiB (0.0%) file: /swapfile
Sensors: System Temperatures: cpu: 42.0 C mobo: N/A gpu: amdgpu temp: 54.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info: Processes: 332 Uptime: 19m Memory: 31.27 GiB used: 4.79 GiB (15.3%) Init: systemd Compilers: gcc: 11.1.0
Packages: 1413 Shell: Bash v: 5.1.8 inxi: 3.3.08