I'm experiencing occasional crashes and reboots that seem to be caused either by the CPU or by the GPU.
Sometimes the system reboots suddenly while gaming: the screen first turns completely green and then goes black as the system reboots. One of these reboots left the following error in journalctl:
```
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: TSC 0 ADDR 7fd79d72e6e6 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603628885 SOCKET 0 APIC 3 microcode 8701021
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 5: bea0000000000108
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: TSC 0 ADDR 7fd79d20fe90 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Okt 25 13:28:13 Desktop kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603628885 SOCKET 0 APIC 9 microcode 8701021
```
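For anyone trying to read the MCE lines: the value after `Bank 5:` (`bea0000000000108`) is that bank's MCA status register, and `IPID` identifies which hardware block raised the error. Below is a minimal sketch that decodes only the architectural flag bits and the IPID HWID/McaType fields. It assumes the bit layout used by the Linux kernel's `mce.h` and the AMD SMCA bank table in `mce/amd.c`, as I understand them; the helper names and the one-entry lookup table are my own and hypothetical, and vendor-specific bits (55 and below, apart from the error code) are not decoded.

```python
# Hypothetical helper to decode the MCE status/IPID values from the log above.
# Assumes the architectural MCi_STATUS bit layout (bits 57..63) and the AMD
# SMCA IPID layout (HWID in bits 43:32, McaType in bits 63:48).

STATUS_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Only the (HWID, McaType) pair needed for this log; not a complete table.
SMCA_BANK_TYPES = {
    (0xB0, 0x5): "EX (execution unit)",
}


def decode_status(status):
    """Return the set architectural flags and the low 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_BITS.items() if (status >> bit) & 1]
    return flags, status & 0xFFFF


def decode_ipid(ipid):
    """Split an SMCA IPID into HWID and McaType and look up the bank type."""
    hwid = (ipid >> 32) & 0xFFF
    mcatype = (ipid >> 48) & 0xFFFF
    return hwid, mcatype, SMCA_BANK_TYPES.get((hwid, mcatype), "unknown")


if __name__ == "__main__":
    flags, code = decode_status(0xBEA0000000000108)
    print("flags:", " ".join(flags))
    print(f"error code: {code:#06x}")
    hwid, mcatype, bank = decode_ipid(0x500B000000000)
    print(f"HWID {hwid:#x}, McaType {mcatype:#x} -> {bank}")
```

If the assumed layout is right, both CPU 7 and CPU 9 report a valid, uncorrected error with processor context corrupt, from the same bank type.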
I can't recall exactly what happened before this crash. Separately, the system sometimes wakes up from hibernation without any video output, and my display keeps switching between checking for input and power-saving mode. Possibly unrelated, I've saved the following log, which suggests a problem with the GPU:
```
Okt 09 18:46:22 Desktop kernel: amdgpu: [powerplay] failed send message: RunBtc (58) param: 0x00000000 response 0xffffffc2
Okt 09 18:46:22 Desktop kernel: amdgpu: [powerplay] RunBtc failed!
Okt 09 18:46:22 Desktop kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Okt 09 18:46:22 Desktop kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
Okt 09 18:46:22 Desktop kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Okt 09 18:46:22 Desktop kernel: PM: Device 0000:28:00.0 failed to resume async: error -62
Okt 09 18:46:22 Desktop kernel: Move buffer fallback to memcpy unavailable
Okt 09 18:46:22 Desktop kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Okt 09 18:46:23 Desktop kernel: Move buffer fallback to memcpy unavailable
Okt 09 18:46:23 Desktop kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Okt 09 18:46:23 Desktop kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Okt 09 18:46:23 Desktop kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Okt 09 18:46:23 Desktop kernel: #PF: supervisor read access in kernel mode
Okt 09 18:46:23 Desktop kernel: #PF: error_code(0x0000) - not-present page
Okt 09 18:46:23 Desktop kernel: Move buffer fallback to memcpy unavailable
Okt 09 18:46:23 Desktop kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Okt 09 18:46:23 Desktop kernel: Move buffer fallback to memcpy unavailable
Okt 09 18:46:23 Desktop kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Okt 09 18:46:23 Desktop kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Okt 09 18:46:23 Desktop kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Okt 09 18:46:23 Desktop kernel: #PF: supervisor read access in kernel mode
Okt 09 18:46:23 Desktop kernel: #PF: error_code(0x0000) - not-present page
Okt 09 18:46:25 Desktop kernel: amdgpu: [powerplay] Msg issuing pre-check failed and SMU may be not in the right state!
Okt 09 18:46:27 Desktop kernel: amdgpu: [powerplay] Msg issuing pre-check failed and SMU may be not in the right state!
Okt 09 18:46:28 Desktop kernel: ata14: softreset failed (device not ready)
Okt 09 18:46:38 Desktop kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
Okt 09 18:46:38 Desktop kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
Okt 09 18:46:38 Desktop kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Okt 09 18:46:38 Desktop kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Okt 09 18:46:38 Desktop kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Okt 09 18:46:38 Desktop kernel: #PF: supervisor read access in kernel mode
Okt 09 18:46:38 Desktop kernel: #PF: error_code(0x0000) - not-present page
Okt 09 18:46:38 Desktop kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Okt 09 18:46:38 Desktop kernel: #PF: supervisor read access in kernel mode
Okt 09 18:46:38 Desktop kernel: #PF: error_code(0x0000) - not-present page
```
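The negative numbers in the amdgpu and PM lines (-62, -19, -2) look like ordinary kernel errno return codes. Here is a small sketch that pulls them out of a log line and names them. It assumes Linux errno numbering: Python's `errno.errorcode` table is platform-specific, so it should be run on the affected Linux machine. The function name `errno_names` and the regex are my own, hypothetical choices.

```python
import errno
import os
import re

# Hypothetical pattern for negative return codes such as "failed -62",
# "(-62)", "error -62", "list -19!" or "BO_VA (-2)". The lookbehind skips
# hyphens inside identifiers like "crtc-0".
CODE_RE = re.compile(r"(?<![0-9A-Za-z])-(\d+)\b")


def errno_names(line):
    """Return (code, symbolic name, message) for each negative code in line."""
    results = []
    for match in CODE_RE.finditer(line):
        number = int(match.group(1))
        name = errno.errorcode.get(number, "?")
        results.append((-number, name, os.strerror(number)))
    return results


if __name__ == "__main__":
    samples = [
        "PM: Device 0000:28:00.0 failed to resume async: error -62",
        "*ERROR* Failed to process the buffer list -19!",
        "*ERROR* Couldn't update BO_VA (-2)",
    ]
    for sample in samples:
        for code, name, message in errno_names(sample):
            print(code, name, message)
```

On Linux, -62 is ETIME ("Timer expired"), -19 is ENODEV and -2 is ENOENT. That would be consistent with the SMU resume timing out first and the later ioctls failing because the device never came back.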
I know these may well be separate issues, but I'm not sure what is causing which crash or reboot, and I have indicators pointing at both the CPU and the GPU. So forgive me for putting everything in one thread. I should also mention that this GPU is affected by the notorious reset bug, which might be why the system doesn't wake up correctly from hibernation.
And finally, here's my `inxi -Fazy` output:
```
System:
Kernel: 5.6.14-arch1-1-fsync x86_64 bits: 64 compiler: gcc v: 10.2.0
parameters: BOOT_IMAGE=/boot/vmlinuz-linux-fsync
root=UUID=e5ae83f7-b83c-4ba1-9165-c526522f90bc rw quiet
cryptdevice=UUID=14ab5635-177f-4285-a1bc-67a452dca803:luks-14ab5635-177f-4285-a1bc-67a452dca803
root=/dev/mapper/luks-14ab5635-177f-4285-a1bc-67a452dca803 apparmor=1
security=apparmor
resume=/dev/mapper/luks-17da867d-acdb-4759-9a6b-6435d17897a8
udev.log_priority=3
Desktop: KDE Plasma 5.20.1 tk: Qt 5.15.1 wm: kwin_x11 dm: SDDM
Distro: Manjaro Linux
Machine:
Type: Desktop Mobo: Micro-Star model: B450 TOMAHAWK MAX (MS-7C02) v: 1.0
serial: <filter> UEFI: American Megatrends v: 3.70 date: 06/09/2020
CPU:
Info: 6-Core model: AMD Ryzen 5 3600 bits: 64 type: MT MCP arch: Zen 2
family: 17 (23) model-id: 71 (113) stepping: N/A microcode: 8701021
L2 cache: 3072 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
bogomips: 86424
Speed: 3597 MHz min/max: 2200/3600 MHz boost: enabled Core speeds (MHz):
1: 3597 2: 2055 3: 2200 4: 2200 5: 2229 6: 2200 7: 2200 8: 2200 9: 3515
10: 2057 11: 2397 12: 2200
Vulnerabilities: Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1
mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP:
conditional, RSB filling
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 28:00.0
chip ID: 1002:731f
Display: x11 server: X.Org 1.20.9 compositor: kwin_x11 driver: amdgpu
display ID: :0 screens: 1
Screen-1: 0 s-res: 2560x1440 s-dpi: 96 s-size: 677x381mm (26.7x15.0")
s-diag: 777mm (30.6")
Monitor-1: HDMI-A-0 res: 2560x1440 hz: 60 dpi: 118
size: 553x311mm (21.8x12.2") diag: 634mm (25")
OpenGL: renderer: AMD Radeon RX 5700 (NAVI10 DRM 3.36.0 5.6.14-arch1-1-fsync
LLVM 10.0.1)
v: 4.6 Mesa 20.2.1 direct render: Yes
Audio:
Device-1: AMD Navi 10 HDMI Audio driver: snd_hda_intel v: kernel
bus ID: 28:00.1 chip ID: 1002:ab38
Device-2: AMD Starship/Matisse HD Audio vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel bus ID: 2a:00.4 chip ID: 1022:1487
Device-3: Yamaha type: USB driver: snd-usb-audio bus ID: 3-2.1:3
chip ID: 0499:170f
Sound Server: ALSA v: k5.6.14-arch1-1-fsync
Network:
Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
vendor: Micro-Star MSI driver: r8169 v: kernel port: f000 bus ID: 22:00.0
chip ID: 10ec:8168
IF: enp34s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Device-2: Microsoft Xbox 360 Wireless Adapter type: USB driver: usbfs
bus ID: 1-8:4 chip ID: 045e:0719 serial: <filter>
Drives:
Local Storage: total: 9.46 TiB used: 6.22 TiB (65.7%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB
block size: physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter>
rev: 2B6Q scheme: GPT
ID-2: /dev/sdb vendor: Samsung model: SSD 840 EVO 120GB size: 111.79 GiB
block size: physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter>
rev: DB6Q scheme: GPT
ID-3: /dev/sdc vendor: Seagate model: ST4000DM004-2CV104 size: 3.64 TiB
block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
rotation: 5425 rpm serial: <filter> rev: 0001 scheme: GPT
ID-4: /dev/sdd vendor: Western Digital model: WD30EZRX-00D8PB0
size: 2.73 TiB block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
rotation: 5400 rpm serial: <filter> rev: 0A80 scheme: GPT
ID-5: /dev/sde vendor: Western Digital model: WD10EADS-00M2B0
size: 931.51 GiB block size: physical: 512 B logical: 512 B speed: 3.0 Gb/s
serial: <filter> rev: 0A01 scheme: GPT
ID-6: /dev/sdf vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
rotation: 7200 rpm serial: <filter> rev: 0001 scheme: GPT
ID-7: /dev/sdg type: USB vendor: SanDisk model: Cruzer Slice size: 29.82 GiB
block size: physical: 512 B logical: 512 B serial: <filter> rev: 1.20
scheme: MBR
SMART Message: Unknown USB bridge. Flash drive/Unsupported enclosure?
Partition:
ID-1: / raw size: 215.37 GiB size: 210.99 GiB (97.97%)
used: 65.08 GiB (30.8%) fs: ext4 dev: /dev/dm-0
Swap:
Kernel: swappiness: 60 (default) cache pressure: 100 (default)
ID-1: swap-1 type: partition size: 17.21 GiB used: 0 KiB (0.0%) priority: -2
dev: /dev/dm-1
Sensors:
System Temperatures: cpu: 42.8 C mobo: N/A gpu: amdgpu temp: 50.0 C
mem: 50.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
Processes: 334 Uptime: 16m Memory: 15.65 GiB used: 3.32 GiB (21.2%)
Init: systemd v: 246 Compilers: gcc: 10.2.0 Packages: pacman: 1392 lib: 381
flatpak: 0 Shell: Bash v: 5.0.18 running in: konsole inxi: 3.1.08
```