Hardware error coinciding with last update

Hello everyone!

Ever since the last update I have been getting crashes and seeing hardware errors in the log.
Granted, my machine is 7 years old and scheduled for replacement, but I still need it working for another 6 months. It has run perfectly for the last 16 months, so this is new.

The last crash happened at night while the computer was idle.
Here is the error:

22.01.21 11:53	kernel	smpboot: CPU0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (family: 0x6, model: 0x3c, stepping: 0x3)
22.01.21 11:53	kernel	mce: [Hardware Error]: Machine check events logged
22.01.21 11:53	kernel	mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: f200000000800400
22.01.21 11:53	kernel	mce: [Hardware Error]: TSC 0 
22.01.21 11:53	kernel	mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 0 microcode 28
22.01.21 11:53	kernel	Performance Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.
22.01.21 11:53	kernel	... version:                3
22.01.21 11:53	kernel	... bit width:              48
22.01.21 11:53	kernel	... generic registers:      4
22.01.21 11:53	kernel	... value mask:             0000ffffffffffff
22.01.21 11:53	kernel	... max period:             00007fffffffffff
22.01.21 11:53	kernel	... fixed-purpose events:   3
22.01.21 11:53	kernel	... event mask:             000000070000000f
22.01.21 11:53	kernel	rcu: Hierarchical SRCU implementation.
22.01.21 11:53	kernel	NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
22.01.21 11:53	kernel	smp: Bringing up secondary CPUs ...
22.01.21 11:53	kernel	x86: Booting SMP configuration:
22.01.21 11:53	kernel	.... node  #0, CPUs:      #1
22.01.21 11:53	kernel	mce: [Hardware Error]: Machine check events logged
22.01.21 11:53	kernel	mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: f200000000800400
22.01.21 11:53	kernel	mce: [Hardware Error]: TSC 0 
22.01.21 11:53	kernel	mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 2 microcode 28
22.01.21 11:53	kernel	 #2
22.01.21 11:53	kernel	mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: fe00000000800400
22.01.21 11:53	kernel	mce: [Hardware Error]: TSC 0 ADDR ffffffffc1c2525e MISC ffffffffc1c2525e 
22.01.21 11:53	kernel	mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 4 microcode 28
22.01.21 11:53	kernel	 #3
22.01.21 11:53	kernel	mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: fe00000000800400
22.01.21 11:53	kernel	mce: [Hardware Error]: TSC 0 ADDR ffffffff8ff948a1 MISC ffffffff8ff948a1 
22.01.21 11:53	kernel	mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 6 microcode 28
22.01.21 11:53	kernel	 #4
22.01.21 11:53	kernel	MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
22.01.21 11:53	kernel	 #5 #6 #7
22.01.21 11:53	kernel	smp: Brought up 1 node, 8 CPUs
22.01.21 11:53	kernel	smpboot: Max logical packages: 1
22.01.21 11:53	kernel	smpboot: Total of 8 processors activated (56019.80 BogoMIPS)
22.01.21 11:53	kernel	devtmpfs: initialized
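
For anyone trying to read the status words in the log above: they can be split into flag bits and error codes by hand. The following is only a minimal sketch, assuming the bit layout of the IA32_MCi_STATUS register as described in the machine-check chapter of the Intel SDM; the function name `decode_mci_status` is made up for illustration and is not part of any tool.

```python
# Sketch: decode the architectural flag bits of an IA32_MCi_STATUS value.
# Bit positions are an assumption taken from the Intel SDM machine-check chapter.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status word into flag names and error codes."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # The two status values that appear in the boot log above.
    for raw in ("f200000000800400", "fe00000000800400"):
        decoded = decode_mci_status(int(raw, 16))
        print(raw, decoded["flags"],
              hex(decoded["mca_error_code"]),
              hex(decoded["model_specific_code"]))
```

The decoded flags line up with the log: the `f2…` entries have neither ADDRV nor MISCV set and print no ADDR/MISC fields, while the `fe…` entries have both. All entries carry the MCA error code 0x400, which, if I read the SDM correctly, is the "internal timer error" code.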

I did a bit of digging: the errors first show up on Jan 20th, the day after my last update, which I ran around midnight the night before.

journalctl -p emerg    
-- Journal begins at Fri 2020-12-25 05:04:21 CET, ends at Fri 2021-01-22 13:01:49 CET. --
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff9992b212 MISC ffffffff9992b212 
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611146408 SOCKET 0 APIC 0 microcode 28
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: be00000000800400
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff9992b212 MISC ffffffff9992b212 
Jän 20 13:40:11 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611146408 SOCKET 0 APIC 6 microcode 28
-- Boot 5d00180b84de4a24a54fd1d1a038f2a8 --
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff90a957a0 MISC ffffffff90a957a0 
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611153763 SOCKET 0 APIC 0 microcode 28
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: be00000000800400
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff90a957a0 MISC ffffffff90a957a0 
Jän 20 15:42:47 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611153763 SOCKET 0 APIC 6 microcode 28
-- Boot dc7caea2c28946349ec4d476e0aa63c9 --
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: f200000000800400
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: TSC 0 
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 0 microcode 28
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: f200000000800400
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: TSC 0 
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 2 microcode 28
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: fe00000000800400
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffffc1c2525e MISC ffffffffc1c2525e 
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 4 microcode 28
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: fe00000000800400
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff8ff948a1 MISC ffffffff8ff948a1 
Jän 22 11:53:00 cheetah kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1611312778 SOCKET 0 APIC 6 microcode 28
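
The TIME field in each PROCESSOR line looks like a Unix timestamp, so it can be cross-checked against the journal's own timestamps. A small sketch, assuming the journal times are CET (UTC+1, as the journal header suggests); the helper name `mce_time_to_local` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine logs in CET, a fixed UTC+1 offset in January.
CET = timezone(timedelta(hours=1))


def mce_time_to_local(epoch: int) -> str:
    """Convert the TIME value from an mce log line to a CET wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=CET).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # TIME values copied from the three boots in the journal output above.
    for t in (1611146408, 1611153763, 1611312778):
        print(t, mce_time_to_local(t))
```

Each converted time falls a few seconds before the corresponding journal entry (13:40:11, 15:42:47 and 11:53:00), which would fit the errors being recorded around boot rather than at the moment of the overnight crash.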

Could anyone give me tips on how to tackle this issue?

Thank you,
Beer

I have no idea what is causing this, but one option is to boot the system from a live USB.

Then chroot into the installed system and, depending on which kernel you are running, switch to a different one, e.g. 5.4.

You could also try removing the intel-ucode package. Don’t forget to run mkinitcpio afterwards if you go that route.

Thank you for the tips.
I am not sure switching to 5.4 is a good option since I just got my hands on a new AMD graphics card, but I’ll try removing the intel-ucode package.

Thanks!