There is another scenario where CPU binning and AGESA create a lottery: idle power handling drops the voltage/clock lower than your particular silicon can sustain, and the CPU throws an MCE… i.e. it under-clocks/under-volts too far for your bin. I originally learned about this from the Ryzen - ArchWiki page, under “Random Reboots” in the “Troubleshooting” section.
You can read about the fun I’ve had @ System auto-rebooted... mce: [Hardware Error] in dmesg related to CPU … Here are the Coles Notes on the BIOS options I’ve played with (after updating to AGESA 1.2.0.2):
- Power Idle Control = Typical current idle
- Curve Optimizer = +4 on all cores (if +4 doesn’t yield good results, step up incrementally to +8 as required)
- Global C-State Control = “AUTO” => “Disabled”
- IOMMU = “AUTO” => “Enabled”
Since applying the +4 CO on all cores, I have had one MCE, and it was reported as related to ECC memory (not sure why, since my RAM is not ECC). So I’m thinking I might undo that and try AGESA 1.2.0.3b… but I’m leery of beta AGESA versions (letter suffix after the version number). The odd thing is that I’ve only had a total of 3 MCEs in 7-8 months (including the ECC one, which did not reboot my system)… so troubleshooting is a bit of a needle in a haystack, made more fun by the forever-evolving kernels and AMD microcode.
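Since these errors are so rare, it can help to have something that pulls the MCE lines out of the kernel log rather than scrolling through dmesg by hand. Below is a minimal sketch, assuming your distro uses systemd so that `journalctl -k -b -1` returns the previous boot's kernel messages; the function name `find_mce_lines` is my own invention, and the match string is simply the `mce: [Hardware Error]` marker from the thread title above.

```python
import re
import subprocess

# Marker text as it appears in the kernel log, e.g. "mce: [Hardware Error]: ..."
MCE_PATTERN = re.compile(r"mce: \[Hardware Error\]")


def find_mce_lines(log_text):
    """Return every line of log_text that contains the MCE marker."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


if __name__ == "__main__":
    # Assumption: systemd journal with persistent storage, so the previous
    # boot (-b -1) is still available after an MCE-triggered reboot.
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
        log_text = result.stdout
    except FileNotFoundError:
        # journalctl is not installed; nothing to scan.
        log_text = ""

    for line in find_mce_lines(log_text):
        print(line)
```

If the journal is not persistent on your system, the previous boot will not be there; in that case you could feed the function the text of whatever log file you do keep.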
A recent MSI thread I read that I think supports the under-clocking/volting idea is @ https://forum-en.msi.com/index.php?threads/am4-troubleshooting-faq-thread.288349/
Problem : When setting the core voltage or overclocking the CPU, once in Windows/Linux the CPU appears to get stuck in a lower power state?
Solution : There really isn’t one at this time. The best we can suggest doing is leaving the VCore set to AUTO, and setting your multiplier to 37 or 38 to overclock for now.
The general consensus is that the issue is a problem with the AGESA code and some lots of CPU’s and that’s what causes the problem. Why do I think this? Because users have switched boards/brands, kept the same CPU, and the problem followed the CPU to the new board. That tells me it’s not just an “MSI” problem, and that it’s also related to the CPU as well since those same boards with a different CPU also will run at the rated and expected speeds.
EDIT: It appears that the most recent BIOS’s for most boards should have a fix in them to fix the CPU underclocking issue right now…Just as an FYI.
I think the EDIT in the quote may be referring to the new AGESA 1.2.0.5. However, I’ve also noticed MSI moderators advising people to avoid AGESA 1.2.0.5 BIOSes until AMD fixes its issues; one MSI moderator commented that some users are having problems with AGESA 1.2.0.5 and recommended flashing back to an older BIOS @ problem with MSI MEG X570 Unify after BIOS update | MSI Global English Forum
There seems to be a Problem with 1.2.0.5 for some users can you please roll back to BIOS 7C35vAA AGESA 1.2.0.3b
One user felt they had a great experience with the “AB3” beta BIOS, which also had AGESA 1.2.0.3b:
I can have a try, but I’m really sure, that the BIOS version 7C35vAA will work like the BIOS version 7C35vAB3 too, because the 7C35vAB3 use ComboAM4PIV2 1.2.0.3c. And with the BIOS version 7C35vAB3 I do not have any issues.
But you can be right, because BIOS versions 7C35vAB and 7C35vAC1 switched to ComboAm4v2PI 1.2.0.5, like you can see in the attachments with the official change logs from MSI.
So honestly, I think the fix comes at some point after AMD delivers whatever fixes AGESA 1.2.0.5.