Manjaro/Arch crashing on a specific motherboard

Hi all, I’ve been working through an issue for months now and I’m not sure where to go beyond what I’ve already done, so I thought I’d reach out here for further info and guidance.

So this all started quite a while back, probably around February or March of this year, when I was still on vanilla Arch. After an update at some point (I’m unsure exactly when), my system started crashing at random, seemingly without any trace of what was causing it. I would have rolled back whichever kernel or package was responsible, but I hadn’t used the system in a long, long time, so I couldn’t tell which update had introduced the problem… I did what seemed like the best course of action and started looking through the systemd and kernel logs to see if I could pinpoint anything that would help make sense of things, but the logs kept pointing to very different leads, all of which went nowhere. So I was like, “alright, maybe it’s some weird hardware issue,” and I tried to isolate each piece of hardware to find the culprit. I swapped a different CPU into my motherboard, I tried a different GPU (since I suspected Nvidia at some point), I tried spare RAM, I tried different storage drives, and nothing. Maybe the issue was my Arch install? So I moved over to a clean install of Manjaro and had exactly the same problems as on Arch…

Eventually I kind of just gave up and resigned myself to a setup with an older motherboard and a slower CPU until I could dig into the issue further. I sent all my parts in under warranty to be looked at, and after testing by a few more techs, they couldn’t replicate the issue in Windows…

The system didn’t crash in Windows. OK, good lead… That seems to suggest it’s a Linux issue. I’ve since put the motherboard and CPU back in the system and I continue to have the same crashes in Manjaro. As a test I installed Ubuntu and have been using it for a few days now without any crashes, so I’m really starting to suspect it may be something like a kernel patch that Arch or Manjaro uses that interferes with… something on my motherboard. As for how I could test that, I don’t even know where to start. And maybe I’m barking up the wrong tree here, I dunno.

I’m happy to provide any logs and info needed to help work out the issue; just let me know what you need and I’ll get it to you.

System Specs:
CPU: Ryzen 5 3600X
MOBO: Gigabyte B450 Aorus M
RAM: Corsair Vengeance something @ 3200 MHz
GPU: Nvidia GTX 1070

Are you overclocking?

hastebin links: inxi -Yaml output; journalctl -b 1 (to pull up details of the last boot, when the system crashed)

No, not at all.

No data is shown at the hastebin link for ‘journalctl’.

Your motherboard firmware, as shown in the inxi output, is version F50, dated 11/27/2019.
I suggest you update the firmware to the latest version, F60c:
B450 AORUS M (rev. 1.0) Support | GIGABYTE Global

You may be right that it is an issue with Manjaro, caused by a patch applied here but not on other OSes.

But my crystal ball tells me that if you load your RAM’s XMP profile in the BIOS and then change tRC to 72 (instead of the default 54, right?), you won’t crash anymore. Leave all other values at their XMP defaults.

I had several computers with a similar CPU/motherboard/RAM kit on hand (mine and friends’). After testing everything I could, reinstalling and all, I found that the RAM was the issue (of course it was the last thing I thoroughly tested, like an idiot, when it was obviously one of the first simple things to test). I had the RAM sent back and got the same issue with the new kit, so I eventually bought a Crucial kit, and everything was fine. I compared the XMP profiles of both kits: the Corsair kit, despite having proper values, doesn’t work properly with this setup, and the difference was the tRC. The Crucial kit’s tRC is higher than the ‘normal’ value, and that is what makes the difference. I changed this value on my friends’ computers and they haven’t had a single issue since, for a year now.
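To put the two tRC values in perspective: timings are counted in memory-clock cycles, and DDR4 does two transfers per clock, so a 3200 MT/s kit runs its memory clock at 1600 MHz (0.625 ns per cycle). The hypothetical helper `cycles_to_ns` below is just a sketch that assumes the kit actually runs at its rated 3200 speed; it converts a timing in cycles to nanoseconds:

```python
def cycles_to_ns(cycles: int, data_rate_mts: int) -> float:
    """Convert a DRAM timing from clock cycles to nanoseconds.

    DDR transfers data twice per clock, so the memory clock in MHz is half
    the data rate in MT/s, and one clock period is 1000 / clock_mhz ns.
    """
    clock_mhz = data_rate_mts / 2
    return cycles * 1000 / clock_mhz


# Compare the assumed default tRC (54) with the suggested value (72)
# for a kit running at 3200 MT/s.
for trc in (54, 72):
    print(f"tRC {trc} cycles @ 3200 MT/s = {cycles_to_ns(trc, 3200):.2f} ns")
```

Under that assumption, raising tRC from 54 to 72 cycles lengthens the minimum row cycle time from 33.75 ns to 45.00 ns, which is a noticeably looser (safer) timing.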

You can check whether you have a RAM issue with mprime: install it, run /usr/bin/mprime in a terminal (do not join GIMPS, just choose stress testing and use all cores), pick the Large FFTs test, and see whether it reports errors after a few loops (let it run for at least 10 minutes). If you see an error, just stop; you should never get any errors.

Worth a try :wink:

//EDIT: I don’t think updating your BIOS to the recently released version would fix anything. I would recommend trying just my simple setting first to see if it fixes it for you. Then you could consider updating the BIOS, since it is always good to be up to date anyway, but do one step at a time so you can be sure what fixed it when you eventually get there.

//EDIT2: Be sure to have a live USB on hand in case updating the BIOS breaks your GRUB. I had already reinstalled GRUB from a normal system boot to fix the BootHole exploit (fixed in BIOS F51); after I then updated the BIOS with that new GRUB version installed, I had to chroot from a live USB and reinstall GRUB a second time, following the wiki. So beware of that, and also note that updating the BIOS resets every setting, such as Secure Boot, CSM and all… including the RAM settings.

//EDIT3: I updated my B450M-DS3H motherboard BIOS from F51 to F60c and yes, I had to chroot and reinstall GRUB again… The Manjaro entry wasn’t listed anymore; I only had “UEFI OS” and my drive name, and neither would boot.
Also, my new BIOS currently has issues, the same ones discussed here: Error message during boot: do_IRQ: 1.55 No irq handler for vector / Newbie Corner / Arch Linux Forums. So be careful: I’m fairly sure the latest BIOS for your motherboard may introduce this bug too, since we have similar boards. I could avoid the emergency message by disabling IOMMU in the BIOS, though only as a temporary workaround, and I’ve opened a ticket with Gigabyte.

Any news?

Ah, sorry for not coming back to reply earlier. Yes, that seems to have done it! I have decided against updating the BIOS for now but my system’s been stable for the entire 11 days since I read your post. Thanks so much for the recommendation!
