Computer bricked after adding GRUB_BADRAM and running update-grub

I ran memtest86 and found that I had some faulty RAM. I had already started suspecting it because of frequent, very random Chrome crashes and occasional crashes in other applications, all of which I had always assumed were software bugs (because what are the chances of actual hardware failure, right?).

So I edited /etc/idontremember/grub and added the GRUB_BADRAM directive, which I copied and pasted from the memtest86 report as its instructions said.
Then I ran “sudo update-grub”. It printed a few impossible-to-understand warnings, but they didn’t seem serious. Unfortunately I didn’t save them, and even if I had, I couldn’t access them now that my computer is a brick.

After that, I rebooted, and now all I get at startup is a black screen: no splash screen, no error message, absolutely nothing. I can tell that the screen turns on, but it’s all black.

Is there a way to get my system to boot again?

Chroot and revert whatever you did.

Hi @php4fan,

You’ll probably find this useful:

And/or this:

And be careful and pay attention to what you do, so that you don’t end up in /etc/idontremember/grub doing youdontknowwhat.

undo - whatever you did …

Replace the faulty RAM, then boot from an ISO stick, enter the installed system with manjaro-chroot, and remove the ‘GRUB_BADRAM’ setting from /etc/default/grub. Afterwards, run this within the chroot environment:

# update-grub

and exit. Then reboot normally.
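The edit-and-regenerate step above can be sketched as a small shell function, meant to be run as root inside the chroot (for example after `manjaro-chroot -a` on the live ISO). The name `drop_badram` and the `.bak` backup are illustrative choices, not part of any Manjaro tool, and GNU sed is assumed; the file path is a parameter so it can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Manjaro tool): remove the GRUB_BADRAM
# assignment from the GRUB defaults file, keeping a .bak copy.
# Assumes GNU sed (for -i with an attached backup suffix).

drop_badram() {   # usage: drop_badram [FILE]
  local file=${1:-/etc/default/grub}
  # Delete active GRUB_BADRAM= lines only; commented-out lines stay.
  sed -i.bak '/^[[:space:]]*GRUB_BADRAM=/d' "$file"
}

# Typical use inside the chroot (as root):
#   drop_badram && update-grub
#   exit
```

If something goes wrong, the previous version is still in /etc/default/grub.bak.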


While you are still using the faulty RAM, do not create a backup in the chroot, and do not overwrite a good backup on another drive; otherwise the backup may be corrupted by the bad RAM!

Remove the faulty RAM immediately!

It would have been advantageous to use btrfs, because its checksums quickly reveal when data changes in RAM before or after it has been written to disk.

The faulty RAM would have been noticed quickly.

:footprints:

Can I chroot from a Linux distro other than Manjaro? I have an old USB stick with a live openSUSE Tumbleweed.

The faulty RAM would have been noticed quickly.

This deviates a bit from the original question, but I’m curious: say I had used btrfs with the faulty RAM. What exactly would I have noticed?

Just to clarify, I have been using my computer with faulty RAM (without knowing it, obviously) for months if not years. I’ve been working, installing updates, making backups, etc.

If reverting the changes (without replacing the RAM) actually gets the machine to boot again (which I seriously hope it will), that will mean the BADRAM directive is what is currently preventing the system from booting, which would be quite ironic.

You can chroot from any distro you want; just search for how.
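From a non-Manjaro live system such as the openSUSE stick, the usual manual approach is to mount the installed root partition, bind-mount the kernel's pseudo-filesystems into it, and then call `chroot`. A sketch, assuming the root filesystem is a single plain partition that also contains /boot (Btrfs subvolumes or a separate /boot need extra mounts); `/dev/sdX2` is a placeholder, so check `lsblk -f` first. The helper names are hypothetical, and `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: chroot into an installed system from any live distro (run as root).
# Assumptions: plain root partition that also holds /boot; /dev/sdX2 is a
# placeholder. For Btrfs with an @ subvolume, mount with -o subvol=@ instead.

run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"   # DRY_RUN=1: print only, execute nothing
}

chroot_into() {   # usage: chroot_into ROOT_DEVICE [MOUNTPOINT]
  local dev=$1 mnt=${2:-/mnt} d
  run mount "$dev" "$mnt"
  # update-grub needs /dev, /proc, /sys and /run inside the chroot.
  for d in dev proc sys run; do
    run mount --rbind "/$d" "$mnt/$d"
  done
  run chroot "$mnt" /bin/bash
}

# Example: chroot_into /dev/sdX2
# then edit /etc/default/grub, run update-grub, exit,
# and finally: umount -R /mnt
```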

You will need to replace that faulty RAM; your system will never work reliably otherwise.

Since you call the device a ‘computer’, I’m assuming you have access to the RAM. How much RAM do you have, and in how many modules? If it’s 2x8GB or more, I’d start here:
Warning: always disconnect the computer from mains power and wait 1 minute before removing or adding modules.

  • re-seat the modules and carefully clean the module contacts
  • power up and re-run memtest; if the errors are gone:
  • chroot and reinstall grub

If error persists:

  • identify the faulty module (run memtest with only one module installed)
  • remove the faulty module
  • boot the live USB with a single good module installed
  • chroot and reinstall grub
  • add a replacement module later

You don’t really know this yet: maybe the connections corroded, maybe someone bumped into the computer, maybe your power supply is degrading, or a combination thereof.

Please have a look at:

Please also have a look at:
https://btrfs.readthedocs.io/en/latest/Hardware.html

Btrfs collects data in RAM and then writes it to disk in batches. Checksums are generated in RAM and written to disk along with the data.

When the data is then read from disk again, the checksums are checked. If they don’t match, the file system immediately becomes READONLY (you’ll notice that quickly :wink: )

So if only 1 bit was changed → READONLY

:footprints:
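The single-bit claim can be illustrated with any CRC. This sketch uses the POSIX `cksum` utility (a CRC-32) as a stand-in for Btrfs's default crc32c, and the helper name is hypothetical. The two inputs differ in exactly one bit ('o' is 0x6F, 'm' is 0x6D), and a CRC detects every single-bit error, so the stored checksum no longer matches.

```shell
#!/usr/bin/env bash
# Illustration only: cksum (POSIX CRC-32) stands in for btrfs's crc32c.
# A checksum is stored at "write" time; at "read" time a fresh checksum
# over the data that came back is compared with the stored one.

checksum_ok() {   # usage: checksum_ok DATA STORED_CHECKSUM
  [ "$(printf '%s' "$1" | cksum)" = "$2" ]
}

stored=$(printf '%s' 'hello' | cksum)   # computed when the data is written
checksum_ok 'hello' "$stored" && echo "intact: checksum matches"
checksum_ok 'hellm' "$stored" || echo "one bit flipped: checksum mismatch"
```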

btrfs-desktop-notification-git notifies you, on any desktop environment, when a corrupted file shows up in dmesg at warning level.
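Besides a notifier, Btrfs keeps per-device error counters that `btrfs device stats` prints; `corruption_errs` includes checksum mismatches. A sketch of a check over that output, assuming its usual `[device].counter  value` line format; the function name is hypothetical, and the real command needs root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the non-zero counters from `btrfs device stats`
# output on stdin; exit status 1 if any were found, 0 if all are zero.
# Real use (as root):  btrfs device stats / | btrfs_errors

btrfs_errors() {
  awk 'NF == 2 && $2 != 0 { print; found = 1 } END { exit found }'
}
```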

Another question is whether the RAM module really is faulty, or whether your default BIOS/UEFI settings are just bad for that module.

Sometimes you run into issues because you are using 4 modules…
but RAM issues can also come from a RAM voltage that is too low or too high.

Or the RAM access settings are too fast and need to be slowed down.

Thank you everybody!

I was able to chroot and undo the changes, and I can now boot normally (yes, with the faulty RAM; I can’t replace it until Monday. I know it’s dangerous, but I’ve been using my computer with the faulty RAM without knowing it for months).

I found a USB stick with Manjaro that I had forgotten about, and, as you suggested, I was able to:

  • boot from the live USB
  • chroot
  • undo the changes in /etc/default/grub (i.e. remove the BADRAM line I had added)
  • run update-grub

and now it boots.

Now I want to understand why the BADRAM directive in Grub prevents the system from booting.

These were the instructions generated by MemTest86 Pro, which I followed:

Copy & paste the following faulty memory ranges to /etc/default/grub:
GRUB_BADRAM="0x0000001548E6000,0xFFFFFFFFFFFFE000,0x0000001568E6000,0xFFFFFFFFFFFFE000,0x0000001548E2000,0xFFFFFFFFFFFFF000,0x000000123E17000,0xFFFFFFFFFFFFF000"
Open a terminal and run the following command:
sudo update-grub
Warning: Masking faulty memory addresses does not fix defective RAM, but provides a temporary workaround for allowing the system to boot. Please consult the FAQ for more details
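For reference, GRUB's `badram` command (which grub-mkconfig, called by update-grub, writes into grub.cfg when GRUB_BADRAM is set) reads the value as address,mask pairs. Per the GRUB manual, a page is filtered out when all of its address bits selected by the mask match the base address, i.e. (page & mask) == (address & mask). A sketch that decodes the pairs above into the ranges they hide, assuming contiguous masks like these ones; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decode GRUB_BADRAM "address,mask" pairs.
# GRUB's rule: the page at PAGE is hidden when (PAGE & MASK) == (ADDR & MASK).
# Bash arithmetic is signed 64-bit, so 0xFFFFFFFFFFFFE000 becomes a negative
# number, but bitwise AND and NOT still act on the same 64-bit pattern.

badram_is_bad() {   # usage: badram_is_bad PAGE ADDR MASK
  (( ($1 & $3) == ($2 & $3) ))
}

badram_ranges() {   # usage: badram_ranges "ADDR,MASK,ADDR,MASK,..."
  # Assumes contiguous masks (all bits set above the cleared low bits).
  local -a f
  local i=0 start size
  IFS=',' read -r -a f <<< "$1"
  while (( i + 1 < ${#f[@]} )); do
    start=$(( f[i] & f[i+1] ))
    size=$(( ~f[i+1] + 1 ))     # bytes covered by the cleared low bits
    printf '0x%x-0x%x (%d KiB)\n' "$start" "$(( start + size - 1 ))" "$(( size / 1024 ))"
    i=$(( i + 2 ))
  done
}

badram_ranges "0x0000001548E6000,0xFFFFFFFFFFFFE000,0x0000001568E6000,0xFFFFFFFFFFFFE000,0x0000001548E2000,0xFFFFFFFFFFFFF000,0x000000123E17000,0xFFFFFFFFFFFFF000"
```

With these masks the four pairs cover 8 + 8 + 4 + 4 = 24 KiB in total.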

That doesn’t say where in the file to paste it, so I pasted it at the very end. That is the only thing I can think of that I might have done wrong. Was that the mistake?

BTW, let’s just assume it’s not some random error caused by the faulty RAM itself (if it were, what are the chances that booting from USB, chrooting and restoring the GRUB configuration would all have gone well?). As soon as I get to replace it, I will try the same thing again with healthy RAM, and I’m ready to bet it’ll brick the computer again. Until then, let’s work under that assumption. (I’m writing from the laptop with the faulty RAM right now.)

EDIT: here are at least 2 threads (in other forums) where people had the exact same experience, i.e. adding the BADRAM directive in grub led to an inability to boot:
https://unix.stackexchange.com/questions/746164/grub-hangs-itself-with-64bit-memtest86-badram-pattern

(the first offers a speculative explanation but I don’t quite understand it)

IF there is faulty RAM (which I don’t think has been confirmed), there is the possibility of taking out the faulty module and still having enough (half of it) left. The system will still run.

I had missed this post, sorry.

This is a 3-year-old Lenovo laptop, the RAM modules are the ones that came with it (I didn’t add or replace any), and I never tinkered with any BIOS/UEFI settings. Given all that, do you think wrong settings are a real possibility? That would mean serious mistakes were made at Lenovo. (This is a genuine question, no sarcasm intended.)

No problem.

My experience with laptops is very limited; I had assumed you were talking about a normal PC.

I’ve heard that on a lot of laptops today the RAM may be soldered. I hope that’s not the case for you.

As far as I know, my laptop’s BIOS won’t let me change the RAM or CPU voltage; I just dislike how limited the adjustments are on so many laptops.

Anyway, in general there is a silicon lottery: RAM, CPUs and GPUs are manufactured with varying quality, so some good-quality chips run at a lower voltage while some poorer-quality chips need a higher voltage.

Especially for RAM, you can see in a lot of BIOS versions that the changelogs include RAM stability adjustments.

It’s all possible. I personally had to increase the voltage of the RAM modules on my PC’s mainboard, because the XMP settings were never reliable, even after several BIOS updates that promised to fix RAM stability issues… so I had to fix the voltage on my own.
