Unsolicited reboots after upgrading nvidia-open-dkms from 575.64-2 to 575.64.03

I’ve got a weird one. After upgrading nvidia-open-dkms from 575.64-2 to 575.64.03, I suddenly started getting unsolicited reboots. Nothing showed up in the logs (I was tailing journalctl, and also checked after a reboot). On 575.64.03 with 6.15.6-1-MANJARO, the system rebooted at random, anywhere between 15 minutes and 1.5 hours after boot. I was also using the lqx kernel, and there I got a complete system hang, to the point that I had to use the power button to kill the machine. It’s a Lenovo Legion 5 with an AMD Ryzen 7 5800H and an NVIDIA GA104M [GeForce RTX 3070 Mobile / Max-Q]. Reverting to 575.64-2 fixed the issue.

Check whether similar issues have been reported in the "575 release feedback & discussion" thread on the NVIDIA Developer Forums (Linux section).

Try nvidia-dkms, without the open kernel module.

I’ve reverted the linux-firmware-* package changes as well, but the reboots didn’t go away. Interestingly, I noticed that if I keep the screens (laptop plus external monitor) on and unlocked, my laptop works fine for hours on end. But if I let them lock and turn off, the reboot happens somewhere between 1 and 3 hours later. I’m on AC power, so my laptop doesn’t attempt to sleep; it just turns off the displays. I’ll keep investigating, but thanks for your suggestions. I’ll try nvidia-dkms next, in case the open kernel modules are somehow interfering with the recent changes to KDE.

After a lot of back and forth with my packages, I managed to pin the issue down: my root NVMe was 95% full (though that still left ~90 GB free), and that was what caused the reboots and hangs. Once I freed up enough space to bring usage below 90%, the freezes and reboots stopped. Annoying, but not caused by NVIDIA this time, which is a win I suppose.
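
For anyone who wants to catch this before it bites, here is a minimal sketch of a usage check, assuming Python 3 is installed. The 90% threshold simply mirrors the figure above, and the function names are made up for illustration. On btrfs the number reported this way can differ from what `btrfs filesystem usage` shows, so treat it as a rough early warning only.

```python
import shutil


def usage_percent(path="/"):
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def is_over_threshold(percent_used, threshold=90.0):
    """True when usage has reached the (assumed) 90% warning threshold."""
    return percent_used >= threshold


if __name__ == "__main__":
    pct = usage_percent("/")
    state = "WARNING, free up some space" if is_over_threshold(pct) else "ok"
    print(f"/ is {pct:.1f}% full: {state}")
```

Run it by hand or from a timer; it only reads filesystem statistics and changes nothing.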

If you are using btrfs, it pays to run a btrfs balance on your filesystem regularly to reclaim space. A balance physically relocates data and/or metadata between chunks, which compacts partially filled chunks and returns the emptied ones to the unallocated pool.

A chunk is typically a 1-GiB region of drive space for data, or a 256-MiB region for metadata; see the btrfs documentation.
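
As a rough sketch of what such a balance looks like in practice: the snippet below only builds and prints a `btrfs balance start` command with the `usage` filter, which limits the balance to chunks filled below the given percentage. The 50% value, the mount point, and the helper names are assumptions for illustration, not recommendations. Actually running the command needs root and can take a long time.

```python
import subprocess


def balance_command(mount_point="/", data_usage=50, metadata_usage=None):
    """Build a `btrfs balance start` command limited by the usage filter.

    -dusage=N rebalances only data chunks that are at most N% full;
    -musage=N does the same for metadata chunks.
    """
    cmd = ["btrfs", "balance", "start", f"-dusage={data_usage}"]
    if metadata_usage is not None:
        cmd.append(f"-musage={metadata_usage}")
    cmd.append(mount_point)
    return cmd


def run_balance(cmd):
    """Execute the balance; requires root and a mounted btrfs filesystem."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: show the command instead of executing it.
    print("Would run:", " ".join(balance_command("/", data_usage=50)))
```

Starting with a low usage value keeps the run short, since only nearly empty chunks get rewritten.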