Failed to initialize NVIDIA kernel module error

I’ve not had any serious problems until today. I tried to boot my Manjaro partition and it won’t start; I basically get a black screen. It complains about a Fatal server error / no screens found.

I tried to restore from a BTRFS snapshot but none are working either. I then went to the terminal and updated the databases and tried to reinstall the nvidia drivers but there is no change.

Reading /var/log/Xorg.0.log from the terminal, the error states:
NVIDIA: Failed to initialize the NVIDIA kernel module.
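
To pull just the error lines out of the Xorg log from a TTY, something like the following may help. It is a minimal sketch: the function name xorg_errors is made up for illustration, and the default path is simply the log file quoted above.

```sh
# Print only the error lines, tagged "(EE)", from an Xorg log.
# $1: path to the log (defaults to the file quoted in the post).
xorg_errors() {
    log="${1:-/var/log/Xorg.0.log}"
    # Match "(EE)" at the start of a line, optionally after a "[ 12.345] "
    # timestamp. This skips the legend line near the top of the log, which
    # also contains the text "(EE)".
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)' "$log"
}
```

Usage would be xorg_errors, or xorg_errors /var/log/Xorg.0.log.old for the previous session. Kernel-side messages about the module can usually be found with journalctl -k -b | grep -i nvidia.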

As advised elsewhere in the forum, I updated using pacman (sudo pacman -Sy) and reinstalled the NVIDIA driver (sudo pacman -S nvidia). There were no issues, but it did not solve the boot problem. I also tried sudo pacman-mirrors -f3 and running startx at the prompt.

Updating grub (sudo update-grub) doesn’t seem to find the Manjaro partition but booting from the live disk does see the Manjaro grub entry.

As someone else suggested, I booted from the live USB and entered:
su
manjaro-chroot -a (at this point it doesn’t report a Manjaro grub entry either)
pacman -Syu grub (but then error: pacman: command not found)

I don’t see a repair option on the live USB so I am at a loss now on how to get back the partition.

Thanks.

Boot from the Manjaro live USB,
open a terminal and a browser on this topic, and report the output of:

sudo manjaro-chroot -a
mhwd -li
mhwd-kernel -li 
ls /boot
ls /lib/modules 
inxi -Fza
ls /etc/mkinitcpio.d/*.preset
exit (to leave the chroot)

Results so far:

sudo manjaro-chroot -a
grub-probe: error: cannot find a GRUB drive for /dev/sdb1.  Check your device.map.
grub-probe: error: cannot find a GRUB drive for /dev/sdb1.  Check your device.map.
==> Mounting (Fedora) [/dev/sda3]
 --> mount: [/mnt]
 --> mount: [/mnt/boot/efi]
mount: /mnt/etc/resolv.conf: mount point is a symbolic link to nowhere.
[root@manjaro-gnome /]# mhwd -li
bash: mhwd: command not found
[root@manjaro-gnome /]#

Note that Manjaro is on /dev/sda2.
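
Since the automatic chroot picked the Fedora root on /dev/sda3, one option is to mount the Manjaro root by hand and give manjaro-chroot the mount point. The sketch below only prints the commands for review; it does not run them. The function name chroot_plan and the btrfs subvolume name "@" are assumptions, since the thread does not show the subvolume layout (sudo btrfs subvolume list on the mounted device would show it).

```sh
# Print (do not run) the commands for a manual chroot into a chosen root.
# $1: root device, e.g. /dev/sda2 as stated in the post.
# $2: btrfs subvolume holding the root filesystem (ASSUMED to be "@").
chroot_plan() {
    dev="$1"
    subvol="${2:-@}"
    echo "mount -o subvol=$subvol $dev /mnt"
    # The EFI system partition would also need mounting at /mnt/boot/efi;
    # its device is not given in the thread, so it is left out here.
    echo "manjaro-chroot /mnt"
}
```

Running chroot_plan /dev/sda2 prints the two commands, which can then be run as root from the live session once the device and subvolume have been checked.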

Since I can’t chroot into Manjaro, I ran the suggested commands from the command line and copied the output from photos. I don’t know if this is helpful, but here are the results.

mhwd -li :
Installed PCI configs
Name: video-nvidia Version: 2021.12.18 Freedriver: false Type: PCI
Warning: No installed USB configs

mhwd-kernel -li:
Currently running 5.15.25-1-MANJARO (linux515)
Kernels installed: 5.13, 5.14, 5.15, 5.15-rt

ls /boot:
efi initramfs-5.13-x86_64.img initramfs-5.15-rt-x86_64-fallback.img initramfs-5.15-x86_64.img linux514-x86_64.kver memtest86+ vmlinuz-5.15-rt-x86_64
grub initramfs-5.14-x86_64-fallback.img initramfs-5.15-x86_64.img intel-ucode.img linux-515-rt-x86_64.kver vmlinuz-5.13-x86_64 vmlinuz-5.15-x86_64 initramfs-5.13-x86_64-fallback.img initramfs-5.14-x86_64.img initramfs-5.15-x86_64-fallback.img linux513-x86_64.kver linux515-x86_64.kver vmlinuz-5.14-x86_64

ls /bin/modules:
No such file or directory

inxi -Fza:
[Too much to copy. Need to narrow what is important ]

ls /etc/mkinitcpio.d/*.preset:
/etc/mkinitcpio.d/linux513.preset /etc/mkinitcpio.d/linux514.preset /etc/mkinitcpio.d/linux515.preset /etc/mkinitcpio.d/linux515-rt.preset

First, boot into your system and
open a TTY (Ctrl + Alt + Fn, where Fn is F1 to F8).

  • remove the EOL kernels
sudo mhwd-kernel -r linux513
sudo mhwd-kernel -r linux514
  • regenerate everything for linux515
sudo mkinitcpio -P
sudo update-grub

Then reboot.

stephane,

Thanks for that follow-up. Everything was done successfully, however, the system remains in the same state. When using startx from the prompt I am still getting xinit: server error, no screens found. Is there anything else I can try?

At this point I seem destined to reinstall. But this incident is disturbing since it was working fine when I last used it a week or two ago. I’d like to determine the cause so this isn’t repeated and I don’t lose my setup again. After all, that’s what I thought the btrfs snapshots would have prevented.

Update:
Wow, I was able to boot into a btrfs snapshot from March 6 and I am in the GUI right now. It tells me I need to restore the snapshot to keep using this version, so I need to find out how to proceed, as the task looks much more complex than I thought. I am wondering if there is anything else I should be doing while it’s running.

Check which kernel versions are installed:

sudo mhwd-kernel -li

First remove the old versions (5.13 and 5.14),
then do the update.

I removed 5.13 and 5.14 (although 5.13 is still showing in grub) and updated. Kernel 5.15.28 was the first kernel listed in grub, but as before it wouldn’t boot to the GUI. 5.15.27 would boot, so while I was in it I updated and then reinstalled linux515 and linux515-headers.

After performing the updates I can only boot to emergency mode with kernel 5.15.27. The system log (journalctl) complains about kernel modules and /boot/efi. The top grub entry, 5.15.28, still does not boot to the GUI.

Running sudo mkinitcpio -P again shows the following errors:
/lib/modules/5.13.19-2-MANJARO is not a valid kernel module directory
/lib/modules/5.15.27-1-MANJARO is not a valid kernel module directory
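
mkinitcpio generally prints this message when the module directory for the kernel release it is building for is absent, so it may be worth comparing the releases recorded in /boot with what is actually under /lib/modules. A minimal sketch follows; the function name check_kver_modules is hypothetical, and it assumes the first whitespace-separated field of each .kver file in /boot is the kernel release string.

```sh
# For each *.kver file in a boot directory, report whether a module
# directory with the same release name exists.
# $1: boot directory (default /boot); $2: modules directory (default /lib/modules).
# ASSUMPTION: the first field of a .kver file is the kernel release.
check_kver_modules() {
    boot="${1:-/boot}"
    modules="${2:-/lib/modules}"
    for f in "$boot"/*.kver; do
        [ -e "$f" ] || continue            # the glob matched nothing
        read -r release rest < "$f" || true
        [ -n "$release" ] || continue      # skip empty files
        if [ -d "$modules/$release" ]; then
            echo "ok $release (${f##*/})"
        else
            echo "MISSING $release (${f##*/})"
        fi
    done
}
```

A MISSING line would mean the kernel image in /boot and the installed modules are from different releases, in which case out-of-tree modules such as the NVIDIA one cannot load for that kernel.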

Grub is still finding remnants of 513:

Found linux image: /boot/vmlinuz-5.13-x86_64
Found initrd image: /boot/intel-ucode.img /boot/initramfs-5.13-x86_64.img
Found initrd fallback image: /boot/initramfs-5.13-x86_64-fallback.img
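
update-grub builds its menu from the kernel images it finds in /boot, so the 5.13 entry will keep appearing for as long as /boot/vmlinuz-5.13-x86_64 exists, whether or not the package is installed. To see which entries actually ended up in the generated menu, a sketch like this can be used. The function name grub_entries is made up, and the default path assumes the usual /boot/grub/grub.cfg location.

```sh
# List the menu entry titles from a generated grub.cfg.
# $1: path to grub.cfg (ASSUMED default: /boot/grub/grub.cfg).
grub_entries() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # Entries look like:  menuentry 'Title' --class ... {
    # Print only the text between the first pair of single quotes.
    sed -n "s/^[[:space:]]*menuentry '\([^']*\)'.*/\1/p" "$cfg"
}
```

Running sudo update-grub and then grub_entries again shows whether a removed kernel has really disappeared from the menu.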

Some warnings are:
Possible missing firmware for module: bfa
Possible missing firmware for module: qed
Possible missing firmware for module: qla2xxx
Possible missing firmware for module: qla1280

Can you report the output of:

sudo mhwd -li
sudo mhwd-kernel -li 
sudo ls /boot
sudo ls /lib/modules 
sudo ls /etc/mkinitcpio.d/*.preset

I managed to get in via snapshot again. Below are the answers.

sudo mhwd -li

> Installed PCI configs:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
          video-nvidia            2021.12.18               false            PCI


Warning: No installed USB configs!
sudo mhwd-kernel -li
Currently running: 5.15.27-1-rt35-MANJARO (linux515)
The following kernels are installed in your system:
   * linux515
   * linux515-rt
sudo ls /boot
efi				       intel-ucode.img
grub				       linux-515-rt-x86_64.kver
initramfs-5.13-x86_64-fallback.img     linux515-x86_64.kver
initramfs-5.13-x86_64.img	       memtest86+
initramfs-5.15-rt-x86_64-fallback.img  vmlinuz-5.13-x86_64
initramfs-5.15-rt-x86_64.img	       vmlinuz-5.15-rt-x86_64
initramfs-5.15-x86_64-fallback.img     vmlinuz-5.15-x86_64
initramfs-5.15-x86_64.img
sudo ls /lib/modules
5.14.10-1-MANJARO	5.15.28-1-MANJARO	   extramodules-5.15-rt-MANJARO
5.15.27-1-rt35-MANJARO	extramodules-5.15-MANJARO
sudo ls /etc/mkinitcpio.d/*.preset
/etc/mkinitcpio.d/linux513.preset  /etc/mkinitcpio.d/linux515-rt.preset
/etc/mkinitcpio.d/linux515.preset
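
The output above shows a linux513.preset even though only linux515 and linux515-rt are installed, and mkinitcpio -P processes every preset it finds. A small sketch for spotting such leftovers is below; the function name stale_presets is hypothetical, and the list of installed kernels has to be passed in by hand (for example, copied from mhwd-kernel -li).

```sh
# List preset files whose name does not match any installed kernel package.
# $1: preset directory; remaining arguments: installed kernel package names.
stale_presets() {
    dir="$1"
    shift
    for p in "$dir"/*.preset; do
        [ -e "$p" ] || continue            # the glob matched nothing
        name="${p##*/}"                    # strip the directory
        name="${name%.preset}"             # strip the extension
        found=no
        for k in "$@"; do
            if [ "$k" = "$name" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$p"
        fi
    done
}
```

With the values from this thread, stale_presets /etc/mkinitcpio.d linux515 linux515-rt would be expected to print only the linux513 preset.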

If you have an “old” GPU you may also wish to keep an old kernel and driver to avoid possible issues.
Right now I am on 5.4 with NVIDIA linux419-nvidia-470xx and it works fine (the 510 NVIDIA driver works badly, if I remember correctly).

You will have to delete these files:

/boot:
vmlinuz-5.13-x86_64
initramfs-5.13-x86_64.img	
initramfs-5.13-x86_64-fallback.img

/lib/modules:
5.14.10-1-MANJARO


/etc/mkinitcpio.d/linux513.preset

Then rerun:

sudo mkinitcpio -P
sudo update-grub
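
Before deleting anything by hand, it can help to list exactly which files in /boot belong to the old kernel series. The following is only a sketch that prints matches and removes nothing; the function name kernel_leftovers is made up for illustration.

```sh
# Print the kernel image and initramfs files for one kernel series.
# $1: boot directory; $2: series as it appears in the file names, e.g. 5.13
# Note: a series such as 5.15 also matches the 5.15-rt files.
kernel_leftovers() {
    boot="$1"
    ver="$2"
    for f in "$boot"/vmlinuz-"$ver"-* "$boot"/initramfs-"$ver"-*; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

For example, kernel_leftovers /boot 5.13 should list the three 5.13 files named above. Running pacman -Qo on each path shows whether any installed package still owns the file; if none does, removing it by hand should not conflict with the package manager.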

Thanks for the heads up concerning Nvidia. It is a recent card, RTX 2070 super. It seemed to be running fine until this incident occurred. I am not yet sure what the culprit is, NVIDIA, GRUB, BTRFS or Manjaro.

stephane,

Do you mean just delete them? Linux513, for example, is at least unwritable.

Yes, you need to remove 5.13 and 5.14.

I deleted the items listed by stephane and confirmed by straycat.

The current status, however, is the same as before: Manjaro won’t finish booting from either the first grub entry (a black screen) or the optional kernel 5.15.27-rt (which boots into emergency mode). Incidentally, kernel 5.13 is still listed in the grub menu. I can continue to boot into a btrfs snapshot, but that doesn’t seem like a long-term solution. A reinstall seems like the only option left.