Nvidia kernel failure after update

I updated Manjaro to the latest release today for the first time in a couple of months. I didn’t watch the update, as it was 3-4 GB of downloads, but at the end it seemed to be OK. After the reboot, Xorg no longer loads.

I’ve put some logs up on pastebin. From googling, the problem seems to be that it can’t find the nvidia or nvidia-drm modules as shown in the Journal logs. But I’m not sure how best to fix it without making it a lot worse. I’ve already tried a mkinitcpio -P without any success.

https://pastebin.com/1d7mXyh1
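One way to check the "can’t find the nvidia module" theory directly is to look for the module file under each installed kernel tree. This is only a rough sketch: `nvidia_module_report` is a made-up helper name, and the `nvidia.ko*` file name pattern is an assumption about how the driver packages name the module.

```shell
# Report, for every kernel tree under a modules root, whether an nvidia
# module file can be found.
# Arguments: $1 = modules root (defaults to /lib/modules).
nvidia_module_report() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        release=$(basename "$dir")
        # Assumption: the module is called nvidia.ko, possibly compressed
        # (nvidia.ko.xz, nvidia.ko.zst). -L follows symlinked subdirectories.
        if find -L "$dir" -name 'nvidia.ko*' 2>/dev/null | grep -q .; then
            echo "$release: nvidia module present"
        else
            echo "$release: nvidia module MISSING"
        fi
    done
}
```

Run `nvidia_module_report` and compare the result with `uname -r`: if the running kernel is listed as MISSING, that would match the journal errors.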

Any ideas?

Try to install the previous driver.

sudo mhwd -r pci video-hybrid-intel-nvidia-prime
sudo mhwd -i pci video-hybrid-intel-nvidia-470xx-prime

The output of mhwd -l and mhwd -li could also be relevant, as well as mhwd-kernel -li.

//EDIT: or you could try forcing a reinstall of the Nvidia hybrid driver:

sudo mhwd -f -i pci video-hybrid-intel-nvidia-prime

//EDIT2: give the output of any command you try so we can see what is going on.

>> mhwd -li
> Installed PCI configs:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
          video-nvidia            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI


Warning: No installed USB configs!


>> mhwd -l
> 0000:01:00.0 (0300:10de:1382) Display controller nVidia Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
          video-nvidia            2021.12.18               false            PCI
    video-nvidia-470xx            2021.12.18               false            PCI
    video-nvidia-390xx            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI
     video-modesetting            2020.01.13                true            PCI
            video-vesa            2017.03.12                true            PCI


> 0000:04:00.0 (0300:10de:1382) Display controller nVidia Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
          video-nvidia            2021.12.18               false            PCI
    video-nvidia-470xx            2021.12.18               false            PCI
    video-nvidia-390xx            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI
     video-modesetting            2020.01.13                true            PCI
            video-vesa            2017.03.12                true            PCI


> 0000:00:02.0 (0380:8086:1912) Display controller Intel Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
           video-linux            2018.05.04                true            PCI


So you have a hybrid system but the Intel card isn’t seen in MHWD? //EDIT: it is, at the bottom.

Then try that:

sudo mhwd -r pci video-linux
sudo mhwd -f -i pci video-nvidia

//EDIT: OK, you have two Nvidia cards, so I guess that’s why the output is not what I expected. So I’m not sure whether you should remove the video-linux driver…

Originally I used the Intel card, but when I moved to multiple displays I changed it to the Nvidia. It’s years ago now, so I can’t remember if I disabled it or not. I later added a second Nvidia card to try to get a third display working again after I had to replace the monitor and it had a different connector type. Never got that working, but that’s another day’s work.

In the interim, here’s some additional output. I’ll try the video-nvidia install at least and see how it goes.

>> mhwd-kernel -li
Currently running: 5.15.19-2-rt29-MANJARO (linux515)
The following kernels are installed in your system:
   * linux510
   * linux515
   * linux54
   * linux515-rt
>> inxi -Fazy
System:
  Kernel: 5.15.19-2-rt29-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.15-rt-x86_64
    root=UUID=2e06d737-e3af-4be7-8d4f-d3b19bd603c0 rw quiet apparmor=1
    security=apparmor udev.log_priority=3
  Desktop: N/A dm: SDDM Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Desktop System: Dell product: OptiPlex 7040 v: N/A
    serial: <superuser required> Chassis: type: 3 serial: <superuser required>
  Mobo: Dell model: 0Y7WYT v: A00 serial: <superuser required>
    UEFI-[Legacy]: Dell v: 1.18.1 date: 12/23/2020
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Solar Keyboard K750
    serial: <filter> charge: 100% rechargeable: yes status: N/A
CPU:
  Info: model: Intel Core i7-6700 bits: 64 type: MT MCP arch: Skylake-S
    family: 6 model-id: 0x5E (94) stepping: 3 microcode: 0xEA
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 8 MiB desc: 1x8 MiB
  Speed (MHz): avg: 900 high: 901 min/max: 800/4000 scaling:
    driver: intel_pstate governor: powersave cores: 1: 900 2: 900 3: 900 4: 900
    5: 900 6: 900 7: 901 8: 900 bogomips: 54398
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf
    mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
    IBRS_FW, STIBP: conditional, RSB filling
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort mitigation: TSX disabled
Graphics:
  Device-1: Intel HD Graphics 530 vendor: Dell driver: i915 v: kernel
    bus-ID: 00:02.0 chip-ID: 8086:1912 class-ID: 0380
  Device-2: NVIDIA GM107 [GeForce GTX 745] driver: N/A alternate: nouveau
    bus-ID: 01:00.0 chip-ID: 10de:1382 class-ID: 0300
  Device-3: NVIDIA GM107 [GeForce GTX 745] driver: N/A alternate: nouveau
    bus-ID: 04:00.0 chip-ID: 10de:1382 class-ID: 0300
  Display: server: X.org 1.21.1.3 driver: loaded: N/A failed: nvidia
  Message: No advanced graphics data found on this system.
Audio:
  Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Dell
    driver: snd_hda_intel v: kernel bus-ID: 00:1f.3 chip-ID: 8086:a170
    class-ID: 0403
  Device-2: NVIDIA GM107 High Definition Audio [GeForce 940MX]
    driver: snd_hda_intel v: kernel bus-ID: 01:00.1 chip-ID: 10de:0fbc
    class-ID: 0403
  Device-3: NVIDIA GM107 High Definition Audio [GeForce 940MX]
    driver: snd_hda_intel v: kernel bus-ID: 04:00.1 chip-ID: 10de:0fbc
    class-ID: 0403
  Sound Server-1: ALSA v: k5.15.19-2-rt29-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.20 running: no
  Sound Server-3: PulseAudio v: 15.0 running: no
  Sound Server-4: PipeWire v: 0.3.45 running: no
Network:
  Device-1: Intel Ethernet I219-LM vendor: Dell driver: e1000e v: kernel
    port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b7 class-ID: 0200
  IF: enp0s31f6 state: up speed: 1000 Mbps duplex: full mac: <filter>
RAID:
  Hardware-1: Intel SATA Controller [RAID mode] driver: ahci v: 3.0 port: f060
    bus-ID: 00:17.0 chip-ID: 8086:2822 rev: N/A class-ID: 0104
Drives:
  Local Storage: total: 931.52 GiB used: 72.86 GiB (7.8%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/sda maj-min: 8:0 vendor: SanDisk model: SSD PLUS 1000GB
    size: 931.52 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 00RL scheme: GPT
Partition:
  ID-1: / raw-size: 902.22 GiB size: 887.06 GiB (98.32%)
    used: 72.86 GiB (8.2%) fs: ext4 dev: /dev/sda2 maj-min: 8:2
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 29.3 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/sda1 maj-min: 8:1
Sensors:
  System Temperatures: cpu: 29.8 C pch: 50.5 C mobo: 27.8 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 187 Uptime: 1h 7m wakeups: 19 Memory: 31.24 GiB
  used: 932.4 MiB (2.9%) Init: systemd v: 250 tool: systemctl Compilers:
  gcc: 11.1.0 clang: 13.0.1 Packages: 1786 pacman: 1734 lib: 456 flatpak: 5
  snap: 47 Shell: Bash v: 5.1.16 running-in: sshd (SSH) inxi: 3.3.12
>>

>> sudo mhwd -f -i pci video-nvidia
[sudo] password for stevo:
> Removing video-nvidia...
Sourcing /etc/mhwd-x86_64.conf
Has lib32 support: true
Sourcing /var/lib/mhwd/local/pci/video-nvidia/MHWDCONFIG
Processing classid: 0300
Sourcing /var/lib/mhwd/scripts/include/0300
Processing classid: 0302
checking dependencies...
error: failed to prepare transaction (could not satisfy dependencies)
:: removing libxnvctrl breaks dependency 'libxnvctrl' required by conky-lua-nv
Error: pacman failed!
Error: script failed!

Yeah, just try to force a reinstall of the driver:

sudo mhwd -f -i pci video-nvidia

If it doesn’t work, remove it and then try the previous driver.

sudo mhwd -r pci video-nvidia
sudo mhwd -i pci video-nvidia-470xx

//EDIT:

:: removing libxnvctrl breaks dependency 'libxnvctrl' required by conky-lua-nv

You will need to remove conky-lua-nv in order to be able to install drivers. You could reinstall it afterwards.
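If the error list is longer than one line, the blocking packages can be pulled out of pacman’s output with a small filter. A minimal sketch, assuming the message format shown above; `blocking_packages` is just an illustrative name.

```shell
# Print the packages that block a removal, one per line.
# Reads pacman's error output on stdin and keeps only the name after
# "required by" on the "breaks dependency" lines.
blocking_packages() {
    sed -n "s/^:: removing .* breaks dependency '.*' required by \(.*\)\$/\1/p" | sort -u
}
```

For the output above this prints conky-lua-nv, which can then be removed with `sudo pacman -R conky-lua-nv` before retrying the mhwd command.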

>> sudo mhwd -f -i pci video-nvidia
> Removing video-nvidia...
Sourcing /etc/mhwd-x86_64.conf
Has lib32 support: true
Sourcing /var/lib/mhwd/local/pci/video-nvidia/MHWDCONFIG
Processing classid: 0300
Sourcing /var/lib/mhwd/scripts/include/0300
Processing classid: 0302
checking dependencies...
:: ffmpeg optionally requires nvidia-utils: Nvidia NVDEC/NVENC support
:: gst-plugins-bad optionally requires nvidia-utils: nvcodec plugin
:: lib32-vulkan-icd-loader optionally requires lib32-vulkan-driver: packaged vulkan driver
:: steam-manjaro optionally requires vulkan-driver: packaged vulkan driver
:: steam-manjaro optionally requires lib32-vulkan-driver: packaged vulkan driver (32bit)
:: vulkan-icd-loader optionally requires vulkan-driver: packaged vulkan driver
warning: dependency cycle detected:
warning: eglexternalplatform will be removed after its nvidia-utils dependency

Packages (8) egl-wayland-2:1.1.9+r3+g582b2d3-1  eglexternalplatform-1.1-2  lib32-nvidia-utils-510.47.03-1  libxnvctrl-510.47.03-1  linux510-nvidia-510.47.03-3  linux515-nvidia-510.47.03-4  linux54-nvidia-510.47.03-3  nvidia-utils-510.47.03-4

Total Removed Size:  690.40 MiB

:: Do you want to remove these packages? [Y/n]
:: Processing package changes...
removing linux54-nvidia...
removing linux515-nvidia...
removing linux510-nvidia...
removing lib32-nvidia-utils...
xorg configuration symlink valid...
removing nvidia-utils...
xorg configuration symlink valid...
removing egl-wayland...
removing eglexternalplatform...
removing libxnvctrl...
:: Running post-transaction hooks...
(1/7) Reloading system manager configuration...
(2/7) Reloading device manager configuration...
(3/7) Arming ConditionNeedsUpdate...
(4/7) Updating module dependencies...
(5/7) Reloading system bus configuration...
(6/7) Updating the desktop file MIME type cache...
(7/7) Updating mlocate database
'/etc/X11/xorg.conf.d/90-mhwd.conf' symlink is invalid! Removing it...
> Successfully removed video-nvidia
> Installing video-nvidia...
Sourcing /etc/mhwd-x86_64.conf
Has lib32 support: true
Sourcing /var/lib/mhwd/db/pci/graphic_drivers/nvidia/MHWDCONFIG
Processing classid: 0300
Sourcing /var/lib/mhwd/scripts/include/0300
Processing classid: 0302
:: Synchronizing package databases...
 core downloading...
 extra downloading...
 community downloading...
 multilib downloading...
resolving dependencies...
looking for conflicting packages...
warning: dependency cycle detected:
warning: eglexternalplatform will be installed before its nvidia-utils dependency

Packages (8) egl-wayland-2:1.1.9+r3+g582b2d3-1  eglexternalplatform-1.1-2  lib32-nvidia-utils-510.47.03-1  libxnvctrl-510.47.03-1  linux510-nvidia-510.47.03-3  linux515-nvidia-510.47.03-4  linux54-nvidia-510.47.03-3  nvidia-utils-510.47.03-4

Total Installed Size:  690.40 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
 nvidia-utils-510.47.03-4-x86_64 downloading...
 linux515-nvidia-510.47.03-4-x86_64 downloading...
 linux510-nvidia-510.47.03-3-x86_64 downloading...
 linux54-nvidia-510.47.03-3-x86_64 downloading...
 lib32-nvidia-utils-510.47.03-1-x86_64 downloading...
 libxnvctrl-510.47.03-1-x86_64 downloading...
 egl-wayland-2:1.1.9+r3+g582b2d3-1-x86_64 downloading...
 eglexternalplatform-1.1-2-any downloading...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
error: failed to commit transaction (conflicting files)
lib32-nvidia-utils: /usr/lib32/libEGL_nvidia.so.0 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libGLESv1_CM_nvidia.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libGLESv2_nvidia.so.2 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libGLX_nvidia.so.0 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libcuda.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvcuvid.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvidia-encode.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvidia-fbc.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvidia-ml.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvidia-opticalflow.so.1 exists in filesystem
lib32-nvidia-utils: /usr/lib32/libnvidia-ptxjitcompiler.so.1 exists in filesystem
Errors occurred, no packages were upgraded.
Error: pacman failed!
Error: script failed!
>> sudo mhwd -r pci video-nvidia
Error: config 'video-nvidia' is not installed!
>> sudo mhwd -i pci video-nvidia-470xx
> Installing video-nvidia-470xx...
Sourcing /etc/mhwd-x86_64.conf
Has lib32 support: true
Sourcing /var/lib/mhwd/db/pci/graphic_drivers/nvidia-470xx/MHWDCONFIG
Processing classid: 0300
Sourcing /var/lib/mhwd/scripts/include/0300
Processing classid: 0302
:: Synchronizing package databases...
 core downloading...
 extra downloading...
 community downloading...
 multilib downloading...
resolving dependencies...
looking for conflicting packages...
warning: dependency cycle detected:
warning: eglexternalplatform will be installed before its nvidia-470xx-utils dependency

Packages (8) egl-wayland-2:1.1.9+r3+g582b2d3-1  eglexternalplatform-1.1-2  lib32-nvidia-470xx-utils-470.103.01-1  libxnvctrl-470xx-470.103.01-1  linux510-nvidia-470xx-470.103.01-4  linux515-nvidia-470xx-470.103.01-4  linux54-nvidia-470xx-470.103.01-4  nvidia-470xx-utils-470.103.01-1

Total Download Size:   322.21 MiB
Total Installed Size:  660.86 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
 nvidia-470xx-utils-470.103.01-1-x86_64 downloading...
 linux515-nvidia-470xx-470.103.01-4-x86_64 downloading...
 lib32-nvidia-470xx-utils-470.103.01-1-x86_64 downloading...
 linux510-nvidia-470xx-470.103.01-4-x86_64 downloading...
 linux54-nvidia-470xx-470.103.01-4-x86_64 downloading...
 libxnvctrl-470xx-470.103.01-1-x86_64 downloading...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
error: failed to commit transaction (conflicting files)
lib32-nvidia-470xx-utils: /usr/lib32/libEGL_nvidia.so.0 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libGLESv1_CM_nvidia.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libGLESv2_nvidia.so.2 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libGLX_nvidia.so.0 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libcuda.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvcuvid.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-encode.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-fbc.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-ifr.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-ml.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-opticalflow.so.1 exists in filesystem
lib32-nvidia-470xx-utils: /usr/lib32/libnvidia-ptxjitcompiler.so.1 exists in filesystem
Errors occurred, no packages were upgraded.
Error: pacman failed!
Error: script failed!
>>

There are untracked files in the system. You installed an external driver manually at some point and now your system is a mess, am I guessing right?

Not today at least. Who knows, possibly months or years ago.

So a wipe and reinstall would be best?

No, you could overwrite the files, but I’m not sure of the command with mhwd. A simple solution would be to manually delete these files with sudo after you uninstall the driver (which should already be uninstalled). But at this point it is on you: I can’t guarantee it is the proper procedure, and these files may be part of another package you installed that the system doesn’t “know” about.

What I would do, even though it is not the right way, is run sudo rm on each file:

sudo rm /usr/lib32/libEGL_nvidia.so.0
sudo rm /usr/lib32/libGLESv1_CM_nvidia.so.1
sudo rm /usr/lib32/libGLESv2_nvidia.so.2
sudo rm /usr/lib32/libGLX_nvidia.so.0
sudo rm /usr/lib32/libcuda.so.1
sudo rm /usr/lib32/libnvcuvid.so.1
sudo rm /usr/lib32/libnvidia-encode.so.1
sudo rm /usr/lib32/libnvidia-fbc.so.1
sudo rm /usr/lib32/libnvidia-ml.so.1
sudo rm /usr/lib32/libnvidia-opticalflow.so.1
sudo rm /usr/lib32/libnvidia-ptxjitcompiler.so.1

Remember this is not a proper solution: you might break something, and there might be a better, proper solution.
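Before deleting anything, it may be worth checking whether some installed package actually claims those files. The sketch below extracts the paths from pacman’s "exists in filesystem" errors and asks `pacman -Qo` about each one. The helper names and the `mhwd.log` file are hypothetical; the only assumption about pacman is that `pacman -Qo` exits non-zero when no package owns the file.

```shell
# Extract the conflicting paths from pacman's "exists in filesystem"
# errors. Reads the saved installer output on stdin.
conflicting_files() {
    sed -n 's|^[^:]*: \(/.*\) exists in filesystem$|\1|p'
}

# For each path on stdin, print the ones that no installed package owns.
unowned_files() {
    while IFS= read -r path; do
        pacman -Qo "$path" >/dev/null 2>&1 || echo "$path"
    done
}
```

With the failed install output saved to a file, `conflicting_files < mhwd.log | unowned_files` lists the stray files that are candidates for removal. pacman itself also has an `--overwrite` option for install operations; whether mhwd passes it through is not something this thread confirms.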

So I deleted those files, did the install, and it looks like once installed it went into a mkinitcpio run. That eventually completed, but with the following text:

linux54-nvidia-470xx: install reason has been set to 'explicitly installed'
rmmod: ERROR: could not remove 'nouveau': No such file or directory
rmmod: ERROR: could not remove module nouveau: No such file or directory
rmmod: ERROR: could not remove 'ttm': Resource temporarily unavailable
rmmod: ERROR: could not remove module ttm: Resource temporarily unavailable
rmmod: ERROR: could not remove 'drm_kms_helper': Resource temporarily unavailable
rmmod: ERROR: could not remove module drm_kms_helper: Resource temporarily unavailable
rmmod: ERROR: could not remove 'drm': Resource temporarily unavailable
rmmod: ERROR: could not remove module drm: Resource temporarily unavailable
modprobe: FATAL: Module nvidia not found in directory /lib/modules/5.15.19-2-rt29-MANJARO
xorg configuration file: '/etc/X11/mhwd.d/nvidia.conf'
modprobe: FATAL: Module nvidia-drm not found in directory /lib/modules/5.15.19-2-rt29-MANJARO
> Successfully installed video-nvidia-470xx

I then went for a reboot, but it doesn’t boot fully, I can no longer get to a prompt via Ctrl-Alt-F2, and pinging it from another device doesn’t work. It looks like it’s messed up beyond the level of time or effort I’m willing to spend on recovering it. I’d pretty much decided before I tried it that I was going to wipe it anyway, since there looks to be a tonne of crud built up over the years and there was nothing on it that I’m worried about losing.
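For anyone reading the installer output above later: the two modprobe lines name both the missing module and the kernel tree that was searched. A small filter, sketched here under the assumption that the message format is exactly as pasted (`missing_modules` is an illustrative name), makes that pairing explicit.

```shell
# Print "module kernel-release" for every modprobe "not found" error.
# Reads the installer output on stdin.
missing_modules() {
    sed -n 's|^modprobe: FATAL: Module \([^ ]*\) not found in directory /lib/modules/\(.*\)$|\1 \2|p'
}
```

On the text above this reports nvidia and nvidia-drm as missing for 5.15.19-2-rt29-MANJARO, the realtime kernel that was booted, while the packages installed were the linux54, linux510 and linux515 ones. One possible reading, which the thread does not confirm, is that none of those packages provides a module for that kernel.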

Thanks for all the help. Really appreciate it despite how it ended.

I had a similar problem in the past, and there is a Swiss-army-knife method that has always worked for me:
a) make sure the linuxXXX-headers package matching your XXX kernel is installed, since it is needed for the build
b) download the driver from the Nvidia homepage
c) switch to a console / terminal (if needed, boot into runlevel 3)
d) start the installer from the Nvidia download (but make sure to chmod +x the downloaded file first)
e) let the installer overwrite the whole mess left over from the past; accept everything it asks with y (yes)
f) reboot
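Step a) can be checked before starting the installer. A minimal sketch, assuming the headers package provides a `build` tree containing the kernel Makefile under the modules directory of that kernel; `headers_present` is a made-up helper name.

```shell
# Succeed if kernel headers appear to be installed for a kernel release.
# Arguments: $1 = kernel release (defaults to the running kernel),
#            $2 = modules root (defaults to /lib/modules).
headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # Assumption: headers packages install <root>/<release>/build/Makefile.
    [ -e "$root/$release/build/Makefile" ]
}
```

For example, `headers_present || echo "install the matching linuxXXX-headers package first"` gives a quick warning before the build is attempted.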

I don’t think removing the remaining Nvidia files caused the issue you had when reinstalling the driver. Anyway, if you have decided to reinstall, then do it, and as a general rule never install something from outside the Manjaro repositories without understanding everything it does.

Also you can always access your files from Manjaro live USB (and even work on the system from live USB with the help of manjaro-chroot, but I think you already decided to not continue working on it).

Unfortunately, I will select your last post as the solution.
