Nvidia kernel modules not loading at boot

Not sure if this was caused by an update, but the nvidia modules are no longer loading at boot.
I’m having to switch to a TTY and manually load “nvidia-drm” to get to an Xorg session.

I’ve already checked that the modules are not blacklisted; I can post the contents of /etc/modprobe.d/ if needed.
As part of my debugging, I’ve also installed nvidia-dkms (in case the modules had somehow become incompatible, even though I could load them manually).

I’ve linked below to pastebins of both my dmesg & journalctl outputs
journalctl - Nvidia issue - Pastebin.com
dmesg - dmesg for gog - Pastebin.com

Hi again @person1873,

Take a look here

https://wiki.manjaro.org/index.php/Configure_NVIDIA_(non-free)_settings_and_load_them_on_Startup

I have no idea if this will work or anything like that, just saw it for an unrelated reason, and thought it might apply to you.

Hope it helps!

@Mirdarthos I had installed the drivers initially this way, and they were working.
I’ve re-run the commands in that first section purely for interest’s sake and completeness.

[jpycroft@ManjaroRGB ~]$ inxi -G
Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: nvidia v: 495.46
  Device-2: Logitech C922 Pro Stream Webcam type: USB
    driver: snd-usb-audio,uvcvideo
  Display: x11 server: X.Org 1.21.1.3 driver: loaded: nvidia resolution:
    1: 1920x1080~60Hz 2: 1920x1080~60Hz
  OpenGL: renderer: NVIDIA GeForce RTX 2060 SUPER/PCIe/SSE2
    v: 4.6.0 NVIDIA 495.46
[jpycroft@ManjaroRGB ~]$ sudo mhwd -a pci nonfree 0300
[sudo] password for jpycroft: 
> Skipping already installed config 'video-nvidia' for device: 0000:10:00.0 (0300:10de:1f06) Display controller nVidia Corporation TU106 [GeForce RTX 2060 SUPER]
[jpycroft@ManjaroRGB ~]$ mhwd -li
> Installed PCI configs:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
           video-linux            2018.05.04                true            PCI
          video-nvidia            2021.12.18               false            PCI


Warning: No installed USB configs!

not sure if there’s any obvious issue here?

Post a full inxi so that we have a picture of your system, not just what you think is relevant. Edit your post with better inxi, for example:

inxi -Fazy

I already see some kind of issue, you have video-linux AND video-nvidia installed. Might not be a big issue, but not how I would have my system.

video-linux was auto-installed by mhwd. Happy to try removing it if you think it’s related, but it was working correctly yesterday.

[jpycroft@ManjaroRGB ~]$ inxi -Fazy
System:
  Kernel: 5.16.2-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.16-x86_64
    root=UUID=8eea8751-09bf-426e-b02c-049bb4b1f95d rw quiet apparmor=1
    security=apparmor udev.log_priority=3
  Desktop: Xfce 4.16.0 tk: Gtk 3.24.29 info: xfce4-panel wm: xfwm 4.16.1
    vt: 7 dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Desktop Mobo: Micro-Star model: B450 GAMING PRO CARBON AC (MS-7B85)
    v: 1.0 serial: <superuser required> UEFI: American Megatrends v: 1.C0
    date: 06/11/2020
CPU:
  Info: model: AMD Ryzen 5 2400G with Radeon Vega Graphics bits: 64
    type: MT MCP arch: Zen family: 0x17 (23) model-id: 0x11 (17) stepping: 0
    microcode: 0x8101016
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 384 KiB desc: d-4x32 KiB; i-4x64 KiB L2: 2 MiB desc: 4x512 KiB L3: 4 MiB
    desc: 1x4 MiB
  Speed (MHz): avg: 1589 high: 1807 min/max: 1600/3600 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 1600 2: 1452
    3: 1543 4: 1583 5: 1807 6: 1596 7: 1548 8: 1590 bogomips: 57623
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP:
    disabled, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] vendor: Gigabyte
    driver: nvidia v: 495.46 alternate: nouveau,nvidia_drm bus-ID: 10:00.0
    chip-ID: 10de:1f06 class-ID: 0300
  Device-2: Logitech C922 Pro Stream Webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 3-9:5 chip-ID: 046d:085c
    class-ID: 0102 serial: <filter>
  Display: x11 server: X.Org 1.21.1.3 compositor: xfwm4 v: 4.16.1 driver:
    loaded: nvidia display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x286mm (40.0x11.3")
    s-diag: 1055mm (41.6")
  Monitor-1: HDMI-0 res: 1920x1080 hz: 60 dpi: 90
    size: 544x303mm (21.4x11.9") diag: 623mm (24.5")
  Monitor-2: DP-5 res: 1920x1080 hz: 60 dpi: 90 size: 544x303mm (21.4x11.9")
    diag: 623mm (24.5")
  OpenGL: renderer: NVIDIA GeForce RTX 2060 SUPER/PCIe/SSE2
    v: 4.6.0 NVIDIA 495.46 direct render: Yes
Audio:
  Device-1: NVIDIA TU106 High Definition Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel bus-ID: 10:00.1 chip-ID: 10de:10f9
    class-ID: 0403
  Device-2: AMD Family 17h HD Audio vendor: Micro-Star MSI
    driver: snd_hda_intel v: kernel bus-ID: 27:00.6 chip-ID: 1022:15e3
    class-ID: 0403
  Device-3: Logitech C922 Pro Stream Webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 3-9:5 chip-ID: 046d:085c
    class-ID: 0102 serial: <filter>
  Device-4: Focusrite-Novation Scarlett Solo USB type: USB
    driver: snd-usb-audio bus-ID: 7-1.1:3 chip-ID: 1235:8205 class-ID: 0102
  Device-5: C-Media Blue Snowball type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 7-1.2:4 chip-ID: 0d8c:0005
    class-ID: 0300 serial: <filter>
  Device-6: Logitech G560 Gaming Speaker type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 7-1.3.1:6
    chip-ID: 046d:0a78 class-ID: 0300 serial: <filter>
  Sound Server-1: ALSA v: k5.16.2-1-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.20 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
  Sound Server-4: PipeWire v: 0.3.43 running: no
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi v: kernel bus-ID: 21:00.0
    chip-ID: 8086:2526 class-ID: 0280
  IF: wlo1 state: down mac: <filter>
  Device-2: Intel I211 Gigabit Network vendor: Micro-Star MSI driver: igb
    v: kernel port: e000 bus-ID: 22:00.0 chip-ID: 8086:1539 class-ID: 0200
  IF: enp34s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel Wireless-AC 9260 Bluetooth Adapter type: USB driver: btusb
    v: 0.8 bus-ID: 5-3:3 chip-ID: 8087:0025 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 2.55 TiB used: 213.05 GiB (8.2%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Intel model: SSDPEKKW256G8
    size: 238.47 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 type: SSD serial: <filter> rev: 004C temp: 35.9 C scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: Seagate model: ST2000LM007-1R8174
    size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: HDD rpm: 5400 serial: <filter> rev: EB01 scheme: GPT
  ID-3: /dev/sdb maj-min: 8:16 vendor: Samsung model: MZHPU512HCGL-000H1
    size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 2H1Q scheme: GPT
  ID-4: /dev/sdc maj-min: 8:32 type: USB vendor: SanDisk model: USB 3.2Gen1
    size: 28.64 GiB block-size: physical: 512 B logical: 512 B type: N/A
    serial: <filter> rev: 1.00 scheme: MBR
Partition:
  ID-1: / raw-size: 238.17 GiB size: 233.38 GiB (97.99%)
    used: 213.05 GiB (91.3%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 288 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: nvidia temp: 44 C
  Fan Speeds (RPM): N/A gpu: nvidia fan: 56%
Info:
  Processes: 299 Uptime: 1m wakeups: 0 Memory: 31.36 GiB used: 2.41 GiB (7.7%)
  Init: systemd v: 250 tool: systemctl Compilers: gcc: 11.1.0 clang: 13.0.0
  Packages: pacman: 1511 lib: 473 flatpak: 0 Shell: Bash v: 5.1.16
  running-in: xfce4-terminal inxi: 3.3.12
[jpycroft@ManjaroRGB ~]$ 

Yes, if you installed the system while booted with the free drivers, then that is how video-linux got installed. Then you installed video-nvidia without removing video-linux … and then you went down the dkms road, which was unnecessary.

Remove video-linux and force reinstall of video-nvidia.

sudo mhwd -r pci video-linux

sudo mhwd -f -i pci video-nvidia

@bogdancovaciu
should these be run from a TTY? or is a reboot enough after running them?

You can run them from a terminal if you have access to the UI; if not, then from a TTY. And yes, reboot after that.
Then we can check if /etc/modprobe.d/mhwd-gpu.conf contains:

blacklist nouveau
blacklist ttm
blacklist drm_kms_helper
blacklist drm

and if /etc/modules-load.d/mhwd-gpu.conf has these lines:

nvidia
nvidia-drm

@bogdancovaciu I’ve done as you suggested and removed video-linux & reinstalled video-nvidia
for posterity, could you edit the “-F” to “-f” as this is the correct flag :slight_smile:

Unfortunately I’m still having the same issue.

[jpycroft@ManjaroRGB ~]$ cat /etc/modprobe.d/*
##
## Generated by mhwd - Manjaro Hardware Detection
##
 
blacklist nouveau
blacklist ttm
blacklist drm_kms_helper
blacklist drm
[jpycroft@ManjaroRGB ~]$ ^C
[jpycroft@ManjaroRGB ~]$ cat /etc/modules-load.d/*
##
## Generated by mhwd - Manjaro Hardware Detection
##
 
nvidia
nvidia-drm
# List of modules to load at boot

EDIT: forgot how copy/paste works #facepalm

Hahaha, did I get carried away by something there? Possibly … fixed now!

@bogdancovaciu I feel these lines from the journal logs are relevant
EDIT: I need to learn to english

Feb 02 18:52:25 ManjaroRGB kernel: audit: type=1400 audit(1643799145.061:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=320 comm="apparmor_pa>
Feb 02 18:52:25 ManjaroRGB kernel: audit: type=1400 audit(1643799145.061:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=320 comm="appar>
Feb 02 18:52:25 ManjaroRGB audit[320]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=320 comm="apparmor_parser"
Feb 02 18:52:25 ManjaroRGB audit[320]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=320 comm="apparmor_parser"
Feb 02 18:52:24 ManjaroRGB systemd-modules-load[306]: Module 'nvidia' is deny-listed
Feb 02 18:52:24 ManjaroRGB systemd-modules-load[306]: Module 'nvidia_drm' is deny-listed
Feb 02 18:52:24 ManjaroRGB systemd-modules-load[306]: Module 'nvidia_uvm' is deny-listed
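
To pull just the denied module names out of the journal, a small filter along these lines might help. It is only a sketch: `denied_modules` is a hypothetical helper name, and it assumes the message wording shown in the log lines above.

```shell
# denied_modules: hypothetical helper that reads journal text on stdin and
# prints the name of each module reported as deny-listed.
denied_modules() {
    sed -n "s/.*Module '\([^']*\)' is deny-listed.*/\1/p"
}

# Messages from the current boot, limited to the module-loading service.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -b -u systemd-modules-load.service --no-pager 2>/dev/null | denied_modules
fi
```

Each printed name is a module that systemd-modules-load refused to load during this boot.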

This is odd … Please share:

cat /etc/default/grub
cat /etc/mkinitcpio.conf

Also, I wonder if you have some custom stuff in /etc/X11/ aside from the proper /etc/X11/mhwd.d/nvidia.conf

Maybe worth running:
sudo mhwd-gpu --setmod nvidia --setxorg /etc/X11/mhwd.d/nvidia.conf

Nothing odd/related in grub config

[jpycroft@ManjaroRGB ~]$ cat /etc/default/grub
GRUB_DEFAULT=saved
GRUB_TIMEOUT=5
GRUB_TIMEOUT_STYLE=hidden
GRUB_DISTRIBUTOR="Manjaro"
GRUB_CMDLINE_LINUX_DEFAULT="quiet apparmor=1 security=apparmor udev.log_priority=3"
GRUB_CMDLINE_LINUX=""

# If you want to enable the save default function, uncomment the following
# line, and set GRUB_DEFAULT to saved.
GRUB_SAVEDEFAULT=true

# Uncomment to disable submenus in boot menu
#GRUB_DISABLE_SUBMENU=y

# Preload both GPT and MBR modules so that they are not missed
GRUB_PRELOAD_MODULES="part_gpt part_msdos"

# Uncomment to enable booting from LUKS encrypted devices
#GRUB_ENABLE_CRYPTODISK=y

# Uncomment to use basic console
GRUB_TERMINAL_INPUT=console

# Uncomment to disable graphical terminal
#GRUB_TERMINAL_OUTPUT=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command 'videoinfo'
GRUB_GFXMODE=auto

# Uncomment to allow the kernel use the same resolution used by grub
GRUB_GFXPAYLOAD_LINUX=keep

# Uncomment if you want GRUB to pass to the Linux kernel the old parameter
# format "root=/dev/xxx" instead of "root=/dev/disk/by-uuid/xxx"
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY=true

# Uncomment this option to enable os-prober execution in the grub-mkconfig command
GRUB_DISABLE_OS_PROBER=false

# Uncomment and set to the desired menu colors.  Used by normal and wallpaper
# modes only.  Entries specified as foreground/background.
GRUB_COLOR_NORMAL="light-gray/black"
GRUB_COLOR_HIGHLIGHT="green/black"

# Uncomment one of them for the gfx desired, a image background or a gfxtheme
#GRUB_BACKGROUND="/usr/share/grub/background.png"
GRUB_THEME="/usr/share/grub/themes/manjaro/theme.txt"

# Uncomment to get a beep at GRUB start
#GRUB_INIT_TUNE="480 440 1"

# Uncomment to ensure that the root filesystem is mounted read-only so that
# systemd-fsck can run the check automatically. We use 'fsck' by default, which
# needs 'rw' as boot parameter, to avoid delay in boot-time. 'fsck' needs to be
# removed from 'mkinitcpio.conf' to make 'systemd-fsck' work.
# See also Arch-Wiki: https://wiki.archlinux.org/index.php/Fsck#Boot_time_checking
#GRUB_ROOT_FS_RO=true
[jpycroft@ManjaroRGB ~]$

@gog on the #manjaro IRC suggested I add MODULES="nvidia nvidia-modeset"
This didn’t change anything as far as the logs go.

[jpycroft@ManjaroRGB ~]$ cat /etc/mkinitcpio.conf 
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(piix ide_disk reiserfs)
MODULES="nvidia nvidia-modeset"

# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image.  This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
FILES=""

# HOOKS
# This is the most important setting in this file.  The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added.  Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
##   This setup specifies all modules in the MODULES setting above.
##   No raid, lvm2, or encrypted root is needed.
#    HOOKS=(base)
#
##   This setup will autodetect all modules for your system and should
##   work as a sane default
#    HOOKS=(base udev autodetect block filesystems)
#
##   This setup will generate a 'full' image which supports most systems.
##   No autodetection is done.
#    HOOKS=(base udev block filesystems)
#
##   This setup assembles a pata mdadm array with an encrypted root FS.
##   Note: See 'mkinitcpio -H mdadm' for more information on raid devices.
#    HOOKS=(base udev block mdadm encrypt filesystems)
#
##   This setup loads an lvm2 volume group on a usb device.
#    HOOKS=(base udev block lvm2 filesystems)
#
##   NOTE: If you have /usr on a separate partition, you MUST include the
#    usr, fsck and shutdown hooks.
HOOKS="base udev autodetect modconf block keyboard keymap consolefont filesystems fsck"

# COMPRESSION
# Use this to compress the initramfs image. By default, gzip compression
# is used. Use 'cat' to create an uncompressed image.
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"
#COMPRESSION="zstd"

# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=()
[jpycroft@ManjaroRGB ~]$ 

I haven’t touched /etc/X11/ apart from to look since install about a week ago.

[jpycroft@ManjaroRGB ~]$ tree /etc/X11/
/etc/X11/
├── mhwd.d
│   ├── nvidia.conf
│   └── nvidia.conf.nvidia-xconfig-original
├── xinit
│   ├── xinitrc
│   ├── xinitrc.d
│   │   ├── 40-libcanberra-gtk-module.sh
│   │   ├── 50-systemd-user.sh
│   │   └── 80xapp-gtk3-module.sh
│   └── xserverrc
└── xorg.conf.d
    ├── 00-keyboard.conf
    └── 90-mhwd.conf -> /etc/X11/mhwd.d/nvidia.conf

4 directories, 9 files
[jpycroft@ManjaroRGB ~]$ 

This made no difference…

I’m wondering if systemd-modules-load.service is finding another blacklist somewhere
https://www.freedesktop.org/software/systemd/man/systemd-modules-load.service.html

systemd-modules-load.service is an early boot service that loads kernel modules. It reads static configuration from files in /usr/ and /etc/, but also runtime configuration from /run/ and the kernel command line (see below).
See modules-load.d(5) for information about the configuration format of this service and paths where configuration files can be created.
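
One way to test that theory is to search every modprobe configuration directory for blacklist lines that name an nvidia module. The sketch below assumes the usual modprobe.d locations under /etc, /run and /usr/lib; `find_blacklists` is a hypothetical helper name, not part of any package.

```shell
# find_blacklists: hypothetical helper. First argument is a module name prefix,
# the remaining arguments are directories (or files) to search.
find_blacklists() {
    local pattern="$1"; shift
    # -r recurse, -n show line numbers, -s stay quiet about missing paths,
    # -E extended regex. Commented-out lines do not match.
    grep -rnsE "^[[:space:]]*blacklist[[:space:]]+${pattern}" "$@"
}

# Assumed locations; adjust if your distribution uses others.
find_blacklists 'nvidia' /etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d \
    || echo "no nvidia blacklist entries found"
```

`modprobe --showconfig | grep blacklist` gives a similar merged view of the effective configuration, without telling you which file each entry came from.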

but that should be

MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)

inside /etc/mkinitcpio.conf if you want to enable early KMS. Then you have to run
sudo mkinitcpio -P
sudo update-grub
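
After regenerating, it may be worth confirming that the nvidia modules actually ended up in the image. This is a rough sketch: `initramfs_nvidia_modules` is a hypothetical helper, and the image path is an assumption based on the vmlinuz-5.16-x86_64 name in the inxi output, so adjust it to whatever is in /boot.

```shell
# initramfs_nvidia_modules: hypothetical helper. lsinitcpio ships with mkinitcpio
# and lists the files contained in an initramfs image.
initramfs_nvidia_modules() {
    local image="$1"
    # Keep only entries that look like nvidia kernel module files.
    lsinitcpio "$image" | grep -E '/nvidia[^/]*\.ko'
}

# Assumed image path; it may only be readable as root.
image=/boot/initramfs-5.16-x86_64.img
if command -v lsinitcpio >/dev/null 2>&1 && [ -r "$image" ]; then
    initramfs_nvidia_modules "$image" || echo "no nvidia modules in $image"
fi
```

If nothing is listed, the MODULES line was probably not picked up when the image was built.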

But then, if you later want to test Wayland (not worth it though), you add this to the kernel command line in /etc/default/grub:

GRUB_CMDLINE_LINUX="nvidia-drm.modeset=1"

then run
sudo update-grub
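
After a reboot, whether the option took effect can be read back from sysfs. A minimal sketch, assuming the parameter is exposed at the standard /sys/module path once nvidia_drm is loaded; `check_modeset` is a hypothetical helper name.

```shell
# check_modeset: hypothetical helper. Takes an optional path so it can be
# pointed at a different file; defaults to the nvidia_drm modeset parameter.
check_modeset() {
    local param="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ -r "$param" ]; then
        echo "nvidia_drm modeset: $(cat "$param")"
    else
        echo "nvidia_drm is not loaded (no $param)"
    fi
}

check_modeset
```

The second message is also a quick way to see that the module never loaded at boot.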

but, where this comes from

is something i never encountered … :man_shrugging:

It depends on nvidia, nvidia_modeset & nvidia_uvm, so it might be denied because its dependencies are denied?
Also, do you know of anything in /usr that systemd might be reading?
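
The dependency question can be checked directly rather than guessed at. A small sketch using `modinfo` from kmod; `show_deps` is a hypothetical helper name, and the module list is simply the four names mentioned in this thread.

```shell
# show_deps: hypothetical helper that prints the "depends" field for each
# module name given, as reported by modinfo for the running kernel.
show_deps() {
    local mod deps
    for mod in "$@"; do
        # -F depends prints only the dependency list from the module metadata.
        if deps=$(modinfo -F depends "$mod" 2>/dev/null); then
            echo "$mod: ${deps:-(none)}"
        else
            echo "$mod: not found for the running kernel"
        fi
    done
}

show_deps nvidia nvidia_modeset nvidia_uvm nvidia_drm
```

Any module that shows up here as a dependency and is also blacklisted would be worth a closer look.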

I’ll try your mkinitcpio stuff.
I’m not interested in wayland tbh, too buggy and i don’t see the advantage?

Probably it is blacklisted somewhere.

What is the content of the modprobe folders? ls /usr/lib/modprobe.d/ and ls /etc/modprobe.d/

Since @person1873 tested the nvidia-dkms thing, i wonder if that is still present in the system and it creates some custom stuff, but i honestly have no clue. I always avoided dkms stuff …

Similar issue: Manjaro does not boot after update - #18 by dirn

[jpycroft@ManjaroRGB ~]$ tree /etc/modprobe.d/
/etc/modprobe.d/
└── mhwd-gpu.conf

0 directories, 1 file
[jpycroft@ManjaroRGB ~]$ tree /usr/lib/modprobe.d/
/usr/lib/modprobe.d/
├── bluetooth-usb.conf
├── bumblebee.conf
├── nvdimm-security.conf
├── nvidia-utils.conf
├── README
├── systemd.conf
└── uvesafb.conf

0 directories, 7 files
[jpycroft@ManjaroRGB ~]$ tree /usr/lib/modules-load.d/
/usr/lib/modules-load.d/
├── bluez.conf
├── nvidia-utils.conf
└── uinput.conf

0 directories, 3 files