Easy Diffusion (Stable Diffusion) doesn't detect GPU ("Can't initialize NVML")

I have a laptop with a GeForce RTX 3050 Mobile graphics card. I installed Stable Diffusion using the Easy Diffusion package, so that I can generate all kinds of funky images on my laptop.
I have no experience with this kind of application, so I am running into new things here and need your help.

AFAIK the GPU can be used for computing through CUDA, and my GPU meets the requirements. I did not pay much attention during the install, but I think I saw some Torch, conda and Python packages being installed. I will try to find a log.

In the end I have a local server which I can access through a web UI at localhost:9000. Unfortunately it does not recognize my GPU and falls back to the CPU, which is much slower. How can I make it see my NVIDIA GPU?

When I start Easy Diffusion, it prints the following, among other things:

16:00:42.925 INFO MainThread started at 01/28/24 16:00:42                                                                                                  server.py:32
stable-diffusion model(s) found.
gfpgan model(s) found.
realesrgan model(s) found.
vae model(s) found.
/home/[username]/Downloads/easy-diffusion/installer_files/env/lib/python3.8/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
16:00:44.273 WARNING MainThread WARNING: Could not find a compatible GPU. Using the CPU, but this will be very slow!                               device_manager.py:56
16:00:44.274 INFO MainThread Start new Rendering Thread on device: cpu           

I looked up the script, but that doesn’t get me anywhere:

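import warnings

def _raw_device_count_nvml() -> int:
    """Return number of devices as reported by NVML
    or negative value if NVML discovery/initialization failed."""
    from ctypes import CDLL, c_int, byref
    nvml_h = CDLL("libnvidia-ml.so.1")
    rc = nvml_h.nvmlInit()
    if rc != 0:  # non-zero return code: NVML could not be initialised
        warnings.warn("Can't initialize NVML")
        return -1
    # (rest of the function omitted in this excerpt)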
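
To run the same kind of check outside of Torch, here is a minimal sketch (not the PyTorch code itself). It loads the NVML library with ctypes and separates "library not found" from "library found, but it cannot talk to the driver". The function name probe_nvml and the negative return codes are my own invention for illustration; only nvmlInit, nvmlDeviceGetCount_v2 and nvmlShutdown are real NVML entry points.

```python
import ctypes


def probe_nvml(lib_name: str = "libnvidia-ml.so.1") -> int:
    """Return the NVML device count, or a negative value on failure.

    The negative codes are hypothetical, chosen for this sketch only:
    -1: the library could not be loaded (driver userspace not installed)
    -2: nvmlInit failed (library present, but it cannot reach the driver)
    -3: the device count query failed
    """
    try:
        nvml = ctypes.CDLL(lib_name)
    except OSError:
        return -1
    if nvml.nvmlInit() != 0:
        return -2
    try:
        count = ctypes.c_int(-1)
        if nvml.nvmlDeviceGetCount_v2(ctypes.byref(count)) != 0:
            return -3
        return count.value
    finally:
        # Release NVML again once initialisation succeeded.
        nvml.nvmlShutdown()
```

Calling print(probe_nvml()) with the same Python that Easy Diffusion uses should give a first hint: a result of -2 would match the "Can't initialize NVML" warning, meaning the library is there but the kernel driver is not responding.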

Some information on my system:
mhwd -l -d --pci

--------------------------------------------------------------------------------
> PCI Device: /devices/pci0000:00/0000:00:06.0/0000:01:00.0 (0302:10de:25a2)
  Display controller nVidia Corporation GA107M [GeForce RTX 3050 Mobile]
--------------------------------------------------------------------------------
  > INSTALLED:

   NAME:        video-hybrid-intel-nvidia-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    8
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-linux
   ATTACHED:    PCI
   VERSION:     2018.05.04
   INFO:        Standard open source drivers.
   PRIORITY:    2
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 0380 0302 
   VENDORIDS:   1002 8086 10de 



  > AVAILABLE:

   NAME:        video-hybrid-intel-nvidia-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    8
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-hybrid-intel-nvidia-470xx-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    7
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-nvidia
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Closed source NVIDIA drivers for linux.
   PRIORITY:    5
   FREEDRIVER:  false
   DEPENDS:     -
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 
   VENDORIDS:   10de 

   NAME:        video-nvidia-470xx
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Closed source NVIDIA drivers for linux.
   PRIORITY:    4
   FREEDRIVER:  false
   DEPENDS:     -
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 
   VENDORIDS:   10de 

   NAME:        video-linux
   ATTACHED:    PCI
   VERSION:     2018.05.04
   INFO:        Standard open source drivers.
   PRIORITY:    2
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 0380 0302 
   VENDORIDS:   1002 8086 10de 


--------------------------------------------------------------------------------
> PCI Device: /devices/pci0000:00/0000:00:02.0 (0300:8086:9a49)
  Display controller Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics]
--------------------------------------------------------------------------------
  > INSTALLED:

   NAME:        video-hybrid-intel-nvidia-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    8
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-linux
   ATTACHED:    PCI
   VERSION:     2018.05.04
   INFO:        Standard open source drivers.
   PRIORITY:    2
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 0380 0302 
   VENDORIDS:   1002 8086 10de 

   NAME:        video-modesetting
   ATTACHED:    PCI
   VERSION:     2020.01.13
   INFO:        X.org modesetting video driver.
   PRIORITY:    1
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 
   VENDORIDS:   * 



  > AVAILABLE:

   NAME:        video-hybrid-intel-nvidia-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    8
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-hybrid-intel-nvidia-470xx-prime
   ATTACHED:    PCI
   VERSION:     2023.03.23
   INFO:        Hybrid prime solution for NVIDIA Optimus Technology - Closed source NVIDIA driver & open source intel driver.
   PRIORITY:    7
   FREEDRIVER:  false
   DEPENDS:     video-modesetting 
   CONFLICTS:   video*nvidia* 
   CLASSIDS:    0300 0302 0300 
   VENDORIDS:   10de 8086 

   NAME:        video-linux
   ATTACHED:    PCI
   VERSION:     2018.05.04
   INFO:        Standard open source drivers.
   PRIORITY:    2
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 0380 0302 
   VENDORIDS:   1002 8086 10de 

   NAME:        video-modesetting
   ATTACHED:    PCI
   VERSION:     2020.01.13
   INFO:        X.org modesetting video driver.
   PRIORITY:    1
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 
   VENDORIDS:   * 

   NAME:        video-vesa
   ATTACHED:    PCI
   VERSION:     2017.03.12
   INFO:        X.org vesa video driver.
   PRIORITY:    0
   FREEDRIVER:  true
   DEPENDS:     -
   CONFLICTS:   -
   CLASSIDS:    0300 
   VENDORIDS:   * 

inxi -Fz

System:
  Kernel: 6.1.71-1-MANJARO arch: x86_64 bits: 64 Desktop: KDE Plasma
    v: 5.27.10 Distro: Manjaro Linux
Machine:
  Type: Laptop System: ASUSTeK product: Vivobook_ASUSLaptop X7600PC_N7600PC
    v: 1.0 serial: <superuser required>
  Mobo: ASUSTeK model: X7600PC v: 1.0 serial: <superuser required>
    UEFI: American Megatrends LLC. v: X7600PC.300 date: 11/08/2021
Battery:
  ID-1: BAT0 charge: 35.3 Wh (43.4%) condition: 81.4/96.0 Wh (84.8%)
    volts: 11.7 min: 11.7
CPU:
  Info: quad core model: 11th Gen Intel Core i7-11370H bits: 64 type: MT MCP
    cache: L2: 5 MiB
  Speed (MHz): avg: 649 min/max: 400/4800 cores: 1: 1074 2: 400 3: 1056
    4: 400 5: 400 6: 400 7: 400 8: 1069
Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
  Device-2: NVIDIA GA107M [GeForce RTX 3050 Mobile] driver: N/A
  Device-3: IMC Networks USB2.0 HD UVC WebCam driver: uvcvideo type: USB
  Display: x11 server: X.Org v: 21.1.10 driver: X: loaded: modesetting
    dri: iris gpu: i915 resolution: 2560x1600
  API: EGL v: 1.5 drivers: iris,swrast platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 23.3.3-manjaro1.1
    renderer: Mesa Intel Xe Graphics (TGL GT2)
  API: Vulkan v: 1.3.274 drivers: intel surfaces: xcb,xlib
Audio:
  Device-1: Intel Tiger Lake-LP Smart Sound Audio driver: snd_hda_intel
  API: ALSA v: k6.1.71-1-MANJARO status: kernel-api
  Server-1: PulseAudio v: 16.1 status: active
Network:
  Device-1: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter
    driver: mt7921e
  IF: wlo1 state: up mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: IMC Networks Wireless_Device driver: btusb type: USB
  Report: btmgmt ID: hci0 state: up address: <filter> bt-v: 5.2
Drives:
  Local Storage: total: 476.94 GiB used: 144.63 GiB (30.3%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFM512GD3JX013N
    size: 476.94 GiB
Partition:
  ID-1: / size: 150.74 GiB used: 42.89 GiB (28.5%) fs: ext4
    dev: /dev/nvme0n1p1
  ID-2: /boot/efi size: 487 MiB used: 312 KiB (0.1%) fs: vfat
    dev: /dev/nvme0n1p2
  ID-3: /home size: 286.23 GiB used: 101.7 GiB (35.5%) fs: ext4
    dev: /dev/nvme0n1p4
Swap:
  ID-1: swap-1 type: partition size: 31.25 GiB used: 39.2 MiB (0.1%)
    dev: /dev/nvme0n1p5
Sensors:
  System Temperatures: cpu: N/A mobo: N/A
  Fan Speeds (rpm): cpu: 2800
Info:
  Processes: 279 Uptime: 1h 24m Memory: total: 16 GiB note: est.
  available: 15.31 GiB used: 10.58 GiB (69.1%) Shell: Zsh inxi: 3.3.31

I tried updating to the latest drivers using mhwd -a pci nonfree 0300, but apparently I already have the best available. I also tried to figure out whether there is some way to ‘activate’ the NVIDIA GPU. I installed optimus-manager, but that did not work at all… I would like to have the option to use the NVIDIA GPU only when necessary, but the GPU should be recognised/initialised by Stable Diffusion.

Thank you!

If I recall correctly, something like

prime-run <application>

I understand that this would be the way to force an application to use the NVIDIA card. But in this case I have no clue what application name I should type. I start Stable Diffusion from a bash script, and running that script with prime-run does not solve my problem.
I wonder why Stable Diffusion doesn’t recognise my NVIDIA card, and why Torch fails to initialize NVML.
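
As far as I know, prime-run is only a tiny wrapper that sets three environment variables for NVIDIA's PRIME render offload and then starts the given command. Those variables steer OpenGL/Vulkan rendering, which would explain why prefixing the start script makes no difference here: CUDA and NVML talk to the kernel driver directly, and that is the part that is failing. A sketch of the equivalent in Python follows; prime_env and run_offloaded are hypothetical helper names, not part of any package.

```python
import os
import subprocess

# Variables set by the prime-run wrapper (NVIDIA PRIME render offload).
PRIME_OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def prime_env(base_env=None):
    """Return a copy of the environment with the render-offload variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(PRIME_OFFLOAD_VARS)
    return env


def run_offloaded(argv):
    """Run a command roughly the way `prime-run <command>` would (hypothetical helper)."""
    return subprocess.run(argv, env=prime_env(), check=False)
```

For example, run_offloaded(["glxgears"]) should behave like prime-run glxgears, and it will fail in the same way for as long as the nvidia kernel module is not loaded.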

NVIDIA says NVML is:

A C-based API for monitoring and managing various states of the NVIDIA GPU devices. It provides a direct access to the queries and commands exposed via nvidia-smi.

When I run nvidia-smi:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have no experience with hybrid setups.

I think the reason a hybrid driver combo exists is to facilitate a relatively painless switch between the iGPU and the dGPU.

This leads to the thought that installing the drivers separately could be the solution. This is only an idea, and I have no means of testing it.

The steps would be

mhwd -r pci video-hybrid-intel-nvidia-prime

Then

mhwd -i pci video-linux
mhwd -i pci video-nvidia

This will of course remove the option to use prime-run, but I recall a topic from long ago in which someone stated that having the drivers et al. installed was enough to be able to use CUDA, NVML and the like.

My hypothesis is that the hybrid driver somehow prevents the NVIDIA GPU from being activated outside prime-run.

I was thinking in the same direction, but I have some reservations:

  1. How would I go about using the Nvidia card (automatically) when my CPU can’t handle it well?
  2. I just tried to run prime-run glxgears and it spit out some awful error:
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  26
  Current serial number in output stream:  27
  3. Looking in my logs I see the following errors at boot:
28-01-2024 17:04	systemd-modules-load	Failed to find module 'nvidia'
28-01-2024 17:04	systemd-modules-load	Failed to find module 'nvidia-drm'
28-01-2024 17:04	systemd-modules-load	Failed to find module 'nvidia-uvm'

So, I guess reinstalling the drivers is a wise idea as something isn’t going right here, but I have no clue what. I would like to understand a bit what’s happening before entering commands.
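
To see whether the three modules from the boot log are actually loaded, a small sketch like the following can parse /proc/modules. Note that the kernel lists module names with underscores, so nvidia-drm shows up as nvidia_drm. The helper names are made up for illustration. Also keep in mind that "Failed to find module" in the log suggests the module files are missing for the running kernel, which is more than just "not loaded".

```python
from pathlib import Path

# Module names as they appear in /proc/modules (underscores, not hyphens).
NVIDIA_MODULES = ("nvidia", "nvidia_drm", "nvidia_uvm")


def loaded_module_names(proc_modules_text: str) -> set:
    """Parse the contents of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_nvidia_modules(proc_modules_text: str) -> list:
    """Return the NVIDIA modules that are not currently loaded."""
    loaded = loaded_module_names(proc_modules_text)
    return [name for name in NVIDIA_MODULES if name not in loaded]


def check_running_system(path: str = "/proc/modules") -> list:
    """Convenience wrapper that reads the live module list (Linux only)."""
    return missing_nvidia_modules(Path(path).read_text())
```

On a system in the state described above, check_running_system() would be expected to return all three names. After a working driver install, nvidia should no longer be in the list; whether nvidia_uvm is loaded at boot may vary.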

Ok, problem solved, hooray! Thanks to @linux-aarhus for helping me out!

Taking a look back at some stuff, there was definitely something wrong with my installation of the drivers as they were not loaded at boot, not available according to inxi and so on.
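
The "driver: N/A" that inxi reported for the NVIDIA card can also be read directly from sysfs, where every PCI device that has a kernel driver bound to it has a driver symlink. Here is a rough sketch, using the PCI addresses from the mhwd output above (0000:01:00.0 for the NVIDIA card, 0000:00:02.0 for the Intel iGPU); bound_driver is a hypothetical helper name.

```python
import os


def bound_driver(pci_address: str, sysfs_root: str = "/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # No symlink: no driver is bound (or the device does not exist).
        return None
    # The symlink points at .../drivers/<name>; keep only the last component.
    return os.path.basename(os.readlink(link))
```

In the broken state, bound_driver("0000:01:00.0") should return None while bound_driver("0000:00:02.0") returns "i915". Once the driver is installed properly, the first call should return "nvidia".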

The solution consisted of deleting the drivers and reinstalling them through mhwd (you can use the hardware settings GUI as well, not sure what it’s called in English). I installed the same drivers again, the hybrid version.

After uninstalling the drivers I rebooted, updated my kernel to 6.6.10 as well, and rebooted again. Then, when I reinstalled the NVIDIA hybrid drivers, mhwd complained that it couldn’t find linux60 drivers (why, when I was running the new 6.6 kernel?!). It turned out the old 6.0 kernel was still installed (pacman -Qs linux60 lists it), so I removed it, rebooted again, and then the install finally worked.
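
My understanding (an assumption, not something I verified in the mhwd source) is that mhwd tries to install driver packages for every installed kernel, so a leftover kernel that no longer has driver packages in the repos blocks the whole install. Manjaro kernel packages appear to be named linux followed by the major and minor version, so 6.6.x is linux66 and 6.0.x is linux60. Based on that naming assumption, this sketch lists installed kernel packages other than the running one; both function names are hypothetical.

```python
import re


def kernel_package_for(release: str) -> str:
    """Map a kernel release such as '6.6.10-1-MANJARO' to 'linux66'.

    Assumes the Manjaro naming scheme linux<major><minor>.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return f"linux{match.group(1)}{match.group(2)}"


def other_kernel_packages(installed, running_release):
    """Return installed kernel packages that do not belong to the running kernel."""
    current = kernel_package_for(running_release)
    return [
        pkg for pkg in installed
        if re.fullmatch(r"linux\d+", pkg) and pkg != current
    ]
```

Feeding it platform.release() and the package names from pacman -Qq would flag linux60 in the situation above. Keeping a second kernel as a fallback is normal, so a flagged package is only a problem if no matching driver packages exist for it any more.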

Afterwards inxi -G showed that the nvidia driver was in use, and nvidia-smi worked as well:

nvidia-smi
Sun Jan 28 19:38:47 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0             752W /  35W |      8MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       597      G   /usr/lib/Xorg                                 4MiB |
+---------------------------------------------------------------------------------------+

One last tip: when I installed the NVIDIA drivers, the installer printed this note:

If you run into trouble with CUDA not being available, run nvidia-modprobe first.

It wasn’t necessary in my case, as Stable Diffusion immediately detected the GPU. It generated an image with the same prompt and settings in 8 seconds, instead of the 4 minutes it took on the CPU.
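
To double-check what Torch itself sees, something along these lines can be run with the Python from Easy Diffusion's own environment (the installer_files/env directory shown in the log path at the top). This is only a sketch: cuda_report is a made-up helper, while torch.cuda.is_available, torch.cuda.device_count and torch.cuda.get_device_name are regular PyTorch calls.

```python
import importlib.util


def cuda_report() -> dict:
    """Summarise what PyTorch can see; safe to call even if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = bool(torch.cuda.is_available())
    devices = []
    if available:
        # Only query device names when CUDA initialised successfully.
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {"torch_installed": True, "cuda_available": available, "devices": devices}
```

With the driver working, print(cuda_report()) should show cuda_available as True and list the RTX 3050. Before the fix it would be expected to report cuda_available as False, matching the "Could not find a compatible GPU" warning.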
