I have failed to locate any sort of manual for what might be going awry here and I am not familiar enough with nvidia drivers to hazard any sort of guess. I would appreciate suggestions for next steps.
I think that's the difference. nvidia-smi is reporting the CUDA "driver" version contained in the nvidia packages,
which is different from the CUDA "runtime" obtained via the cuda package.
Interesting. I investigated these other packages and they both seem to be up to date:
pacman -Q nvidia
linux613-nvidia 550.135-0.2
pacman -Q opencl-nvidia
opencl-nvidia 550.135-1
So it seems to me like everything should be fine, but alas, no such luck...
The frustrating thing is that a binary I'm running is demanding CUDA 12.6, so it's the "driver" version it wants. It couldn't possibly be that NVIDIA shipped their driver with the wrong version of CUDA, right? That seems preposterous.
Lucky you, I suppose. One thing is certain: I'm not buying their cards in the future if they can't get their act together.
I've been looking into this as well, and I think I know what is going on, but I am basing this on what I managed to google yesterday, so I might very well be wrong.
tl;dr: I believe the cause of this issue is that the GPU drivers are installed by mhwd and are relatively old, whereas the package cuda (with the CUDA runtime API) is installed through the Arch package, and requires a newer driver than the one supplied by mhwd.
Summarizing a Stack Overflow answer that I can't link to because I'm new (try the normal URL + a/53504578/12762884):
Your system can have 2 different CUDA versions:
the driver API, which is installed by the GPU driver. (On Manjaro, you're supposed to install this with mhwd.) This is the version you see with nvidia-smi, because that program comes from the GPU driver.
the runtime API, which is part of the CUDA toolkit (I think?), and this is what you can see for example with nvcc --version. On Manjaro, that is what the cuda package is installing.
If the version reported by nvidia-smi is at least the version reported by nvcc, then you're fine; however, if the version reported by nvidia-smi is lower, then (the post says) that is a broken config.
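The rule above can be sketched as a small shell check. This is only a minimal sketch, not a definitive tool: the function names (driver_cuda_version, toolkit_cuda_version, version_at_least, check_cuda_versions) are made up for illustration, and it assumes the "CUDA Version:" field in the nvidia-smi banner and the "release X.Y," line in the nvcc output look like the ones quoted in this thread.

```sh
# Extract e.g. "12.4" from an nvidia-smi banner line read on stdin, such as
# "| NVIDIA-SMI 550.135 Driver Version: 550.135 CUDA Version: 12.4 |".
driver_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract e.g. "12.6" from nvcc --version output read on stdin, such as
# "Cuda compilation tools, release 12.6, V12.6.85".
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Exit status 0 when version $1 (major.minor) is at least version $2.
# Major and minor are compared as numbers, so 12.10 counts as newer than 12.9.
version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    want_major=${2%%.*}; want_minor=${2#*.}
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
    else
        [ "$have_minor" -ge "$want_minor" ]
    fi
}

# Run both tools and report whether the driver is new enough for the toolkit.
# Assumes nvidia-smi and nvcc are both on PATH.
check_cuda_versions() {
    driver=$(nvidia-smi | driver_cuda_version)
    toolkit=$(nvcc --version | toolkit_cuda_version)
    if [ -z "$driver" ] || [ -z "$toolkit" ]; then
        echo "Could not read one of the versions" >&2
        return 2
    fi
    if version_at_least "$driver" "$toolkit"; then
        echo "OK: driver supports CUDA $driver, toolkit is $toolkit"
    else
        echo "Mismatch: driver supports CUDA $driver, toolkit is $toolkit"
        return 1
    fi
}
```

After sourcing this, running check_cuda_versions on a machine with both tools installed prints either the OK line or the mismatch line. The parsing functions read from stdin, so they can also be tried on pasted output without a GPU.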
I'm on the up-to-date Manjaro NVIDIA driver, and have also installed the latest cuda package, and have versions similar to the OP:
niels@niels-manjaro-desktop-2411:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
niels@niels-manjaro-desktop-2411:~$ nvidia-smi | head
Mon Dec 9 15:24:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135 Driver Version: 550.135 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 Off | 00000000:01:00.0 On | N/A |
| 0% 40C P8 11W / 270W | 1674MiB / 8192MiB | 1% Default |
According to the Stack Overflow answer, this is a broken setup, and I believe this is possibly Manjaro's fault (but happy to be proven wrong here!).
I think the way we Manjaro users are ending up with this wrong setup is that:
the GPU driver and CUDA driver API are installed using mhwd, which is on a relatively older version of the NVIDIA driver; while at the same time
the CUDA runtime API is installed using the cuda package from Arch Linux. This package mentions in line 10 of its PKGBUILD (again, I can't link to it) that a newer driver is required than what mhwd supplies. (@Yochanan somehow you have a newer driver though? How did you install it?) See also issue #7 on the package's GitLab (again, sorry, I can't link); it mentions that they had a similar issue earlier (the cuda package was running ahead of the Arch nvidia drivers).
As for fixes, it would be great if the cuda package somehow checked for the available drivers. (I personally have no idea how this could be done.) As a short-term fix, I think we could also downgrade our cuda package (haven't tried it yet).
Hi,
Thank you for the insightful replies. I did some snooping and concluded something similar to @Nielius: something is out of order with the stable branch NVIDIA driver. To resolve my issue, I simply switched branches to unstable, installed the unstable branch NVIDIA driver 565.77, which has CUDA 12.7 support, and then switched back to stable. Be careful when doing this if you don't know what you're doing, since:
Unstable is synced several times a day with Arch package releases. Only a subset of Arch packages are modified to suit Manjaro. Those that use Unstable need to have the skills to get themselves out of trouble when they move their system to this branch.
(tl;dr things might break)
Nevertheless, for posterity, you can follow this link to see how switching branches works, and you can load any new drivers either by just rebooting the system or (doing it properly) by killing X/Wayland, unloading the NVIDIA drivers with rmmod, and then loading them back in with modprobe.
This doesn't identify the problem, of course, but it's at least a workaround and it functions well for my purposes.
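For anyone wanting to try the rmmod/modprobe route instead of rebooting, here is a rough sketch of the order the commands would go in. It is a dry run that only prints the commands and executes nothing. Several things here are assumptions not taken from this thread: the function name print_reload_commands is made up, the module names (nvidia_drm, nvidia_modeset, nvidia_uvm, nvidia) are the usual ones for the proprietary driver but may differ on your system (check with lsmod), and "systemctl stop display-manager" is just one way to kill X/Wayland on a systemd setup.

```sh
# Dry run: print the commands for reloading the NVIDIA kernel modules.
# Nothing is executed; review the output before running any of it by hand.
print_reload_commands() {
    # Assumption: the graphical session is managed by a systemd display manager.
    echo "sudo systemctl stop display-manager"

    # Unload dependents first; the core nvidia module goes last.
    for module in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        echo "sudo rmmod $module"
    done

    # Load in the reverse order, so the core module comes back first.
    for module in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        echo "sudo modprobe $module"
    done

    echo "sudo systemctl start display-manager"
}
```

Calling print_reload_commands lists the steps top to bottom. If rmmod complains that a module is in use, something is still holding the GPU, and a reboot is the simpler option.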