Need help with Nvidia egpu for CUDA

Hi all, this is my first post here. I’m having trouble getting an external Nvidia GPU (RTX 2070 eGPU) to work with the latest stable Manjaro KDE (kernel 5.9) on my Dell Precision laptop (with Mesa Intel UHD Graphics 630 for display).

  1. My goal is to install the Nvidia driver and the Nvidia CUDA SDK, and to use the eGPU (via Thunderbolt) for deep learning model training. The on-board Intel GPU will handle display. The external Nvidia card will not be used for display at all!

  2. Here’s what I did so far:
    a. In System Settings I can see the external Nvidia card listed (correctly identified as an RTX 2070).
    b. So I installed the latest proprietary Nvidia driver, 455.xx, and it installed successfully.
    c. Running nvidia-settings from a terminal throws an error:

$ sudo nvidia-settings
ERROR: Unable to load info from any available system

Here is the output of

$ inxi -Fza
System: Kernel: 5.9.1-1-MANJARO x86_64 bits: 64 compiler: N/A
parameters: BOOT_IMAGE=/boot/vmlinuz-5.9-x86_64 root=UUID=c511d03a-584a-4bcd-882e-e025906bacd0 rw quiet apparmor=1
security=apparmor udev.log_priority=3
Desktop: KDE Plasma 5.19.5 tk: Qt 5.15.1 wm: kwin_x11 dm: SDDM Distro: Manjaro Linux
Machine: Type: Laptop System: Dell product: Precision 7540 v: N/A serial: Chassis: type: 10 serial:
Mobo: Dell model: 0XMC3F v: A00 serial: UEFI: Dell v: 1.1.3 date: 06/21/2019
Battery: ID-1: BAT0 charge: 76.8 Wh condition: 76.8/97.0 Wh (79%) volts: 12.6/11.4 model: SMP DELL VRX0J8B type: Li-poly
serial: status: Full
CPU: Topology: 6-Core model: Intel Core i7-9750H bits: 64 type: MT MCP arch: Kaby Lake family: 6 model-id: 9E (158)
stepping: A (10) microcode: D6 L2 cache: 12.0 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 62431
Speed: 800 MHz min/max: 800/4500 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800
9: 800 10: 801 11: 800 12: 800
Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled
Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Type: mds mitigation: Clear CPU buffers; SMT vulnerable
Type: meltdown mitigation: PTI
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Type: srbds mitigation: Microcode
Type: tsx_async_abort status: Not affected
Graphics: Device-1: Intel UHD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:3e9b
Device-2: NVIDIA TU106 [GeForce RTX 2070] vendor: Gigabyte driver: nvidia v: 455.28 alternate: nouveau,nvidia_drm
bus ID: 07:00.0 chip ID: 10de:1f02
Device-3: Sunplus Innovation Integrated_Webcam_HD type: USB driver: uvcvideo bus ID: 1-11:3 chip ID: 1bcf:28c4
serial:
Display: x11 server:X.Org 1.20.9 compositor: kwin_x11 driver: modesetting FAILED: nvidia unloaded: intel,nouveau
alternate: fbdev,nv,vesa display ID: :0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2") s-diag: 582mm (22.9")
Monitor-1: eDP-1 res: 1920x1080 hz: 60 dpi: 142 size: 344x194mm (13.5x7.6") diag: 395mm (15.5")
OpenGL: renderer: Mesa Intel UHD Graphics 630 (CFL GT2) v: 4.6 Mesa 20.1.8 direct render: Yes
Audio: Device-1: Intel Cannon Lake PCH cAVS vendor: Dell driver: snd_hda_intel v: kernel
alternate: snd_soc_skl,snd_sof_pci bus ID: 00:1f.3 chip ID: 8086:a348
Sound Server: ALSA v: k5.9.1-1-MANJARO
Network: Device-1: Intel Ethernet I219-LM vendor: Dell driver: e1000e v: kernel port: efa0 bus ID: 00:1f.6
chip ID: 8086:15bb
IF: eno1 state: down mac:
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel port: 4000 bus ID: 6e:00.0 chip ID: 8086:2723
IF: wlp110s0 state: up mac:
Drives: Local Storage: total: 476.94 GiB used: 66.14 GiB (13.9%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 model: SSDPEMKF512G8 NVMe INTEL 512GB size: 476.94 GiB block size: physical: 512 B
logical: 512 B speed: 31.6 Gb/s lanes: 4 serial: rev: 7002 scheme: GPT
Partition: ID-1: / raw size: 339.34 GiB size: 333.01 GiB (98.14%) used: 66.09 GiB (19.8%) fs: ext4 dev: /dev/nvme0n1p5
Swap: Alert: No Swap data was found.
Sensors: System Temperatures: cpu: 54.0 C mobo: N/A
Fan Speeds (RPM): cpu: 0 fan-2: 0
Info: Processes: 311 Uptime: 22m Memory: 31.12 GiB used: 2.06 GiB (6.6%) Init: systemd v: 246 Compilers: gcc: 10.2.0
Packages: pacman: 1323 lib: 348 flatpak: 0 Shell: Bash v: 5.0.18 running in: konsole inxi: 3.1.05

Please advise on what I need to do. Thanks in advance.

Nvidia, especially things like CUDA, does not work with kernel 5.9.

It seems you have a dual-GPU setup, and are probably using PRIME.
Nothing will use the Nvidia card without using prime-run.
Also… don’t use sudo on GUI applications.
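
Since the card is meant for compute only, a check that does not go through the X server may tell you more than nvidia-settings does. Here is a minimal sketch in Python, assuming the driver package put nvidia-smi on your PATH. The helper names are made up for illustration, and the sample line in the comment only mirrors the model and driver version from the inxi output above; it is not real output.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "name, driver_version" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_lines(csv_text):
    """Turn lines like 'GeForce RTX 2070, 455.28' into (name, driver) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the name would not break parsing.
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Return the GPUs the driver reports, or an empty list if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported (nvidia-smi missing or the driver is not responding).")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

Run it as your normal user, not with sudo. If it lists the RTX 2070, the kernel driver is talking to the card, whatever X is doing.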


Thanks for the responses. I downgraded to kernel 5.4, seeing that it is supported (in theory) on Ubuntu 20.04. I’m reading through Nvidia’s CUDA installation guide for Linux (can’t post link).

I’ll post back here to detail any problems or successes.

I’m not sure what kind of guide you’re reading, but sudo pacman -Syu cuda should be sufficient in most cases. Most of the time it makes more sense to read the distribution-specific “guide” for whatever you are installing.
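
After the package is installed, a quick way to confirm the toolkit is reachable is to ask nvcc for its version. This is a rough sketch, not the official procedure. It assumes nvcc is either on your PATH or under /opt/cuda/bin (that fallback path is my assumption about where the package puts it, so adjust it if yours differs), and the function names are hypothetical.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from text containing 'release X.Y'; None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_nvcc(fallback="/opt/cuda/bin/nvcc"):
    """Look for nvcc on PATH first, then at the assumed fallback location."""
    found = shutil.which("nvcc")
    if found:
        return found
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; you may need to log out and back in after installing.")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print("CUDA toolkit release:", parse_nvcc_release(out))
```

This only shows that the compiler is installed. Whether the driver and the card actually work together is a separate check.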