Docker with nvidia GPU-2

yoda_1 · 16 May 2024 07:48

I have to use Docker for my job, but I cannot find a good instruction page for running Docker with a GPU on Manjaro or Arch, compared with what exists for Ubuntu.
I do not like Ubuntu at all.

Here is my report on installing nvidia-container-toolkit to use an NVIDIA GPU from Docker.
I hope Arch creates a new package.

I found a post and an Arch page about it, but I can no longer find nvidia-container-toolkit in the Arch repo or the AUR.
Something has changed in the Arch repo.

I followed the instructions on the ref-1 page and solved the nvidia-container-cli error with ref-2.

Reference.
ref-1
ref-2

preparation

We cannot use prebuilt packages and have to build them manually at the moment.
I copied some content from ref-1 and modified it for my setup.

1. Install libnvidia-container-tools first.

Otherwise, if you install nvidia-container-toolkit first, you may get this message:

“Missing dependencies: → libnvidia-container-tools>=1.9.0 AUR”.

wget https://aur.archlinux.org/cgit/aur.git/snapshot/libnvidia-container.tar.gz
tar xvf libnvidia-container.tar.gz && cd libnvidia-container/
makepkg

sudo pacman -U libnvidia-container-1.11.0-1-x86_64.pkg.tar.zst
sudo pacman -U libnvidia-container-tools-1.11.0-1-x86_64.pkg.tar.zst
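
The file names in the two `pacman -U` lines above carry the version that was current when I built (1.11.0-1); a newer PKGBUILD will produce different names. Here is a minimal sketch that lists whatever makepkg actually built, assuming the `*.pkg.tar.zst` files land in the build directory. `built_packages` is a hypothetical helper name, not a pacman or makepkg command.

```shell
#!/bin/sh
# built_packages DIR: print every *.pkg.tar.zst file found in DIR, one per line.
# Hypothetical helper; it only lists files and installs nothing.
built_packages() {
    dir="${1:-.}"
    for f in "$dir"/*.pkg.tar.zst; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example (run inside libnvidia-container/ after makepkg):
#   sudo pacman -U $(built_packages .)
```

The helper only prints paths, so you can check the list before handing it to pacman.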

2. Install nvidia-container-toolkit.

wget https://aur.archlinux.org/cgit/aur.git/snapshot/nvidia-container-toolkit.tar.gz
tar xvf nvidia-container-toolkit.tar.gz && cd nvidia-container-toolkit/
makepkg

yay -U nvidia-container-toolkit-1.14.6-1-x86_64.pkg.tar.zst

3. Check the following file.

Make sure no-cgroups is set to false.

vi /etc/nvidia-container-runtime/config.toml
##
no-cgroups = false
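
The setting can also be read without opening an editor. This is a minimal sketch, assuming the key sits on its own uncommented `key = value` line. `toml_value` is a hypothetical helper, not part of the toolkit, and it does plain text matching rather than real TOML parsing.

```shell
#!/bin/sh
# toml_value FILE KEY: print the value of the first uncommented "KEY = value" line.
# Hypothetical helper; plain text matching, not a real TOML parser.
toml_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Example:
#   toml_value /etc/nvidia-container-runtime/config.toml no-cgroups
```

If the line is commented out with `#`, the helper prints nothing, which tells you the key is not explicitly set in the file.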

4. restart docker.

sudo systemctl restart docker

5. Run Docker; it fails with an error.

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
##
nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown.
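
One way to narrow this error down is to look at who owns the NVIDIA device nodes on the host. This is my assumption about the cause, not something the error message states: if the nodes are restricted to a group, the group part of the `user` setting in the next step probably has to match it. `node_owner` is a hypothetical helper.

```shell
#!/bin/sh
# node_owner PATH: print "owner:group" of PATH, or fail if PATH does not exist.
# Hypothetical helper built on ls and awk.
node_owner() {
    [ -e "$1" ] || return 1
    ls -ld "$1" | awk '{print $3 ":" $4}'
}

# Show owner and group of every NVIDIA device node on the host.
# Prints nothing on a machine without such nodes.
for node in /dev/nvidia*; do
    if [ -e "$node" ]; then
        printf '%s %s\n' "$node" "$(node_owner "$node")"
    fi
done
```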

6. modify the config.toml.

sudo vi /etc/nvidia-container-runtime/config.toml

## add or modify as follows
user = "root:vglusers"
## or
user = "root:root"
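
The same edit can be scripted. This is a minimal sketch, assuming the file already has a `user = ...` line (possibly commented out with `#`). `set_runtime_user` is a hypothetical helper; it rewrites only the first such line and leaves the file untouched when there is none.

```shell
#!/bin/sh
# set_runtime_user FILE VALUE: replace the first "user = ..." line (commented or not)
# with: user = "VALUE". Returns non-zero and changes nothing if no such line exists.
# Hypothetical helper; needs write access to FILE (root for files under /etc).
set_runtime_user() {
    target="$1"
    newval="$2"
    grep -Eq '^[[:space:]]*#?[[:space:]]*user[[:space:]]*=' "$target" || return 1
    scratch=$(mktemp) || return 1
    if awk -v v="$newval" '
        !seen && /^[ \t]*#?[ \t]*user[ \t]*=/ {
            print "user = \"" v "\""
            seen = 1
            next
        }
        { print }
    ' "$target" > "$scratch"; then
        # Write back through cat so the original file keeps its owner and mode.
        cat "$scratch" > "$target"
        status=$?
    else
        status=1
    fi
    rm -f "$scratch"
    return "$status"
}

# Example (as root):
#   set_runtime_user /etc/nvidia-container-runtime/config.toml root:vglusers
```

Keep a backup copy of config.toml before running anything like this against the real file.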

7. restart docker.

sudo systemctl restart docker

8. run docker with GPU.

It works!

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

Thu May 16 06:28:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   40C    P8             10W /  240W |      13MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
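
For scripted checks, the banner of this output can be parsed instead of read by eye. This is a minimal sketch that pulls the driver and CUDA versions out of `nvidia-smi` text on stdin. `smi_field` is a hypothetical helper, and the parsing assumes the banner layout shown above.

```shell
#!/bin/sh
# smi_field LABEL: read nvidia-smi output on stdin and print the version number
# that follows "LABEL:" (for example "Driver Version" or "CUDA Version").
# Hypothetical helper; assumes the banner layout shown in this post.
smi_field() {
    sed -n "s/.*$1: *\([0-9][0-9.]*\).*/\1/p" | head -n 1
}

# Example:
#   docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi \
#       | smi_field "Driver Version"
```

With the banner above, the two labels give 550.54.14 and 12.4.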

My setup is as follows.

I use a slightly old driver and an old GPU.

yay -Qs cuda

local/cuda 11.8.0-1
    NVIDIA's GPU programming toolkit

local/cudnn 8.6.0.163-1
    NVIDIA CUDA Deep Neural Network library
 
yay -Qs nvidia-container-toolkit

local/nvidia-container-toolkit 1.14.6-1
    NVIDIA container runtime toolkit

inxi -F

System:
  Host: ***  Kernel: 5.15.150-1-MANJARO arch: x86_64 bits: 64
  Desktop: IceWM v: N/A Distro: Manjaro Linux

CPU:
  Info: quad core model: Intel Core i7-4790K bits: 64 type: MT MCP cache:
    L2: 1024 KiB

Graphics:
  Device-1: NVIDIA GP104 [GeForce GTX 1080] driver: nvidia v: 550.54.14
  Display: server: X.org v: 1.21.1.11 driver: X: loaded: nvidia gpu: nvidia
    resolution: 3840x1080
  API: EGL v: 1.5 drivers: kms_swrast,nvidia,swrast,zink
    platforms: gbm,x11,surfaceless,device
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: mesa v: 24.0.2-manjaro1.1
    renderer: llvmpipe (LLVM 16.0.6 256 bits)

Reference.

ref-1:

ref-2: github.com/NVIDIA/nvidia-docker, issue "when I start a container get `insufficient permissions`" (opened 16 Sep 2021 by zacario-li, closed 22 Sep 2021)
