[HowTo] Installing Docker and NVIDIA runtime - My experience and HowTo

Hi, switching from Ubuntu to Manjaro was a challenge where Docker with NVIDIA support is concerned. There is a lot of information on the web, but I had to read several forum posts and websites to piece it all together.

Just in case you are looking for the same information and struggling with docker+nvidia on Manjaro, here are the steps that worked for me. I’m using Manjaro with GNOME (the current GNOME version), freshly installed on 2021-12-26.

Anyway, here are my steps to get it working, thanks to Nathan Labadie and Manish Kumar and others who had already gathered some useful commands that gave me a helping hand. This is the collection of steps that worked for ME, so no warranty that it will work in every case :wink:

Make sure your NVIDIA driver is working before you start the Docker installation:

nvidia-smi

If you get an error, double-check the driver and reinstall it. In my case,

sudo mhwd -a pci nonfree 0300

has done the job.

1.0 Installation of paru to use the AUR repository:

sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/paru.git
cd paru
makepkg -si

2.0 Installation of Docker and Docker-Compose

paru -S docker
sudo systemctl enable docker
sudo systemctl start docker
paru -S docker-compose

2.1 Run Docker without sudo

sudo usermod -aG docker $USER
newgrp docker

3.0 Install nvidia-container-runtime and others

paru -S nvidia nvidia-utils nvidia-container-toolkit nvidia-container-runtime

Choose the default (1) = “nvidia-container-runtime”, not the “nvidia-container-runtime-bin” version; only switch to the “bin” version if the default gives you an error.
If the system asks whether to review further, answer “y” and you will see a “:” pager prompt. Type “q” to exit the pager; the installation will then continue and ask about further steps. Just answer "y"es and it should proceed.

Check with

which nvidia-container-runtime

which should print the path to the installed binary (typically /usr/bin/nvidia-container-runtime).

4.0 Docker and NVIDIA should love each other after:

sudo nano /etc/modules-load.d/custom.conf

and add:

nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan

to it; Save and Exit.
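As a one-shot alternative to the nano edit, the whole list can be written with a heredoc. A sketch against a temporary stand-in path; on the real system you would pipe it through sudo tee /etc/modules-load.d/custom.conf instead:

```shell
# Write the module list in one go; /tmp/custom.conf is a stand-in path,
# use "sudo tee /etc/modules-load.d/custom.conf" on the real system.
cat > /tmp/custom.conf <<'EOF'
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
EOF
# sanity check: seven module names, one per line
wc -l < /tmp/custom.conf
```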

5.0 Configure the NVIDIA runtime for docker:

sudo nano /etc/docker/daemon.json

and

add the following lines to the file (note: JSON does not allow comment lines, so paste only the braces and keys):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Save and Exit.
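Since JSON does not allow comments, it is worth validating the file syntax before restarting Docker. A small sketch, shown here against a temporary stand-in path; the real target is /etc/docker/daemon.json (written with sudo):

```shell
# Write the runtime config to a temp file and check the JSON syntax;
# on the real system the target is /etc/docker/daemon.json.
cat > /tmp/daemon.json <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "JSON OK"
```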

It is very likely you will get an “NVML error” if you try to execute e.g. nvidia-smi within a container.

Here’s the fix:

6.0 Add the parameter systemd.unified_cgroup_hierarchy=false to the end of the GRUB_CMDLINE_LINUX_DEFAULT=… line

sudo nano /etc/default/grub

The parameter to copy & paste:

systemd.unified_cgroup_hierarchy=false

So it should look something like this (at the end):

GRUB_CMDLINE_LINUX_DEFAULT= ...udev.log_priority=3 systemd.unified_cgroup_hierarchy=false"
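Appending the parameter can also be scripted with sed. A sketch on a throwaway copy; the sample GRUB_CMDLINE_LINUX_DEFAULT value here is made up, and on the real system you would run the sed command with sudo against /etc/default/grub:

```shell
# Demonstrate the edit on a copy; the existing options are just an example.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet udev.log_priority=3"\n' > /tmp/grub
# append the cgroup parameter just before the closing quote
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT=".*\)"$/\1 systemd.unified_cgroup_hierarchy=false"/' /tmp/grub
cat /tmp/grub
```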

Make it active:

sudo update-grub

7.0 Edit config.toml

sudo nano /etc/nvidia-container-runtime/config.toml

and change “no-cgroups = true” to “no-cgroups = false”. If the parameter is missing, add it to the file. But it should be there, so please read carefully.

[nvidia-container-cli]
no-cgroups = false
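The same edit can be done with a sed one-liner. A sketch on a temporary stand-in file; on the real system it would be sudo sed -i against /etc/nvidia-container-runtime/config.toml:

```shell
# Stand-in file mimicking the relevant part of config.toml
printf '[nvidia-container-cli]\nno-cgroups = true\n' > /tmp/config.toml
# flip the flag from true to false
sed -i 's/^no-cgroups = true/no-cgroups = false/' /tmp/config.toml
grep '^no-cgroups' /tmp/config.toml
```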

8.0 Reboot

sudo reboot

After reboot, you can check the parameters with:

sudo cat /proc/cmdline

They should appear here.

9.0 Double check everything; let’s start with a DIY container:

nano ~/docker-compose.yml

and add:

# Copy and paste the following lines into the file
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7
# End of file

save and exit.

Test it with:

docker-compose up -d   
docker-compose run nvidia-smi-test 

The last command starts the created container and drops you into a shell inside it.
The prompt looks something like the following line; type “nvidia-smi” and you should then see the usual output. (hopefully :slight_smile:

 [root@a31dcc1d0af3 /]  nvidia-smi

Type “exit” to leave the container.


9.1 Test with other examples:

sudo docker run --rm  --gpus all nvidia/cuda:11.0-base nvidia-smi

9.2 Example using privileged mode:

sudo docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0-base nvidia-smi

9.3 Or start TensorFlow with GPU support:

docker run -it --rm --gpus all tensorflow/tensorflow:latest-gpu bash   

Within this container, also execute “nvidia-smi”; it should work. Type “exit” to leave the container.
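If you prefer a non-interactive one-shot check, something like the following should also work (assuming the official TensorFlow image ships a python binary on the PATH); tf.config.list_physical_devices('GPU') should print at least one GPU device:

```shell
# One-shot GPU visibility check inside the TensorFlow container
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```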

Hope I haven’t forgotten anything; please let me know in case something is missing.


Why paru? Why not pamac?
Does pamac not have AUR support?
If it doesn’t, please add it.


Haven’t tried it, but it might work; I’ve seen it used in another post. However, paru has done the job for me.

Important: JSON does not allow comments, so please make sure not to insert the comment line (#…) into the /etc/docker/daemon.json file. It should look like this, including the spacing:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

The same care applies to docker-compose.yml; the indentation matters:

version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7

Otherwise you will get a “services must be a mapping” error!

Just in case the docker service does not start, run the daemon in the foreground with:

sudo /usr/bin/dockerd

to get more information and find the issue.
