Hi, switching from Ubuntu to Manjaro was a challenge where Docker with NVIDIA support is concerned. There is a lot of information out on the web, but I had to read several forum posts and websites to cover it all.
Just in case you are looking for the same information and struggling with Docker + NVIDIA on Manjaro, here are the steps that worked for me. I’m using Manjaro with GNOME, freshly installed on December 26, 2021, which came with the new GNOME version.
Anyway, here are my steps to get it working, thanks to Nathan Labadie, Manish Kumar, and others who had already gathered some useful commands that gave me a helping hand. This is the collection of steps that worked for ME, so no warranty that it will work in every case.
Make sure your NVIDIA driver is working before you start the Docker installation:
nvidia-smi
If you get an error, double-check the driver and (re)install it. In my case,
sudo mhwd -a pci nonfree 0300
has done the job.
1.0 Installation of paru to use the AUR repository:
sudo pacman -S --needed git base-devel
git clone https://aur.archlinux.org/paru.git
cd paru
makepkg -si
2.0 Installation of Docker and Docker-Compose
paru -S docker
sudo systemctl enable docker
sudo systemctl start docker
paru -S docker-compose
2.1 Run Docker without sudo
sudo usermod -aG docker $USER
newgrp docker
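To verify that the group change is active in your current shell, here is a small sketch (the fallback message is my own wording, not Docker output):

```shell
# Check whether the docker group is active for the current session.
# If it is not listed, log out and back in (or use newgrp as above).
id -nG | tr ' ' '\n' | grep -x docker || echo "docker group not active in this session yet"
```

If the check prints "docker", a plain `docker ps` should now work without sudo.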
3.0 Install nvidia-container-runtime and others
paru -S nvidia nvidia-utils nvidia-container-toolkit nvidia-container-runtime
Choose the default (1) = “nvidia-container-runtime”, not the “nvidia-container-runtime bin” version. Only if you get an error with the default should you switch to the “bin” version.
If paru asks whether you want to review further, just answer “y”; you will then see a “:” prompt (a pager). Type “q” to exit the pager, and the installation should continue and ask further questions. Just answer “y” and it should proceed.
Check
which nvidia-container-runtime
This should print the path to the runtime binary (typically /usr/bin/nvidia-container-runtime).
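If you want the check to fail loudly instead of silently, here is a small sketch (the messages are mine, not tool output):

```shell
# Report whether the nvidia-container-runtime binary is on PATH.
if command -v nvidia-container-runtime >/dev/null 2>&1; then
  echo "runtime found: $(command -v nvidia-container-runtime)"
else
  echo "runtime missing - re-check step 3.0"
fi
```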
4.0 Docker and NVIDIA should love each other after you load the required kernel modules:
sudo nano /etc/modules-load.d/custom.conf
and add:
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
to it; Save and Exit.
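If you prefer a non-interactive alternative to nano, the same file can be written in one go. TARGET below defaults to a scratch path for a dry run; on the real system, point it at /etc/modules-load.d/custom.conf and use sudo tee:

```shell
# Write the module list in one command instead of editing with nano.
# TARGET defaults to a scratch file; use /etc/modules-load.d/custom.conf
# (via sudo tee) on the real system.
TARGET="${TARGET:-/tmp/custom.conf}"
tee "$TARGET" >/dev/null <<'EOF'
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm
aufs
overlay
macvlan
EOF
grep -c . "$TARGET"   # counts the 7 module lines written
```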
5.0 Configure the NVIDIA runtime for docker:
sudo nano /etc/docker/daemon.json
and add the following lines to the file:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Save and Exit.
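A broken daemon.json will stop the Docker daemon from starting, so it is worth validating the JSON before restarting. Here is a sketch that builds and checks the file on a scratch path (assuming python3 is available, which it is on a stock Manjaro install); on the real system, copy the verified content into /etc/docker/daemon.json and restart Docker with sudo systemctl restart docker:

```shell
# Sketch: validate the runtime config as JSON before installing it.
cat > /tmp/daemon.json <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
# json.tool exits non-zero (and prints an error) on invalid JSON.
python3 -m json.tool /tmp/daemon.json
```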
It is very likely that you will get an “NVML error” when you try to execute e.g. nvidia-smi within a container.
Here’s the fix:
6.0 Add the parameter systemd.unified_cgroup_hierarchy=false to the end of the GRUB_CMDLINE_LINUX_DEFAULT=… line:
sudo nano /etc/default/grub
Below to copy & paste:
systemd.unified_cgroup_hierarchy=false
So it should look something like this (at the end):
GRUB_CMDLINE_LINUX_DEFAULT= ...udev.log_priority=3 systemd.unified_cgroup_hierarchy=false"
Make it active:
sudo update-grub
7.0 Edit config.toml
sudo nano /etc/nvidia-container-runtime/config.toml
and change “no-cgroups = true” to “no-cgroups = false”. If the parameter is missing, add it to the file, but it should be there; please read carefully.
[nvidia-container-cli]
no-cgroups = false
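For reference, the same edit as a sed one-liner, demonstrated here on a scratch copy of the file; on the real system, run the sed command with sudo against /etc/nvidia-container-runtime/config.toml instead:

```shell
# Demonstrate the no-cgroups flip on a scratch copy of the file.
printf '[nvidia-container-cli]\nno-cgroups = true\n' > /tmp/config.toml
sed -i 's/^no-cgroups = true$/no-cgroups = false/' /tmp/config.toml
grep '^no-cgroups' /tmp/config.toml   # no-cgroups = false
```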
8.0 Reboot
sudo reboot
After reboot, you can check the parameters with:
sudo cat /proc/cmdline
The parameter from step 6.0 should appear there.
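Here is a scripted version of that check which tells you explicitly whether the parameter is active (the messages are my own wording):

```shell
# Report whether the cgroup parameter made it onto the kernel command line.
if grep -q 'systemd.unified_cgroup_hierarchy=false' /proc/cmdline; then
  echo "parameter active"
else
  echo "parameter missing - re-check /etc/default/grub and rerun sudo update-grub"
fi
```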
9.0 Double check everything; let’s start with a DIY container:
nano ~/docker-compose.yml
and add:
# Copy and Paste the following lines to the file
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7
# End file
save and exit.
Test it with:
docker-compose up -d
docker-compose run nvidia-smi-test
The last command starts the created container and drops you inside it.
The prompt looks something like the following line; type “nvidia-smi” and you should (hopefully) see the familiar output:
[root@a31dcc1d0af3 /] nvidia-smi
Type “exit” to leave the container.
9.1 Test with other examples:
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
9.2 Example using privileged mode:
sudo docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0-base nvidia-smi
9.3 or start TensorFlow with GPU support:
docker run -it --rm --gpus all tensorflow/tensorflow:latest-gpu bash
Within this container, also execute “nvidia-smi”. It should work; type “exit” to leave the container.
Hope I haven’t forgotten anything; please let me know in case something is missing.