Dual GPU not working - Nvidia

Preface: I am new-ish to Linux as my daily driver OS, but not to troubleshooting. I have narrowed the problem down to either the way Arch, the kernel (GRUB too?) handle the proprietary nvidia drivers, or the proprietary nvidia drivers / their configuration.

MY SETUP:

PCIE-01 x16 - nvidia 1070ti - 01:00 1 monitor attached

  • This card will be passed to a guest windows VM

PCIE-03 x4 - nvidia 1050ti - 04:00 2 monitors attached

  • This card will be dual monitor for host OS

OS - Manjaro KDE 21.1 - FRESH INSTALL as of this post
Kernel - 5.13 kernel
Driver - nvidia 470.63.01

The previous install was kernel 5.10 and kde 21.07, with a Windows VM that had the 1070ti passed through and the Intel integrated GPU for the host OS. That worked, but having only 1 monitor for the host OS sucks.

WHEN IT'S WORKING:
Both GPUs work using Windows 10 LTSC as the host OS.
Both GPUs work using at least the following distros as the host OS: Ubuntu, Mint, Zorin OS.
Both GPUs work on Manjaro and EndeavourOS, but only when using the open-source nvidia drivers (this causes screen tearing and other issues, but all GPUs and monitors do work).
Removing the 1070 from the PC allows the 1050 to work as normal.
Moving the 1050 to any PCIe slot by itself (no 1070 installed) works as normal.

I cannot swap the 1050ti to pcie01 or 02 and put the 1070 in pcie03, because that slot is only x4 and would kill the card's performance. x4 does not bottleneck the 1050 (it's just for video acceleration across multiple monitors anyway).

I cannot move the 1050ti to pcie02 because the 1050 and 1070 then share an IOMMU group, and I don’t want to run a custom patched kernel to isolate them (doing so causes a new set of issues).
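
To check which devices end up in the same IOMMU group for a given slot arrangement, something like the following Python sketch can help. It assumes the kernel exposes the groups under /sys/kernel/iommu_groups (the usual sysfs location when the IOMMU is enabled); the function names list_iommu_groups and share_group are made up for this illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as read from sysfs.

    An empty dict means the directory is missing, which usually
    indicates the IOMMU is disabled in firmware or on the kernel
    command line.
    """
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        # Each group is a numbered directory with a "devices" subdirectory.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def share_group(groups, addr_a, addr_b):
    """True if both PCI addresses appear in the same IOMMU group."""
    return any(addr_a in devs and addr_b in devs for devs in groups.values())


if __name__ == "__main__":
    for number, devices in sorted(list_iommu_groups().items()):
        print(f"IOMMU group {number}: {', '.join(devices)}")
```

With the cards from this post, share_group(list_iommu_groups(), "0000:01:00.0", "0000:04:00.0") should come back False for a layout that is usable for passthrough without a patched kernel (the full addresses are assumed from the 01:00 and 04:00 values listed above).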

WHEN IT'S NOT WORKING:
Switching to the proprietary drivers causes the 1050ti to stop driving displays. Both cards are still detected.
If you enable a display (any make or model) on the 1050, one of the following happens:
A) The resolution gets added onto the 1070 monitor as if it were an ultrawide screen. You have to use the mouse to scroll the display to see the entire rendered resolution.
B) Black screen on all monitors, but you can see the cursor on the 1070 display; the 1050 monitor goes to sleep.
C) Black screen, and all monitors go into sleep mode / no source detected.

Changing DVI / HDMI ports does nothing, nor does changing cables. Switching monitors around does nothing either.

This feels like an X server, Xorg, or nvidia configuration issue, and I am too new to desktop Linux to have the fundamental knowledge to isolate it further. I have already spent days and days reading about dual-GPU nvidia issues.

I have tried editing xorg.conf and nvidia.conf, but the changes are ignored or don't apply. Is the load order for the display drivers and kernel wrong?

WHAT HELP I WOULD LIKE, IF POSSIBLE
I feel like if someone more knowledgeable than me could explain how to force GRUB / the kernel / Arch / whoever to use the 1050ti and ignore the 1070, it may fix the issue. The 1070 will be passed to a VM, so it needs to stay enabled and available to the hypervisor.

I can even give remote access and stream the setup via webcam for anyone wanting to help. I have no personal data on this machine, so there is nothing risky for me, and I know enough to notice if you tried to set up some sort of ssh or chroot tomfoolery during the remote session. I’d just wipe and recreate your steps to be safe anyway.

I think a way to have the host ignore the 1070 while keeping it usable by the VM is to isolate the GPU. There is an article in the Arch wiki about this:

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Isolating_the_GPU

There is even a video showing the tutorial all the way through; the hijack process begins at 14:35.

So you hijack the 1070: the host cannot use the card anymore, but the VM can still see it, and that way you can pass it through. Maybe this can work for you, but I can’t say for sure since I don’t isolate mine.

You can also ask in r/VFIO, where there are more experienced people, in case this doesn’t work.

That's the vid I used to do this with my iGPU and 1070 on the previous install. It doesn’t cover why an Arch-based distro using the proprietary nvidia driver does not allow two PCIe GPUs to run together from a default installation.

If I isolate the 1070, the 1050 doesn’t get used. I just boot to black, as Xorg / the display is still attempting to use the 1070 as the primary GPU.
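
One detail that matters when pinning X to the 1050ti: lspci prints PCI addresses in hexadecimal, while the BusID option in an Xorg config expects decimal values in the form "PCI:bus:device:function". A minimal sketch of the conversion, assuming PCI domain 0 (any domain prefix is accepted but ignored); the helper name lspci_to_xorg_busid is hypothetical:

```python
import re

# Matches "04:00.0" or "0000:04:00.0" as printed by lspci (all fields hex).
_LSPCI_ADDR = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<func>[0-7])$"
)


def lspci_to_xorg_busid(address):
    """Convert an lspci address to an Xorg BusID string.

    The domain, if present, is dropped: this sketch assumes domain 0.
    """
    match = _LSPCI_ADDR.match(address.strip())
    if match is None:
        raise ValueError(f"not an lspci-style PCI address: {address!r}")
    bus = int(match["bus"], 16)
    dev = int(match["dev"], 16)
    func = int(match["func"], 16)
    return f"PCI:{bus}:{dev}:{func}"


if __name__ == "__main__":
    # The 1050ti in this thread sits at 04:00 (function 0 assumed).
    print(lspci_to_xorg_busid("04:00.0"))
```

For single-digit bus numbers such as 01 or 04 the hex and decimal values happen to match, but a card at 0a:00.0 would need "PCI:10:0:0".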

I have fixed the issue. I am not 100% sure of the exact steps I took, but I will try to elaborate for anyone else with this issue.

First things first, Nvidia sucks for linux.

Step 1: I opened nvidia.conf and opened nvidia-x-server-config.

Step 2: Manually fill in all details of nvidia.conf using the information the config program provides. This includes screen info and card info. Use my quoted nvidia.conf file to assist you. Treat your config file like you are programming and are defining each piece of hardware like a data type.

Step 3: Save and restart your system to let the X server load your updated config. NOT EVERYTHING MAY WORK IMMEDIATELY.

Step 4: Go into Manjaro settings and enable / modify display settings from there. DO NOT try to use the Nvidia configuration tool; it will fail.
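
A rough way to sanity-check step 2 before rebooting is to list which BusID each Device section in the file points at, and confirm none of them is the card reserved for the VM. The Python sketch below is only an illustration: the function name device_bus_ids is made up, the parser is deliberately simplistic (it expects straight double quotes and treats every "#" as the start of a comment), and it is not a replacement for reading the Xorg log.

```python
import re
import sys


def device_bus_ids(config_text):
    """Map each Device section's Identifier to its BusID (None if absent)."""
    devices = {}
    in_device = False
    fields = {}
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        quoted = re.findall(r'"([^"]*)"', parts[1]) if len(parts) > 1 else []
        if keyword == "section":
            in_device = bool(quoted) and quoted[0].lower() == "device"
            fields = {}
        elif keyword == "endsection":
            if in_device and "identifier" in fields:
                devices[fields["identifier"]] = fields.get("busid")
            in_device = False
        elif in_device and keyword in ("identifier", "busid") and quoted:
            fields[keyword] = quoted[0]
    return devices


if __name__ == "__main__":
    # Usage: python check_busid.py /path/to/nvidia.conf
    with open(sys.argv[1], encoding="utf-8") as handle:
        for identifier, bus_id in device_bus_ids(handle.read()).items():
            print(f"{identifier}: {bus_id or 'no BusID set'}")
```

A Device section with no BusID leaves the choice of GPU to the X server, which may be how the 1070 ends up being picked as primary.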


# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings: version 470.63.01

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 470.63.01

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 1080
    Screen      1  "Screen1" Above "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "InputClass"
    Identifier     "Keyboard Defaults"
    MatchIsKeyboard "yes"
    Option         "XkbOptions" "terminate:ctrl_alt_bksp"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Ancor Communications Inc ASUS VH236H"
    HorizSync       30.0 - 85.0
    VertRefresh     55.0 - 75.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "DELL S2415H"
    HorizSync       30.0 - 83.0
    VertRefresh     56.0 - 76.0
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "NVIDIA GeForce GTX 1050 Ti"
    Option         "NoLogo" "1"
    BusID          "PCI:4:0:0"
    Screen          0
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "NVIDIA GeForce GTX 1050 Ti"
    BusID          "PCI:4:0:0"
    Screen          1
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-0"
    Option         "metamodes" "DVI-D-0: 1920x1080_60 +0+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "metamodes" "HDMI-0: 1920x1080_60 +0+0 {AllowGSYNC=Off}"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Extensions"
    Option         "COMPOSITE" "Enable"
EndSection
