Cannot log in with iGPU when cable is attached to dGPU

About a year ago, I made this post, trying to figure out how to isolate my dGPU for pass-through to a VM. I never managed to make it work, so I have been using a workaround: unplugging the cable until I need it.

Originally I thought the problem was that both my iGPU and dGPU were AMD, so the same driver could not be loaded first for one and last for the other. Therefore, I decided to replace my 6900XT with an Nvidia RTX 4080. Unfortunately, this didn’t solve the issue either.

Later on, I moved my main screen to the right side and got an Alienware as my new main monitor. To my surprise, it behaved better: I didn’t have to remove the cable, just switch the monitor on after the boot selection screen. Much more convenient, but it didn’t last.

That monitor kept failing, so I had to return it for a refund and replace it with an Asus, which sent me back to square one, where I have to remove the cable again. That made me mad, so I decided to solve this issue once and for all.

I know that the problem is in the motherboard firmware. I filed a support ticket with Gigabyte back then, but they suggested it is a software issue and sent me a video of it working fine on Windows 10. However, I am sure it is firmware related, since it also happens when I enter the BIOS settings: the screen appears on the monitor connected to my dGPU.

I am not very optimistic about getting a fix from their side, so here I am again.

Since Windows is working, there should be a way to make Linux work too. I have been trying multiple configurations since last week, when I got the new monitor, but no luck. I have re-installed Manjaro multiple times, just to clean things up, and kept trying whatever solutions I found online, but none of them works.

My current setup:

GRUB_CMDLINE_LINUX_DEFAULT="resume=UUID=ff3443f9-1bb6-4670-81e5-2c14956e7a45 udev.log_priority=3 video=efifb:off amdgpu.dc=1 amdgpu.modeset=0 nouveau.modeset=1"

mkinitcpio.conf

MODULES=(amdgpu)
HOOKS=(base udev autodetect kms modconf block keyboard keymap consolefont plymouth resume filesystems fsck)

modprobe.d/vfio.conf

blacklist nouveau
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
softdep nvidia pre: vfio-pci
options vfio_pci ids=10de:2704,10de:22bb

This is the result, but only if I boot without the cable attached:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD103 [GeForce RTX 4080] [10de:2704] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:40bc]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia
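
For reference, the same binding can also be read straight from sysfs instead of from lspci. This is only a minimal sketch: the function name bound_driver and its optional second argument are made up for illustration, and 0000:01:00.0 is the dGPU address taken from the output above.

```shell
#!/bin/bash
# Print the kernel driver currently bound to a PCI device, or "none".
# The second argument (sysfs root) is hypothetical and exists only so the
# function can be tried against a fake directory tree.
bound_driver() {
    local addr=$1 sysfs_root=${2:-/sys}
    local link="$sysfs_root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink into .../bus/pci/drivers/<name>
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

bound_driver "${1:-0000:01:00.0}"
```

The value printed should agree with the "Kernel driver in use" line that lspci reports for the same device.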

If I attach the cable, it gets stuck before the login screen. Here are some interesting lines from my log, when I have the cable attached:

Sep 02 20:32:31.689115 wizzy-am5-manjaro-kde6 kernel: pci 0000:01:00.0: vgaarb: setting as boot VGA device
Sep 02 20:32:31.689177 wizzy-am5-manjaro-kde6 kernel: pci 0000:01:00.0: vgaarb: bridge control possible
Sep 02 20:32:31.689240 wizzy-am5-manjaro-kde6 kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
Sep 02 20:32:31.689306 wizzy-am5-manjaro-kde6 kernel: pci 0000:15:00.0: vgaarb: setting as boot VGA device (overriding previous)
Sep 02 20:32:31.689369 wizzy-am5-manjaro-kde6 kernel: pci 0000:15:00.0: vgaarb: bridge control possible
Sep 02 20:32:31.689432 wizzy-am5-manjaro-kde6 kernel: pci 0000:15:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
Sep 02 20:32:31.689438 wizzy-am5-manjaro-kde6 kernel: vgaarb: loaded
...
Sep 02 20:32:31.725515 wizzy-am5-manjaro-kde6 kernel: amdgpu 0000:15:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
...
Sep 02 20:32:31.725958 wizzy-am5-manjaro-kde6 kernel: amdgpu 0000:15:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
...
Sep 02 20:32:31.727036 wizzy-am5-manjaro-kde6 kernel: amdgpu 0000:15:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
...
Sep 02 20:32:32.329760 wizzy-am5-manjaro-kde6 kernel: nvidia: loading out-of-tree module taints kernel.
Sep 02 20:32:32.329773 wizzy-am5-manjaro-kde6 kernel: nvidia: module license 'NVIDIA' taints kernel.
Sep 02 20:32:32.329783 wizzy-am5-manjaro-kde6 kernel: Disabling lock debugging due to kernel taint
Sep 02 20:32:32.329796 wizzy-am5-manjaro-kde6 kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Sep 02 20:32:32.329810 wizzy-am5-manjaro-kde6 kernel: nvidia: module license taints kernel.
...
Sep 02 20:32:32.667118 wizzy-am5-manjaro-kde6 kernel: nvidia: unknown parameter 'modset' ignored
Sep 02 20:32:32.667127 wizzy-am5-manjaro-kde6 kernel: iwlwifi 0000:0e:00.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
Sep 02 20:32:32.667240 wizzy-am5-manjaro-kde6 kernel: thermal thermal_zone0: failed to read out thermal zone (-61)
Sep 02 20:32:32.667326 wizzy-am5-manjaro-kde6 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 508
Sep 02 20:32:32.667335 wizzy-am5-manjaro-kde6 kernel: NVRM: GPU 0000:01:00.0 is already bound to vfio-pci.
Sep 02 20:32:32.667347 wizzy-am5-manjaro-kde6 kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Sep 02 20:32:32.667356 wizzy-am5-manjaro-kde6 kernel: NVRM: This can occur when another driver was loaded and 
                                                      NVRM: obtained ownership of the NVIDIA device(s).
Sep 02 20:32:32.667366 wizzy-am5-manjaro-kde6 kernel: NVRM: Try unloading the conflicting kernel module (and/or
                                                      NVRM: reconfigure your kernel without the conflicting
                                                      NVRM: driver(s)), then try loading the NVIDIA kernel module
                                                      NVRM: again.
Sep 02 20:32:32.667374 wizzy-am5-manjaro-kde6 kernel: NVRM: No NVIDIA devices probed.
Sep 02 20:32:32.667382 wizzy-am5-manjaro-kde6 kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 508
...
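
One detail in the log above is the line "nvidia: unknown parameter 'modset' ignored", which is most likely a misspelling of modeset somewhere in the configuration. A minimal sketch for locating it follows; the function name find_modset_typo and the list of places searched are assumptions, not something taken from the posted setup.

```shell
#!/bin/bash
# Search the given files or directories for the misspelled "modset" parameter.
# Prints file:line:text for every hit; paths that do not exist are skipped
# quietly (-s). Correctly spelled "modeset" entries are not matched.
find_modset_typo() {
    grep -rnsH 'modset' "$@"
}

# Assumed locations where module parameters are commonly set.
find_modset_typo /etc/default/grub /etc/modprobe.d /proc/cmdline \
    || echo "no 'modset' entries found"
```

Whether fixing the spelling changes the boot behaviour is a separate question; the sketch only finds where the ignored parameter comes from.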

I remember playing with passthrough - one of my original intents some years ago when I tested Nvidia.

If I recall correctly, one needs to find where the IOMMU groups do not overlap. If there is overlap, the group cannot be used for passthrough - again - my memory is close to non-existent.

I don’t remember any of the intricacies - nada, zilch, nothing - the only thing I still have as a reminder is a script which lists the IOMMU groups - I don’t even remember how I determined which one(s) to blacklist.

If the script can be of any use to you - please do use it

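#!/bin/bash
# List every IOMMU group with the PCI devices it contains.
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        # ${d##*/} is the PCI address of the device, e.g. 0000:01:00.0
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done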

My current system only has one GPU - I ran the script - but it will only make sense with a dual-gpu system.

But I don’t think the brand or the driver matters. What matters is that you can blacklist specific groups from being used by the host and make those groups available to the guest.
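
In practice, the devices of a group are usually handed to vfio-pci by their vendor:device IDs, as in the ids= option of the vfio.conf posted earlier. A small sketch of pulling those IDs out of lspci -nn output is shown here; the function name pci_ids_csv is invented for the example.

```shell
#!/bin/bash
# Read `lspci -nn` lines on stdin and print the [vendor:device] IDs as one
# comma-separated list, the shape the vfio-pci "ids=" option takes.
# Class codes such as [0300] contain no colon, so they are not matched.
pci_ids_csv() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | paste -sd, -
}

# Example use (01:00 is the slot of the card to pass through):
#   lspci -nn -s 01:00 | pci_ids_csv
```

The helper expects plain lspci -nn lines; verbose output would also include subsystem IDs, which should not be passed to vfio-pci.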

Thank you for your reply, however, groups are not the issue here, as it works fine without the cable attached and was working fine with the other monitor.
The problem is that it uses the dGPU as primary, no matter what. I am sure this comes from BIOS, but I am trying to solve it with software.

IOMMU Group 13:
        01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD103 [GeForce RTX 4080] [10de:2704] (rev a1)
        01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22bb] (rev a1)
IOMMU Group 14:
        02:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD E18 [2646:5013] (rev 01)
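
The vgaarb lines in the earlier log show the kernel choosing a boot VGA device. The same decision is exposed in sysfs through the boot_vga attribute of each VGA device, so it can be compared with and without the cable attached. The sketch below is only illustrative: the function name list_boot_vga and its optional argument are hypothetical.

```shell
#!/bin/bash
# Print every PCI device that exposes a boot_vga attribute, with its value
# (1 = the device the kernel treats as the boot/primary VGA device).
# The optional argument (sysfs root) exists only so the function can be
# pointed at a fake tree for testing.
list_boot_vga() {
    local sysfs_root=${1:-/sys} f dev
    for f in "$sysfs_root"/bus/pci/devices/*/boot_vga; do
        [ -r "$f" ] || continue        # skip if the glob matched nothing
        dev=${f%/boot_vga}
        echo "${dev##*/} boot_vga=$(cat "$f")"
    done
}

list_boot_vga
```

This only reports what the kernel ended up with; it does not change which GPU the firmware initialises first.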

On systems with onboard GPU you should be able to select the primary GPU.

Usually that is set to auto - probably where you will find the solution.

I have; the setting appears under multiple options in the BIOS, and it is completely ignored. Even when I enter the BIOS settings, they are displayed on the dGPU screen and not on the iGPU one (while the cable is attached).

Short of updating the UEFI or BIOS firmware — which may or may not remedy that issue — there is nothing you can do. The firmware that controls this runs in a separate (and higher) privilege level of your processor(s) from the operating system.

I am on the latest non-Beta firmware.

So you are saying I cannot disable/ignore my dGPU in any way using software?

That is correct. The assignment of the primary display happens before the operating system is even booted, depending on the connected hardware, and is managed by firmware — the UEFI — that runs in a higher privilege mode of your processor(s) than the kernel…

OK, then how is this guy here able to attach/detach his GPU on demand, to play games in Linux and in a VM?

Maybe you should ask him that then. :man_shrugging:

He is not part of the official Manjaro support, and he runs Fedora. Which, btw, I tried and I have the same behavior.

With the exception of the Manjaro developers, neither is anyone else here. We’re all volunteers — and that includes yours truly.

I can only share my own experiences with you from having tried a similar thing with the Xen hypervisor and Gentoo a long time ago.

If the behavior is the same in Fedora, then it’s clearly, as I said, a hardware issue. Perhaps your firmware does not support it while his does? :man_shrugging:

It was working with 5.x kernels. There was a change in the implementation in 6.x, and that is causing the issue. I know it should be taken care of on the hardware side, but this “bug” has appeared since kernel 6.x. So something must be implemented differently, and it should therefore be correctable with some configuration.

Actually I did, on his Discord server, and another guy offered to help me.
However, since he didn’t know how to set up Manjaro, we did it on Fedora 40 KDE.

Turns out, it works flawlessly there.
So, thank you all for your help, I have to learn how to use Fedora now.

However, it is doable, even though it is a hardware bug, so I would like to see a similar solution for Manjaro. Since AM5 is a new platform and is getting more popular day by day, I am sure more people will face the same issue.

I checked the guides I am following for Manjaro and Fedora, and the main difference I see is between Dracut and Mkinitcpio.

When I have time, I will try to install Manjaro with Dracut and try the “Fedora way” to see where the problem is.
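
To make the comparison concrete: a common pattern in VFIO guides is to force the vfio modules into the initramfs so that vfio-pci can claim the card before any GPU driver does. The fragment below sketches how that is typically written for each tool. It is an assumption, not the content of the guides mentioned above, and the dracut file name is made up. Note that the mkinitcpio.conf posted earlier lists only amdgpu in MODULES.

```shell
# /etc/mkinitcpio.conf (Manjaro): vfio modules listed ahead of the GPU driver.
# Rebuild the initramfs afterwards with: mkinitcpio -P
MODULES=(vfio_pci vfio vfio_iommu_type1 amdgpu)

# /etc/dracut.conf.d/vfio.conf (Fedora; hypothetical file name).
# Rebuild the initramfs afterwards with: dracut -f
force_drivers+=" vfio_pci vfio vfio_iommu_type1 "
```

Both files are sourced as shell, which is why the dracut value is a quoted string with leading and trailing spaces, while mkinitcpio uses an array.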