Nouveau driver issue (qemu+GPU passthrough)

Hi, I think I have tracked down the culprit of my issue, which is the system sometimes freezing at shutdown.
I think the culprit is the nouveau driver, combined with GPU passthrough to a qemu VM.

System spec

System:
  Kernel: 5.8.6-1-MANJARO x86_64 bits: 64 compiler: N/A Console: tty 1
  Distro: Manjaro Linux
Machine:
  Type: Server Mobo: ASUSTeK model: Z9PE-D8 WS v: 1.0x serial:
  BIOS: American Megatrends v: 5802 date: 06/10/2015
CPU:
  Topology: 2x 8-Core model: 06/2d bits: 64 type: MCP SMP arch: Sandy Bridge
  rev: 5 L2 cache: 40.0 MiB
  flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 99759
  Speed: 1258 MHz min/max: 1200/3800 MHz Core speeds (MHz): 1: 1208 2: 1208
  3: 3111 4: 3111 5: 3111 6: 3111 7: 3111 8: 3111 9: 1205 10: 1204 11: 1204
  12: 1204 13: 1228 14: 1204 15: 3111 16: 3111
Graphics:
  Device-1: NVIDIA GK110B [GeForce GTX TITAN Black] driver: vfio-pci v: 0.2
  bus ID: 83:00.0
  Display: server: No display server data found. Headless machine?
  tty: 80x24
  Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
  Device-1: Intel C600/X79 series High Definition Audio vendor: ASUSTeK
  driver: vfio-pci v: 0.2 bus ID: 00:1b.0
  Device-2: NVIDIA GK110 High Definition Audio driver: vfio-pci v: 0.2
  bus ID: 83:00.1
  Sound Server: ALSA v: k5.8.6-1-MANJARO
Network:
  Device-1: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
  v: 3.2.6-k port: 9000 bus ID: 06:00.0
  IF: enp6s0 state: down mac:
  Device-2: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
  v: 3.2.6-k port: 8000 bus ID: 07:00.0
  IF: enp7s0 state: up speed: 100 Mbps duplex: full mac:
  IF-ID-1: br0 state: up speed: 100 Mbps duplex: unknown mac:
  IF-ID-2: br1 state: up speed: 10 Mbps duplex: unknown mac:
  IF-ID-3: vnet0 state: unknown speed: 10 Mbps duplex: full mac:
  IF-ID-4: vnet1 state: unknown speed: 10 Mbps duplex: full mac:
Drives:
  Local Storage: total: 8.19 TiB used: 3.46 TiB (42.3%)
  ID-1: /dev/sda vendor: SanDisk model: SDSSDP256G size: 238.47 GiB
  ID-2: /dev/sdb vendor: Western Digital model: WD20EZRX-00DC0B0
  size: 1.82 TiB
  ID-3: /dev/sdc vendor: Crucial model: CT500MX500SSD1 size: 465.76 GiB
  ID-4: /dev/sdd vendor: Western Digital model: WD60EZRX-00MVLB1
  size: 5.46 TiB
  ID-5: /dev/sde vendor: Hitachi model: HTS542525K9SA00 size: 232.89 GiB
Partition:
  ID-1: / size: 227.74 GiB used: 12.97 GiB (5.7%) fs: ext4 dev: /dev/sde2
Swap:
  ID-1: swap-1 type: file size: 8.00 GiB used: 0 KiB (0.0%) file: /swapfile
Sensors:
  System Temperatures: cpu: 31.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Info:
  Processes: 291 Uptime: 2h 08m Memory: 62.86 GiB used: 32.87 GiB (52.3%)
  Init: systemd Compilers: gcc: 10.2.0 Packages: 486 Shell: Bash v: 5.0.18
  inxi: 3.1.05

DISTRIB_ID=ManjaroLinux
DISTRIB_RELEASE=20.1
DISTRIB_CODENAME=Mikah
DISTRIB_DESCRIPTION="Manjaro Linux"

I’m using kernel 5.8.6-1, but the same happens with the latest LTS kernel.
The only GPU I have is a GTX Titan Black.
Qemu is installed on Manjaro and automatically boots a VM with GPU passthrough (the vbios is also passed in libvirt). Everything works as expected: the monitor switches from Manjaro to the VM.
I have a second VM, also with GPU passthrough.
If I power off the first VM and start the second one, everything seems to work as expected:
the monitor switches from the VM to Manjaro, and then from Manjaro to the second VM.
The issue is that when I power off the VM, return to the Manjaro terminal, and shut down the workstation (shutdown -h now), it hangs and a forced shutdown (holding the physical power button) is required.
Since recent updates I can see something like this in the logs:

Log entry

nouveau 0000:83:00.0: disp: chid 0 stat 0000508c reason 5 [INVALID_STATE] mthd 008c data 00000000 code 0000102c
PRIORITY 3
SYSLOG_FACILITY 0
SYSLOG_IDENTIFIER kernel
_BOOT_ID 56576147c47d41da88091e82072eb0b0
_HOSTNAME tower
_KERNEL_DEVICE +pci:0000:83:00.0
_KERNEL_SUBSYSTEM pci
_MACHINE_ID 6fce4b6bea5640aaa0f1bb606eea2861
_SOURCE_MONOTONIC_TIMESTAMP 4694866141
_TRANSPORT kernel
_UDEV_SYSNAME 0000:83:00.0
__CURSOR s=c01c84adca0a406bb62d727731615fb0;i=8cb0a;b=56576147c47d41da88091e82072eb0b0;m=117d61217;t=5b032bf430ace;x=d20177e065b0a8e
__MONOTONIC_TIMESTAMP 4694872599
__REALTIME_TIMESTAMP 1601106887248590
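
For anyone comparing their own logs against this one, here is a minimal Python sketch that pulls the fields out of the kernel message line above. The function name and the regular expression are my own assumptions, written only against the single line quoted here; other nouveau messages may well use a different layout.

```python
import re

# Pattern written against the one log line quoted in this post.
# It is an assumption that other nouveau "disp" errors share this layout.
NOUVEAU_DISP_RE = re.compile(
    r"nouveau (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): disp: "
    r"chid (?P<chid>\d+) stat (?P<stat>[0-9a-f]+) "
    r"reason (?P<reason>\d+) \[(?P<reason_name>\w+)\] "
    r"mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+) "
    r"code (?P<code>[0-9a-f]+)"
)


def parse_nouveau_disp_error(line):
    """Return the fields of a nouveau disp error as a dict, or None if no match."""
    match = NOUVEAU_DISP_RE.search(line)
    return match.groupdict() if match else None
```

Feeding it each kernel message from the journal and keeping the non-None results would give a quick list of these errors with the PCI address and reason name for each.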

This doesn’t seem to happen if I use only one VM, in other words, if I start Manjaro, start the VM, power off the VM, and shut down Manjaro.

To summarize, this seems to happen (and not always) only if I start Manjaro, start the 1st VM, power off the 1st VM, start the 2nd VM, power off the 2nd VM, and then shut down Manjaro.
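
Since the GPU changes hands between vfio-pci (while a VM runs) and nouveau (when the host gets it back), it may help to record which driver holds the card just before shutting down in each sequence. A minimal Python sketch, assuming the standard sysfs layout where each PCI device directory has a `driver` symlink; the function name and the `sysfs_root` parameter are my own, added only so the helper can be pointed at a different directory:

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound.

    Relies on the sysfs convention that <device>/driver is a symlink to the
    driver's directory, whose last path component is the driver name.
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # 0000:83:00.0 is the GPU address from the inxi output and the log entry.
    print(bound_driver("0000:83:00.0"))
```

Running it after powering off the 1st VM, and again after powering off the 2nd, would show whether the card ends up in the same state in the sequence that hangs as in the one that does not.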

Anyone experiencing the same?

Hello,

And you want to pass it through to qemu. With only one GPU in the machine, it is not possible to use it for the host and pass it through to a VM at the same time.

Warning: Once you reboot after this procedure, whatever GPU you have configured will no longer be usable on the host until you reverse the manipulation. Make sure the GPU you intend to use on the host is properly configured before doing this - your motherboard should be set to display using the host GPU.

A quote from here:
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Isolating_the_GPU

Thank you. Yes, I was aware of this; I actually feel quite lucky with my GPU and how it behaves in Manjaro.
I usually use only one VM and, despite what you quoted and what the wiki describes, the GPU works well for my needs.
With other Linux distros I was not able to get the GPU back from the VM, which the wiki mentions as a possibility.

But wouldn’t it be easier to dual boot, then? Once the GPU is in use by the VM, the host cannot use it. Or do you connect to the host remotely to control it?

Unfortunately not. The guest is macOS, and running it in a VM is a lot easier to manage and more stable, so that was my choice.

Yes, to manage the host (Manjaro) I can ssh into it from the VM (99% of the time I do this), or use cockpit, or I can power off the VM and the GPU switches back to the host with no big issues (apart from the one described above, where the host sometimes hangs at poweroff).