Hi, I think I have tracked down the culprit behind my issue: the system sometimes freezes at shutdown.
I think the culprit is the nouveau driver, in combination with GPU passthrough to a QEMU VM.
System spec
System:
Kernel: 5.8.6-1-MANJARO x86_64 bits: 64 compiler: N/A Console: tty 1
Distro: Manjaro Linux
Machine:
Type: Server Mobo: ASUSTeK model: Z9PE-D8 WS v: 1.0x serial:
BIOS: American Megatrends v: 5802 date: 06/10/2015
CPU:
Topology: 2x 8-Core model: 06/2d bits: 64 type: MCP SMP arch: Sandy Bridge
rev: 5 L2 cache: 40.0 MiB
flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 99759
Speed: 1258 MHz min/max: 1200/3800 MHz Core speeds (MHz): 1: 1208 2: 1208
3: 3111 4: 3111 5: 3111 6: 3111 7: 3111 8: 3111 9: 1205 10: 1204 11: 1204
12: 1204 13: 1228 14: 1204 15: 3111 16: 3111
Graphics:
Device-1: NVIDIA GK110B [GeForce GTX TITAN Black] driver: vfio-pci v: 0.2
bus ID: 83:00.0
Display: server: No display server data found. Headless machine?
tty: 80x24
Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
Device-1: Intel C600/X79 series High Definition Audio vendor: ASUSTeK
driver: vfio-pci v: 0.2 bus ID: 00:1b.0
Device-2: NVIDIA GK110 High Definition Audio driver: vfio-pci v: 0.2
bus ID: 83:00.1
Sound Server: ALSA v: k5.8.6-1-MANJARO
Network:
Device-1: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
v: 3.2.6-k port: 9000 bus ID: 06:00.0
IF: enp6s0 state: down mac:
Device-2: Intel 82574L Gigabit Network vendor: ASUSTeK driver: e1000e
v: 3.2.6-k port: 8000 bus ID: 07:00.0
IF: enp7s0 state: up speed: 100 Mbps duplex: full mac:
IF-ID-1: br0 state: up speed: 100 Mbps duplex: unknown mac:
IF-ID-2: br1 state: up speed: 10 Mbps duplex: unknown mac:
IF-ID-3: vnet0 state: unknown speed: 10 Mbps duplex: full mac:
IF-ID-4: vnet1 state: unknown speed: 10 Mbps duplex: full mac:
Drives:
Local Storage: total: 8.19 TiB used: 3.46 TiB (42.3%)
ID-1: /dev/sda vendor: SanDisk model: SDSSDP256G size: 238.47 GiB
ID-2: /dev/sdb vendor: Western Digital model: WD20EZRX-00DC0B0
size: 1.82 TiB
ID-3: /dev/sdc vendor: Crucial model: CT500MX500SSD1 size: 465.76 GiB
ID-4: /dev/sdd vendor: Western Digital model: WD60EZRX-00MVLB1
size: 5.46 TiB
ID-5: /dev/sde vendor: Hitachi model: HTS542525K9SA00 size: 232.89 GiB
Partition:
ID-1: / size: 227.74 GiB used: 12.97 GiB (5.7%) fs: ext4 dev: /dev/sde2
Swap:
ID-1: swap-1 type: file size: 8.00 GiB used: 0 KiB (0.0%) file: /swapfile
Sensors:
System Temperatures: cpu: 31.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info:
Processes: 291 Uptime: 2h 08m Memory: 62.86 GiB used: 32.87 GiB (52.3%)
Init: systemd Compilers: gcc: 10.2.0 Packages: 486 Shell: Bash v: 5.0.18
inxi: 3.1.05
DISTRIB_ID=ManjaroLinux
DISTRIB_RELEASE=20.1
DISTRIB_CODENAME=Mikah
DISTRIB_DESCRIPTION="Manjaro Linux"
I’m using kernel 5.8.6-1, but the same happens with the latest LTS kernel.
The only GPU I have is a GTX Titan Black.
QEMU is installed on Manjaro, and it automatically boots a VM with GPU passthrough (the vBIOS is also passed in libvirt). Everything works as expected: the monitor switches from Manjaro to the VM.
I have a second VM, also with GPU passthrough.
If I power off the first VM and start the second VM, everything seems to work as expected:
the monitor switches from the VM to Manjaro, and then from Manjaro to the second VM.
The issue is that when I power off the VM, return to the Manjaro terminal, and shut down the workstation (shutdown -h now), it hangs and a forced shutdown (pressing the physical power button) is required.
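To narrow this down, it may help to check which kernel driver owns the GPU after each VM powers off and before shutting down. Here is a minimal sketch, assuming the GPU and its audio function are at 0000:83:00.0 and 0000:83:00.1 (bus IDs from the spec above, with the usual 0000 PCI domain prefix). `bound_driver` is a hypothetical helper name; it only reads the standard `driver` symlink under sysfs.

```python
import os


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        # No driver is bound, or the device address does not exist.
        return None
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Assumed addresses: the Titan Black and its HDMI audio function.
    for bdf in ("0000:83:00.0", "0000:83:00.1"):
        print(bdf, bound_driver(bdf))
```

Running it once after the first VM powers off and again after the second one would show whether the card is on nouveau or vfio-pci each time, which could be compared between a run that hangs and a run that does not.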
Since recent updates, I can see something like this in the logs:
Log entry
nouveau 0000:83:00.0: disp: chid 0 stat 0000508c reason 5 [INVALID_STATE] mthd 008c data 00000000 code 0000102c
PRIORITY 3
SYSLOG_FACILITY 0
SYSLOG_IDENTIFIER kernel
_BOOT_ID 56576147c47d41da88091e82072eb0b0
_HOSTNAME tower
_KERNEL_DEVICE +pci:0000:83:00.0
_KERNEL_SUBSYSTEM pci
_MACHINE_ID 6fce4b6bea5640aaa0f1bb606eea2861
_SOURCE_MONOTONIC_TIMESTAMP 4694866141
_TRANSPORT kernel
_UDEV_SYSNAME 0000:83:00.0
__CURSOR s=c01c84adca0a406bb62d727731615fb0;i=8cb0a;b=56576147c47d41da88091e82072eb0b0;m=117d61217;t=5b032bf430ace;x=d20177e065b0a8e
__MONOTONIC_TIMESTAMP 4694872599
__REALTIME_TIMESTAMP 1601106887248590
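For anyone who wants to look for the same entry on their own machine, here is a rough sketch that pulls error-level kernel messages from the previous boot and keeps the nouveau ones for this device. It assumes the journal is persistent (otherwise the boot that hung is not kept) and that the `_KERNEL_DEVICE` value matches the one in the entry above. `previous_boot_kernel_errors` and `nouveau_errors` are hypothetical helper names.

```python
import json
import subprocess


def previous_boot_kernel_errors():
    """Return JSON lines for kernel messages of priority err or worse, previous boot."""
    # -k: kernel messages, -b -1: previous boot, -p 3: priority 3 (err) and above
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "-p", "3", "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()


def nouveau_errors(json_lines, device="+pci:0000:83:00.0"):
    """Return MESSAGE strings logged by nouveau for the given kernel device."""
    messages = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        # MESSAGE may be a non-string when it is not valid UTF-8; skip those.
        if entry.get("_KERNEL_DEVICE") == device and isinstance(message, str):
            if message.startswith("nouveau"):
                messages.append(message)
    return messages
```

After a forced shutdown and reboot, `nouveau_errors(previous_boot_kernel_errors())` should list any lines like the INVALID_STATE one above, if they were written to disk before the hang.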
This doesn’t seem to happen if I use only one VM, in other words if I start Manjaro, start the VM, power off the VM, and shut down Manjaro.
To summarize, this seems to happen (and not always) only if I start Manjaro, start the first VM, power it off, start the second VM, power it off, and then shut down Manjaro.
Is anyone else experiencing the same?