Hi,
I am having problems with a Win10 VM that freezes and with kernel panics. Sometimes the freezing VM takes the host down with it, and only a reboot resolves that.
What I do:
I am using a Windows 10 VM and PCI passthrough for gaming.
System:
Ryzen 5 3600X
Gigabyte X570 AORUS ELITE - BIOS version F31 (31.12.2020, d.m.Y)
4x 8GB G.Skill Ripjaws 3600 C16 - In XMP profile 1
MSI GeForce RTX 2080 SUPER GAMING X TRIO - pci passthrough to guest
MSI GeForce GT 1030 - host
SupaGeek 5-Port-PCI USB-3 Card - passthrough to guest
Samsung SSD 970 EVO - 500GB
Samsung SSD 860 EVO - 1TB
Disk partitions
860 EVO - NTFS drive for the VM, with all the games on it
970 EVO
- 512M UEFI Boot partition
- 390GB luks partition - Manjaro
- 75GB NTFS - Windows
Versions:
- Qemu - 5.2.0
- Virsh - 6.5.0
- Kernel - 5.10.7-3-MANJARO / 5.11.rc3-Manjaro
- VFIO-Guest - 0.1.190-1
- Windows 10 - 19042.746
- Win10 Nvidia driver - 460.89
I have also observed that the panics are completely random. Sometimes the VM runs for 4-5 minutes before it freezes; yesterday I got more than an hour of runtime without problems.
Sometimes there is no panic and no output at all in dmesg or journalctl, but the VM still freezes completely (including those nasty sound-buffer lock sounds).
I hope I am in the right place here; if not, it would be great if you could point me to where I should post this problem.
Cross post on reddit:
Kernel panic (journal excerpt, newest line first, so the initial BUG line is at the bottom):
Jän 27 08:42:06 martin-x570aoruselite kernel: CR2: fffffffffffffff0 CR3: 00000006afb6a000 CR4: 0000000000350ee0
Jän 27 08:42:06 martin-x570aoruselite kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jän 27 08:42:06 martin-x570aoruselite kernel: FS: 0000000000000000(0053) GS:ffff89276ea80000(002b) knlGS:000000000024d000
Jän 27 08:42:06 martin-x570aoruselite kernel: R13: 00000000000ffe07 R14: 0000000000000006 R15: 0000000000000001
Jän 27 08:42:06 martin-x570aoruselite kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffe070d6
Jän 27 08:42:06 martin-x570aoruselite kernel: RBP: ffff8925ef82cc80 R08: 0000000100000000 R09: 0000000000000000
Jän 27 08:42:06 martin-x570aoruselite kernel: RDX: 00000000ffffffff RSI: ffff8925ef82dd38 RDI: ffffa33d41393c78
Jän 27 08:42:06 martin-x570aoruselite kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: fffffffffffffff0
Jän 27 08:42:06 martin-x570aoruselite kernel: RSP: 0018:ffffa33d41393c70 EFLAGS: 00010282
Jän 27 08:42:06 martin-x570aoruselite kernel: Code: 8b 40 10 48 81 c6 60 01 00 00 48 8d 48 f0 48 89 4f 20 48 39 c6 75 13 eb>
Jän 27 08:42:06 martin-x570aoruselite kernel: RIP: 0010:__mtrr_lookup_var_next+0x3b/0x90 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: ---[ end trace 40760db5febdbfe2 ]---
Jän 27 08:42:06 martin-x570aoruselite kernel: CR2: fffffffffffffff0
Jän 27 08:42:06 martin-x570aoruselite kernel: soundcore fb_sys_fops dca wmi pinctrl_amd mac_hid acpi_cpufreq nvidia(POE) s>
Jän 27 08:42:06 martin-x570aoruselite kernel: Modules linked in: vhost_net tun vhost vhost_iotlb macvtap macvlan tap rfcomm>
Jän 27 08:42:06 martin-x570aoruselite kernel: R13: 0000000000000006 R14: 00007fe058bce640 R15: 0000000000000000
Jän 27 08:42:06 martin-x570aoruselite kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Jän 27 08:42:06 martin-x570aoruselite kernel: RBP: 00005624b2bda840 R08: 00005624b0882b68 R09: 0000000000000038
Jän 27 08:42:06 martin-x570aoruselite kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000035
Jän 27 08:42:06 martin-x570aoruselite kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fe05b1c6f6b
Jän 27 08:42:06 martin-x570aoruselite kernel: RSP: 002b:00007fe058bcd608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jän 27 08:42:06 martin-x570aoruselite kernel: Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c>
Jän 27 08:42:06 martin-x570aoruselite kernel: RIP: 0033:0x7fe05b1c6f6b
Jän 27 08:42:06 martin-x570aoruselite kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jän 27 08:42:06 martin-x570aoruselite kernel: do_syscall_64+0x33/0x40
Jän 27 08:42:06 martin-x570aoruselite kernel: __x64_sys_ioctl+0x83/0xb0
Jän 27 08:42:06 martin-x570aoruselite kernel: kvm_vcpu_ioctl+0x25f/0x610 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: ? __wake_up_common+0x7a/0x140
Jän 27 08:42:06 martin-x570aoruselite kernel: ? pollwake+0x74/0x90
Jän 27 08:42:06 martin-x570aoruselite kernel: kvm_arch_vcpu_ioctl_run+0xca1/0x16a0 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: ? x86_virt_spec_ctrl+0xb3/0xe0
Jän 27 08:42:06 martin-x570aoruselite kernel: ? native_load_tr_desc+0x73/0x80
Jän 27 08:42:06 martin-x570aoruselite kernel: ? load_fixmap_gdt+0x32/0x40
Jän 27 08:42:06 martin-x570aoruselite kernel: ? __svm_vcpu_run+0x8b/0x110 [kvm_amd]
Jän 27 08:42:06 martin-x570aoruselite kernel: ? __svm_vcpu_run+0x97/0x110 [kvm_amd]
Jän 27 08:42:06 martin-x570aoruselite kernel: ? _raw_spin_unlock_irqrestore+0x20/0x40
Jän 27 08:42:06 martin-x570aoruselite kernel: kvm_mmu_page_fault+0x78/0x700 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: kvm_tdp_page_fault+0x33/0x90 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: kvm_mtrr_check_gfn_range_consistency+0xdd/0x130 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: Call Trace:
Jän 27 08:42:06 martin-x570aoruselite kernel: CR2: fffffffffffffff0 CR3: 00000006afb6a000 CR4: 0000000000350ee0
Jän 27 08:42:06 martin-x570aoruselite kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jän 27 08:42:06 martin-x570aoruselite kernel: FS: 0000000000000000(0053) GS:ffff89276ea80000(002b) knlGS:000000000024d000
Jän 27 08:42:06 martin-x570aoruselite kernel: R13: 00000000000ffe07 R14: 0000000000000006 R15: 0000000000000001
Jän 27 08:42:06 martin-x570aoruselite kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffe070d6
Jän 27 08:42:06 martin-x570aoruselite kernel: RBP: ffff8925ef82cc80 R08: 0000000100000000 R09: 0000000000000000
Jän 27 08:42:06 martin-x570aoruselite kernel: RDX: 00000000ffffffff RSI: ffff8925ef82dd38 RDI: ffffa33d41393c78
Jän 27 08:42:06 martin-x570aoruselite kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: fffffffffffffff0
Jän 27 08:42:06 martin-x570aoruselite kernel: RSP: 0018:ffffa33d41393c70 EFLAGS: 00010282
Jän 27 08:42:06 martin-x570aoruselite kernel: Code: 8b 40 10 48 81 c6 60 01 00 00 48 8d 48 f0 48 89 4f 20 48 39 c6 75 13 eb>
Jän 27 08:42:06 martin-x570aoruselite kernel: RIP: 0010:__mtrr_lookup_var_next+0x3b/0x90 [kvm]
Jän 27 08:42:06 martin-x570aoruselite kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELIT>
Jän 27 08:42:06 martin-x570aoruselite kernel: CPU: 2 PID: 30051 Comm: CPU 0/KVM Tainted: P W OE 5.11.0-1-MANJAR>
Jän 27 08:42:06 martin-x570aoruselite kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jän 27 08:42:06 martin-x570aoruselite kernel: PGD 736615067 P4D 736615067 PUD 736617067 PMD 0
Jän 27 08:42:06 martin-x570aoruselite kernel: #PF: error_code(0x0000) - not-present page
Jän 27 08:42:06 martin-x570aoruselite kernel: #PF: supervisor read access in kernel mode
Jän 27 08:42:06 martin-x570aoruselite kernel: BUG: unable to handle page fault for address: fffffffffffffff0
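The excerpt above was pasted newest-first: the "BUG: unable to handle page fault" line that opens the oops is the last line, and the call trace reads bottom to top. For anyone who wants to read it in the order the kernel printed it, a minimal sketch like the following flips a pasted excerpt. The function name `chronological` is made up for this example, and the three sample lines are shortened copies of lines from the log above.

```python
def chronological(newest_first_text: str) -> list[str]:
    """Return the non-empty lines of a newest-first log excerpt, oldest first."""
    lines = [ln for ln in newest_first_text.splitlines() if ln.strip()]
    return lines[::-1]


# Three lines from the excerpt above, in the order they were pasted
# (hostname and timestamp prefix dropped for brevity).
excerpt = """\
kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: #PF: supervisor read access in kernel mode
kernel: BUG: unable to handle page fault for address: fffffffffffffff0
"""

for line in chronological(excerpt):
    print(line)
```

All lines in the log carry the same timestamp, so sorting by time would not help here; a plain reversal is what restores the original order.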
Here is the KVM XML:
<domain type="kvm">
<name>win10</name>
<uuid>74c2e459-365b-49aa-8730-25cf3f7fa5bb</uuid>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://microsoft.com/win/10"/>
</libosinfo:libosinfo>
</metadata>
<memory unit="KiB">16777216</memory>
<currentMemory unit="KiB">16777216</currentMemory>
<vcpu placement="static">8</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu="0" cpuset="2"/>
<vcpupin vcpu="1" cpuset="8"/>
<vcpupin vcpu="2" cpuset="3"/>
<vcpupin vcpu="3" cpuset="9"/>
<vcpupin vcpu="4" cpuset="4"/>
<vcpupin vcpu="5" cpuset="10"/>
<vcpupin vcpu="6" cpuset="5"/>
<vcpupin vcpu="7" cpuset="11"/>
<emulatorpin cpuset="0,6"/>
<iothreadpin iothread="1" cpuset="0-1,6-7"/>
</cputune>
<os>
<type arch="x86_64" machine="pc-q35-5.0">hvm</type>
<loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state="on"/>
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vpindex state="on"/>
<synic state="on"/>
<stimer state="on"/>
<vendor_id state="on" value="0123456789ab"/>
<frequencies state="on"/>
</hyperv>
<kvm>
<hidden state="on"/>
</kvm>
<vmport state="off"/>
</features>
<cpu mode="host-model" check="none">
<topology sockets="1" dies="1" cores="4" threads="2"/>
<feature policy="require" name="topoext"/>
</cpu>
<clock offset="localtime">
<timer name="rtc" tickpolicy="catchup"/>
<timer name="pit" tickpolicy="delay"/>
<timer name="hpet" present="no"/>
<timer name="hypervclock" present="yes"/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled="no"/>
<suspend-to-disk enabled="no"/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type="block" device="disk">
<driver name="qemu" type="raw" cache="none" io="native" discard="unmap" iothread="1" queues="8"/>
<source dev="/dev/disk/by-id/nvme-Samsung_SSD_970_EVO_500GB_S466NX0M758355V-part3"/>
<target dev="vda" bus="virtio"/>
<boot order="1"/>
<address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</disk>
<disk type="block" device="disk">
<driver name="qemu" type="raw" cache="none" io="native" discard="unmap" iothread="1" queues="8"/>
<source dev="/dev/disk/by-id/ata-Samsung_SSD_860_EVO_M.2_1TB_S415NB0M506678Z"/>
<target dev="vdb" bus="virtio"/>
<address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
</disk>
<controller type="usb" index="0" model="qemu-xhci" ports="15">
<address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</controller>
<controller type="pci" index="0" model="pcie-root"/>
<controller type="pci" index="1" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="1" port="0x8"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="2" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="2" port="0x9"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
</controller>
<controller type="pci" index="3" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="3" port="0xa"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
</controller>
<controller type="pci" index="4" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="4" port="0xb"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
</controller>
<controller type="pci" index="5" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="5" port="0xc"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x4"/>
</controller>
<controller type="pci" index="6" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="6" port="0xd"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x5"/>
</controller>
<controller type="pci" index="7" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="7" port="0xe"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x6"/>
</controller>
<controller type="pci" index="8" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="8" port="0xf"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x7"/>
</controller>
<controller type="pci" index="9" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="9" port="0x10"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="10" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="10" port="0x11"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
</controller>
<controller type="pci" index="11" model="pcie-to-pci-bridge">
<model name="pcie-pci-bridge"/>
<address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
</controller>
<controller type="pci" index="12" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="12" port="0x12"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
</controller>
<controller type="sata" index="0">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
</controller>
<interface type="direct">
<mac address="52:54:00:26:23:72"/>
<source dev="enp5s0" mode="bridge"/>
<model type="virtio"/>
<driver queues="8"/>
<address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<input type="mouse" bus="ps2"/>
<input type="keyboard" bus="ps2"/>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x0a" slot="0x00" function="0x1"/>
</source>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x0a" slot="0x00" function="0x2"/>
</source>
<address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</hostdev>
<memballoon model="virtio">
<address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
</memballoon>
</devices>
</domain>
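To double-check the vCPU pinning in this XML against the host's sibling threads (for example as listed by `lscpu -e`), something like the following might help. It is only a sketch: `vcpu_pins` is a name invented for this example, the fragment is a trimmed copy of the `<cputune>` block above, and the pairing assumes that with threads="2" each pair of consecutive vCPUs forms one guest core.

```python
import xml.etree.ElementTree as ET


def vcpu_pins(domain_xml: str) -> dict[int, str]:
    """Map each guest vCPU index to the host cpuset it is pinned to."""
    root = ET.fromstring(domain_xml)
    return {
        int(pin.get("vcpu")): pin.get("cpuset")
        for pin in root.findall("./cputune/vcpupin")
    }


# Trimmed copy of the <cputune> block from the domain XML above.
fragment = """
<domain type="kvm">
  <cputune>
    <vcpupin vcpu="0" cpuset="2"/>
    <vcpupin vcpu="1" cpuset="8"/>
    <vcpupin vcpu="2" cpuset="3"/>
    <vcpupin vcpu="3" cpuset="9"/>
  </cputune>
</domain>
"""

pins = vcpu_pins(fragment)
# Assumption: consecutive vCPUs (0/1, 2/3, ...) are the two threads of one guest core.
for core in range(len(pins) // 2):
    print(f"guest core {core}: host CPUs {pins[2 * core]} and {pins[2 * core + 1]}")
```

Each printed pair should correspond to two host threads of the same physical core; if it does not, the guest topology and the pinning disagree.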