[solved] AGESA 0.0.7.2 based BIOS update broke IOMMU redirection

I have a strange problem, and I've been swapping hardware in and out for the last couple of days trying to solve it, unfortunately without success. :frowning_face: The problem in detail:
I cannot pass through the second video card via IOMMU since I updated my BIOS to the latest one, based on AGESA 0.0.7.2. Unfortunately I cannot go back to the old version, because the BIOS will not let me flash an older one.
The error when I start the VM is: Unknown PCI header type '127'
I've tried several kernels; none of them worked. I am on Manjaro stable right now.
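For what it's worth, header type 127 usually means the card stopped responding to config-space reads (typically after a failed reset): reads then return all-ones (0xff), and once the multi-function bit (bit 7) is masked off, 0x7f = 127 is left over. A minimal sketch of that arithmetic (the bus address in the comment is just the card from this thread):

```shell
# A dead/unresettable PCI function returns all-ones from config space.
raw=0xff                    # value read back from the Header Type register
masked=$(( raw & 0x7f ))    # bit 7 is the multi-function flag, mask it off
echo "$masked"              # -> 127, the value in the libvirt error

# To read the live register on the passed-through card (needs pciutils):
#   sudo setpci -s 06:00.0 HEADER_TYPE
# A healthy endpoint reports 00 (or 80 if multi-function); ff means the
# device is not answering.
```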

I bumped into this during my search: AGESA 0.0.7.2 - PCI Quirk

More details can be found on the internet:

Asrock AB350 Pro4 Downgrade
VFIO AB350 - Reddit
level1techs - solution
ASRock Forum

The solution seems to be to patch the kernel with the following:
The kernel patch that does the trick

Can this patch be included in the Manjaro kernels? It seems AMD and the motherboard vendors will have a hard time fixing this (I believe it will take some months). I may try to patch the kernel myself as a test, but am I allowed to download the Manjaro kernel sources and build them myself?

Current config (I removed the second card and disabled the IOMMU):
System:    Host: dib-linux64 Kernel: 5.0.15-1-MANJARO x86_64 bits: 64 compiler: gcc v: 8.3.0 Console: tty 0 
           Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: ASRock model: B450M Pro4 serial: M80-BC016300977 UEFI [Legacy]: American Megatrends v: P3.10 
           date: 03/07/2019 
CPU:       Topology: 6-Core model: AMD Ryzen 5 1600 bits: 64 type: MT MCP arch: Zen rev: 1 L2 cache: 3072 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 76671 
           Speed: 1956 MHz min/max: 1550/3200 MHz Core speeds (MHz): 1: 1956 2: 2478 3: 3261 4: 2226 5: 1450 6: 1390 7: 2066 
           8: 1581 9: 1876 10: 2635 11: 1567 12: 1627 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] 
           vendor: Sapphire Limited Nitro+ driver: amdgpu v: kernel bus ID: 1c:00.0 
           Display: server: X.Org 1.20.4 driver: amdgpu FAILED: ati unloaded: modesetting resolution: 1920x1080~60Hz 
           OpenGL: renderer: Radeon RX 580 Series (POLARIS10 DRM 3.27.0 5.0.15-1-MANJARO LLVM 8.0.0) v: 4.5 Mesa 19.0.4 
           direct render: Yes 
Audio:     Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Sapphire Limited driver: snd_hda_intel 
           v: kernel bus ID: 1c:00.1 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASRock driver: snd_hda_intel v: kernel 
           bus ID: 1e:00.3 
           Device-3: C-Media CM108 Audio Controller type: USB driver: hid-generic,snd-usb-audio,usbhid bus ID: 3-3:3 
           Device-4: Microsoft type: USB driver: snd-usb-audio,uvcvideo bus ID: 1-7:4 
           Sound Server: ALSA v: k5.0.15-1-MANJARO 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock driver: r8169 v: kernel port: f000 
           bus ID: 18:00.0 
           IF: enp24s0 state: up speed: 1000 Mbps duplex: full mac: 70:85:c2:b7:aa:3d 
Drives:    Local Storage: total: 2.27 TiB used: 851.63 GiB (36.6%) 
           ID-1: /dev/sda vendor: Western Digital model: WD10EZRX-00L4HB0 size: 931.51 GiB 
           ID-2: /dev/sdb vendor: Western Digital model: WD10JPVX-00JC3T0 size: 931.51 GiB 
           ID-3: /dev/sdc vendor: Seagate model: ST500LT012-1DG142 size: 465.76 GiB 
RAID:      Device-1: data type: zfs status: ONLINE raid: no-raid size: 856.00 GiB free: 32.30 GiB Components: online: N/A 
Partition: ID-1: / size: 65.00 GiB used: 27.95 GiB (43.0%) fs: btrfs dev: /dev/dm-0 
           ID-2: /home size: 65.00 GiB used: 27.95 GiB (43.0%) fs: btrfs dev: /dev/dm-0 
           ID-3: swap-1 size: 6.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-1 
Sensors:   System Temperatures: cpu: 39.4 C mobo: 34.0 C gpu: amdgpu temp: 36 C 
           Fan Speeds (RPM): fan-1: 1210 fan-2: 1812 fan-3: 499 fan-4: 0 fan-5: 0 gpu: amdgpu fan: 766 
           Voltages: 12v: N/A 5v: N/A 3.3v: 3.33 vbat: 3.25 
Info:      Processes: 397 Uptime: 19h 17m Memory: 15.59 GiB used: 5.72 GiB (36.7%) Init: systemd Compilers: gcc: 8.3.0 
           Shell: bash v: 5.0.7 inxi: 3.0.34 

Thanks in advance!

Have you tried the 5.1 kernel already, or just 5.0?
Also, have a look at the new update that was just released for stable: install the packages, reboot, and then try the 5.1 kernel.

I updated the BIOS on my work PC (https://www.gigabyte.com/us/Motherboard/GA-AB350-Gaming-rev-1x#support-dl-bios) from F25 to F30/F31 and one of the items is "Update AGESA 0.0.7.2".

The BIOS update broke the onboard LAN so I wonder if there are issues with AGESA 0.0.7.2.

Edit:

Yup:

Looks like it's worth avoiding 0.0.7.2 for now.


Oh joy, MSI uses a completely different numbering scheme for AGESA versions; this was the last AGESA update for mine :man_shrugging:

Update AGESA Code 1.0.0.6

EDIT: never mind, I just read the download page for your Gigabyte board. So the newer AGESA has a lower version number; thanks AMD for keeping it stupid, not simple.


Apparently AGESA 1.0.0.* is for Family 17h Zen/Zen+ and AGESA 0.0.7.* adds support for Zen 2.

Confused? Yes.

I've tested with kernels 5.1, 5.0, 4.20, 4.19 and 4.14 in the last couple of days. I even installed Ubuntu 18.10 for testing, but the error was the same: PCI header type '127'.
But I will try again (after the update) with the 5.1 kernel. I will report back when I've finished testing...

Totally agree. I wish I had double-checked the forums before the update. I even missed the info that there is no going back from this. :frowning:

I updated the system, and I've now tested almost everything except compiling the kernel with the patch included...

My ASRock AB350M Pro4 mobo had some issues; I thought it was unstable, so I bought an ASRock B450M Pro4.
I tried out all of the available BIOSes, but the older ones lack the option to initialize the second PCIe slot first, and the newer ones give the same error I posted already. :frowning:
With the 5.1 kernel I cannot import my ZFS pools using the cache file, because of the missing zfs modules for that kernel:

sudo systemctl status zfs-import-cache.service
May 28 00:24:47 dib-linux64 systemd[1]: Starting Import ZFS pools by cache file...
May 28 00:24:57 dib-linux64 zpool[2196]: /dev/zfs and /proc/self/mounts are required.
May 28 00:24:57 dib-linux64 zpool[2196]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
May 28 00:24:57 dib-linux64 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
May 28 00:24:57 dib-linux64 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
May 28 00:24:57 dib-linux64 systemd[1]: Failed to start Import ZFS pools by cache file.
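A quick generic check for whether a zfs module actually exists for the running kernel (the failure above is what you typically get when it doesn't; this is not specific to Manjaro's packaging):

```shell
# Check whether a zfs kernel module is available for the running kernel.
# If modinfo is missing or finds nothing, the zfs import services cannot work.
if modinfo zfs >/dev/null 2>&1; then
  echo "zfs module found for $(uname -r)"
else
  echo "no zfs module for $(uname -r)"
fi
```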

But I can live without it for the test. The vfio output before starting the VM:
sudo dmesg | grep vfio
[   42.084862] vfio-pci 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[   42.100448] vfio_pci: add [1002:67df[ffffffff:ffffffff]] class 0x000000/00000000
[   42.117200] vfio_pci: add [1002:aaf0[ffffffff:ffffffff]] class 0x000000/00000000
[   42.117211] vfio_pci: add [1002:665f[ffffffff:ffffffff]] class 0x000000/00000000
[   42.117215] vfio_pci: add [1002:aac0[ffffffff:ffffffff]] class 0x000000/00000000
[   84.557711] vfio-pci 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
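The `vfio_pci: add [...]` lines above list the vendor:device IDs handed to vfio-pci (typically via an `ids=` modprobe option or the kernel command line). If it's useful, those IDs can be pulled out of such lines with a bit of sed; the log line below is copied from the output above:

```shell
# Extract the vendor:device ID from a "vfio_pci: add" dmesg line.
line='vfio_pci: add [1002:67df[ffffffff:ffffffff]] class 0x000000/00000000'
id=$(printf '%s\n' "$line" | sed -E 's/.*add \[([0-9a-f]{4}:[0-9a-f]{4}).*/\1/')
echo "$id"   # -> 1002:67df

# On a live system, the full list bound this way:
#   dmesg | sed -nE 's/.*vfio_pci: add \[([0-9a-f]{4}:[0-9a-f]{4}).*/\1/p'
```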
The iommu output before the start:
sudo dmesg | grep iommu                                                                                                                                                     
[    0.000000] Command line: BOOT_IMAGE=/@/boot/vmlinuz-5.1-x86_64 root=UUID=18b5f730-d4ec-4883-9389-8d068a43cc84 rw rootflags=subvol=@ cryptdevice=UUID=b97fc04b-8fc5-4ca1-9bde-1db90d5aad72:cryptroot iommu=1 iommu=pt video=efifb:off amd_iommu=fullflush quiet rd.udev.log-priority=3
[    0.000000] Kernel command line: BOOT_IMAGE=/@/boot/vmlinuz-5.1-x86_64 root=UUID=18b5f730-d4ec-4883-9389-8d068a43cc84 rw rootflags=subvol=@ cryptdevice=UUID=b97fc04b-8fc5-4ca1-9bde-1db90d5aad72:cryptroot iommu=1 iommu=pt video=efifb:off amd_iommu=fullflush quiet rd.udev.log-priority=3
[    0.715282] pci 0000:00:01.0: Adding to iommu group 0
[    0.715309] pci 0000:00:01.0: Using iommu direct mapping
[    0.715410] pci 0000:00:01.3: Adding to iommu group 1
[    0.715432] pci 0000:00:01.3: Using iommu direct mapping
[    0.715527] pci 0000:00:02.0: Adding to iommu group 2
[    0.715549] pci 0000:00:02.0: Using iommu direct mapping
[    0.715643] pci 0000:00:03.0: Adding to iommu group 3
[    0.715661] pci 0000:00:03.0: Using iommu direct mapping
[    0.715742] pci 0000:00:03.1: Adding to iommu group 4
[    0.715764] pci 0000:00:03.1: Using iommu direct mapping
[    0.715852] pci 0000:00:04.0: Adding to iommu group 5
[    0.715870] pci 0000:00:04.0: Using iommu direct mapping
[    0.715949] pci 0000:00:07.0: Adding to iommu group 6
[    0.715971] pci 0000:00:07.0: Using iommu direct mapping
[    0.716058] pci 0000:00:07.1: Adding to iommu group 7
[    0.716079] pci 0000:00:07.1: Using iommu direct mapping
[    0.716169] pci 0000:00:08.0: Adding to iommu group 8
[    0.716189] pci 0000:00:08.0: Using iommu direct mapping
[    0.716278] pci 0000:00:08.1: Adding to iommu group 9
[    0.716299] pci 0000:00:08.1: Using iommu direct mapping
[    0.716388] pci 0000:00:14.0: Adding to iommu group 10
[    0.716409] pci 0000:00:14.0: Using iommu direct mapping
[    0.716422] pci 0000:00:14.3: Adding to iommu group 10
[    0.716528] pci 0000:00:18.0: Adding to iommu group 11
[    0.716549] pci 0000:00:18.0: Using iommu direct mapping
[    0.716561] pci 0000:00:18.1: Adding to iommu group 11
[    0.716574] pci 0000:00:18.2: Adding to iommu group 11
[    0.716584] pci 0000:00:18.3: Adding to iommu group 11
[    0.716595] pci 0000:00:18.4: Adding to iommu group 11
[    0.716607] pci 0000:00:18.5: Adding to iommu group 11
[    0.716619] pci 0000:00:18.6: Adding to iommu group 11
[    0.716630] pci 0000:00:18.7: Adding to iommu group 11
[    0.716724] pci 0000:01:00.0: Adding to iommu group 12
[    0.716743] pci 0000:01:00.0: Using iommu direct mapping
[    0.716762] pci 0000:01:00.1: Adding to iommu group 12
[    0.716780] pci 0000:01:00.2: Adding to iommu group 12
[    0.716790] pci 0000:02:00.0: Adding to iommu group 12
[    0.716799] pci 0000:02:01.0: Adding to iommu group 12
[    0.716809] pci 0000:02:04.0: Adding to iommu group 12
[    0.716827] pci 0000:04:00.0: Adding to iommu group 12
[    0.716839] pci 0000:05:00.0: Adding to iommu group 12
[    0.716852] pci 0000:05:00.1: Adding to iommu group 12
[    0.716973] pci 0000:06:00.0: Adding to iommu group 13
[    0.717001] pci 0000:06:00.0: Using iommu direct mapping
[    0.717033] pci 0000:06:00.1: Adding to iommu group 13
[    0.717121] pci 0000:07:00.0: Adding to iommu group 14
[    0.717140] pci 0000:07:00.0: Using iommu direct mapping
[    0.717224] pci 0000:07:00.2: Adding to iommu group 15
[    0.717242] pci 0000:07:00.2: Using iommu direct mapping
[    0.717329] pci 0000:07:00.3: Adding to iommu group 16
[    0.717348] pci 0000:07:00.3: Using iommu direct mapping
[    0.717434] pci 0000:08:00.0: Adding to iommu group 17
[    0.717456] pci 0000:08:00.0: Using iommu direct mapping
[    0.717546] pci 0000:08:00.2: Adding to iommu group 18
[    0.717568] pci 0000:08:00.2: Using iommu direct mapping
[    0.717667] pci 0000:08:00.3: Adding to iommu group 19
[    0.717689] pci 0000:08:00.3: Using iommu direct mapping
[    0.718291] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
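The same grouping can be read back from sysfs at any time without grepping dmesg. A small sketch; the helper name and the optional base-directory argument are my own additions (the argument just makes it testable against a fake tree):

```shell
# List every IOMMU group and the PCI addresses inside it.
# With no argument it reads the real sysfs tree.
list_iommu_groups() {
  local base="${1:-/sys/kernel/iommu_groups}" g d grp
  for g in "$base"/*/; do
    [ -d "${g}devices" ] || continue
    grp="${g%/}"; grp="${grp##*/}"
    echo "IOMMU group ${grp}:"
    for d in "${g}devices"/*; do
      [ -e "$d" ] && echo "  ${d##*/}"
    done
  done
}

# Usage on a real system: list_iommu_groups
# (pipe each address through `lspci -nns` for human-readable names)
```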
After the unsuccessful start of the VM (ROM BARs disabled):
Virtual Machine Manager 2.1.0 error on start:
Error starting domain: internal error: Unknown PCI header type '127'

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1420, in startup
    self._backend.create()
  File "/usr/lib/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: Unknown PCI header type '127'
VFIO log looks like this:
sudo dmesg | grep vfio
[  830.996442] vfio-pci 0000:06:00.0: enabling device (0000 -> 0003)
[  830.996924] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x19@0x270
[  830.996935] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1b@0x2d0
[  830.996943] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1e@0x370
And the iommu log:
sudo dmesg | grep iommu
[  833.376849] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e191bc0]
[  834.379106] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e191c00]
[  834.379109] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e191c20]
[  835.380966] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e191c90]

All of this was with kernel 5.1

With kernel 5.0 (with my pools imported successfully and the ROM BAR loaded from my ZFS drive):

Virtual Machine Manager 2.1.0 error at start:
Error starting domain:
internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1455, in resume
    self._backend.resume()
  File "/usr/lib/python3.7/site-packages/libvirt.py", line 2012, in resume
    if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required
IOMMU after start:
sudo dmesg | grep iommu
[  167.034572] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18de60]
[  168.036816] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18de90]
[  169.038354] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18dec0]
[  170.040624] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18def0]
[  171.042454] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18df20]
[  172.043973] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18df50]
[  173.046225] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18df80]
[  174.047785] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18dfb0]
[  175.049625] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18dfe0]
[  176.051869] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18c010]
[  177.053398] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18c040]
[  178.055260] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18c070]
[  179.057173] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x42e18c0a0]
vfio after start:
sudo dmesg | grep vfio
[  164.734132] vfio-pci 0000:06:00.0: enabling device (0000 -> 0003)
[  164.734537] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x19@0x270
[  164.734547] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1b@0x2d0
[  164.734554] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1e@0x370
[  166.011316] vfio_bar_restore: 0000:06:00.1 reset recovery - restoring bars
[  166.027868] vfio_bar_restore: 0000:06:00.0 reset recovery - restoring bars

I tested both the BIOS with AGESA 0.0.7.2 (v3.1) and the new one with AMD Combo_PI 1.0.0.1 (v3.3, 2019/5/21), both from the ASRock website.

So it seems there is one option left: test with a kernel I build myself, with the patch included... :slight_smile:
I've found this: building-a-custom-kernel-in-manjaro-linux. Is this still relevant? Will I be able to download from https://gitlab.manjaro.org/packages/core/linux51 if I create a GitLab account?

Ok, I've successfully built the 5.1 Manjaro kernel with the patch, and I can confirm that my vfio video card passthrough works again without a hassle. :slight_smile: :stuck_out_tongue_winking_eye::partying_face:

The commands I've used:
git clone https://gitlab.manjaro.org/packages/core/linux51.git
cd linux51
wget -O pci.patch https://clbin.com/VCiYJ
cp PKGBUILD _PKGBUILD
sed -i "/'0013-bootsplash.patch')/c\\\t'0013-bootsplash.patch'\\n\\t'pci.patch')" PKGBUILD
sed -i '107i\  patch -Np1 -i "${srcdir}/pci.patch"\' PKGBUILD
updpkgsums
makepkg -s
sudo pacman -U ./linux51-5.1.5-1-x86_64.pkg.tar.xz ./linux51-headers-5.1.5-1-x86_64.pkg.tar.xz
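In case anyone wants to sanity-check the two sed edits before running makepkg, here is what they do, demonstrated on toy stand-in files (the file names `PKGBUILD.demo`/`PREPARE.demo` and the line numbers are made up for the demo; the real line 107 only applies to the real PKGBUILD, and GNU sed is assumed):

```shell
# 1) The source-array edit: replace the last patch entry with itself plus
#    a new 'pci.patch') entry, exactly as in the command above.
printf "source=('linux.tar.xz'\n\t'0013-bootsplash.patch')\n" > PKGBUILD.demo
sed -i "/'0013-bootsplash.patch')/c\\\t'0013-bootsplash.patch'\\n\\t'pci.patch')" PKGBUILD.demo
cat PKGBUILD.demo   # the closing entry is now 'pci.patch')

# 2) The prepare() edit: insert the patch command at a fixed line number
#    (line 2 here, line 107 in the real PKGBUILD).
printf 'line1\nline2\nline3\n' > PREPARE.demo
sed -i '2i\  patch -Np1 -i "pci.patch"' PREPARE.demo
sed -n '2p' PREPARE.demo   # shows the inserted patch command
```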

Now I'm doing the same with the 5.0, because my ZFS pools are loaded with that kernel. :sunglasses:


Just in case it's useful, ZFS 0.8.0 is in unstable (and maybe testing) which adds support for 5.1.


Thanks! I will try it out once I've had enough sleep. :slight_smile: I had too much "uptime" last night :smiley:

By the way, 5.0 with the patch also works like a charm...
git clone https://gitlab.manjaro.org/packages/core/linux50.git
cd linux50
wget -O pci.patch https://clbin.com/VCiYJ
sed -i "/'0013-bootsplash.patch')/c\\\t'0013-bootsplash.patch'\\n\\t'pci.patch')" PKGBUILD 
sed -i '137i\  patch -Np1 -i "${srcdir}/pci.patch"\' PKGBUILD
updpkgsums
makepkg -s
sudo pacman -U ./linux50-5.0.19-1-x86_64.pkg.tar.xz  ./linux50-headers-5.0.19-1-x86_64.pkg.tar.xz 

I'm not sure when this patch will reach mainline; can it be included in the Manjaro kernels until then?
If it helps, I can test the patched kernel on an Intel laptop, because all my other computers are AMD at heart :slight_smile: I can report back with my findings in the afternoon/evening/night.

(Most probably the X570 and newer boards will ship with AGESA 0.0.7.2 or AMD Combo_PI 1.0.0.1, and the older boards will receive the updated BIOS, so this will become a problem for more people sooner or later...)

