Random kernel panics with ASUS PCE-AC51

Hi everyone,

yesterday my new PCI wireless adapter arrived, and I was hoping to replace the slow USB adapter I’ve been using. However, ever since I installed it, the system has been unstable to the point of being unusable.

I’ve tried running Windows for a while and it appears to be completely stable on the same setup.

Some information:

Kernels tried: 5.7 and 5.8
Ryzen 3900x
ROG Strix X570-Gaming, BIOS v2.20.1271 (latest)
Adapter is PCE-AC51, which has the rtl8812ae chipset.
I’ve updated my BIOS to the latest version, but cannot try a different PCI slot since the graphics cards are blocking access.

At first the system wouldn’t even boot past the login manager (it didn’t matter whether I logged into i3, sway, or a shell), and it panicked before I could get any information out. After disabling C-State Control in the BIOS, the system became a little more stable: I could boot into a WM and even use the PCI card, which is recognized out of the box.

However, after random intervals ranging from 5 minutes to about 2 hours, something breaks and the system becomes less responsive. Shortly afterward the kernel panics and the system resets.

I cannot give you any specific dmesg logs, since the system fails to flush anything to disk before resetting. However, the last time it happened I was able to remember some of the output from the dmesg --follow I had running on a screen:

It started off with a PCI link lost, then ath: phy1: failed to wakeup, then xhci_hcd: xHCI host controller not responding, then a couple of ata: exceptions, then ahci: AHCI Controller unavailable, and finally USB disconnects.
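
For anyone trying to capture the log in the same situation: a minimal sketch that forces every kernel message to disk as soon as it arrives, instead of relying on the normal write-back. The helper names persist_lines and follow_dmesg are made up for illustration, reading the kernel log may require root, and this can only help for as long as the disk controller itself is still responding.

```python
import os
import subprocess


def persist_lines(lines, path):
    """Append each line to `path` and force it to disk before reading the next one."""
    count = 0
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line if line.endswith("\n") else line + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to write the data to the device
            count += 1
    return count


def follow_dmesg(path):
    """Run `dmesg --follow` and persist its output until the process ends."""
    proc = subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True)
    try:
        return persist_lines(proc.stdout, path)
    finally:
        proc.terminate()
```

Calling follow_dmesg("dmesg-follow.log") in a spare terminal keeps running until interrupted; the file name is only an example.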

Sometimes the audio/video stutters as well, and while the Wi-Fi doesn’t disconnect, sometimes I cannot get any packets through for a little while until it is back up. The latter seems to be a more general issue with the chipset, however, and should be fixable once the system is in a usable state.

This feels like a broader PCI issue, since the USB controllers and graphics card audio are having trouble as well.

Any ideas? My BIOS settings are set up for GPU passthrough, so the virtualization settings are all turned on, including those for PCI devices.

Edit:

This means:

  • SR-IOV Support is enabled
  • CPU Virtualization is enabled

Thanks!

I did some more testing and was able to use the PC for 3 hours after removing pci=nomsi from the kernel parameters, all on my 2.4 GHz network. However, when I tried connecting to the 5 GHz network, the system crashed again.

Managed to capture dmesg logs before the final panic:

[ 1353.652722] wlp3s0: deauthenticated from 3c:a6:2f:57:30:b2 (Reason: 6=CLASS2_FRAME_FROM_NONAUTH_STA)
[ 1354.013257] wlp3s0: authenticate with 3c:a6:2f:57:30:b2
[ 1354.041203] wlp3s0: send auth to 3c:a6:2f:57:30:b2 (try 1/3)
[ 1354.141593] wlp3s0: send auth to 3c:a6:2f:57:30:b2 (try 2/3)
[ 1354.248265] wlp3s0: send auth to 3c:a6:2f:57:30:b2 (try 3/3)
[ 1354.351594] wlp3s0: authentication with 3c:a6:2f:57:30:b2 timed out
[ 1358.742934] wlp3s0: authenticate with 3c:a6:2f:57:30:b2
[ 1358.773953] wlp3s0: send auth to 3c:a6:2f:57:30:b2 (try 1/3)
[ 1358.777601] wlp3s0: authenticated
[ 1358.778266] wlp3s0: associate with 3c:a6:2f:57:30:b2 (try 1/3)
[ 1358.782211] wlp3s0: RX AssocResp from 3c:a6:2f:57:30:b2 (capab=0x1511 status=0 aid=2)
[ 1358.796597] wlp3s0: associated
[ 1358.938649] wlp3s0: Limiting TX power to 27 (30 - 3) dBm as advertised by 3c:a6:2f:57:30:b2
[ 1362.830118] wlp3s0: disassociated from 3c:a6:2f:57:30:b2 (Reason: 2=PREV_AUTH_NOT_VALID)
[ 1362.987459] audit: type=1130 audit(1597862784.516:337): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1363.484788] audit: type=1130 audit(1597862785.012:338): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=udisks2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1363.753152] audit: type=1334 audit(1597862785.282:339): prog-id=20 op=LOAD
[ 1363.753187] audit: type=1334 audit(1597862785.282:340): prog-id=21 op=LOAD
[ 1363.754850] audit: type=1325 audit(1597862785.282:341): table=filter family=7 entries=0 op=register pid=7599 subj==unconfined comm="(ostnamed)"
[ 1363.931154] audit: type=1130 audit(1597862785.459:342): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1373.326900] audit: type=1131 audit(1597862794.856:343): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1393.980209] audit: type=1131 audit(1597862815.509:344): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1394.005080] audit: type=1334 audit(1597862815.536:345): prog-id=21 op=UNLOAD
[ 1394.005084] audit: type=1334 audit(1597862815.536:346): prog-id=20 op=UNLOAD
[ 1394.180641] audit: type=1325 audit(1597862815.709:347): table=filter family=7 entries=0 op=unregister pid=1719 subj==unconfined comm="kworker/u64:15"
[ 1485.700375] audit: type=1111 audit(1597862907.229:348): pid=1130 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=device-disconnect interface="wlp3s0" ifindex=3 pid=1451 uid=1000 result=fail exe="/usr/bin/NetworkManager" hostname=? addr=? terminal=? res=failed'
[ 1485.700412] audit: type=1111 audit(1597862907.229:349): pid=1130 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=device-disconnect interface="wlp3s0" ifindex=3 pid=1451 uid=1000 result=fail exe="/usr/bin/NetworkManager" hostname=? addr=? terminal=? res=failed'
[ 1490.240887] igb 0000:04:00.0 enp4s0: PCIe link lost
[ 1490.644287] xhci_hcd 0000:05:00.1: xHCI host controller not responding, assume dead
[ 1490.644308] xhci_hcd 0000:05:00.1: HC died; cleaning up
[ 1490.644394] usb 1-1: USB disconnect, device number 2
[ 1491.461382] ath: phy1: Failed to wakeup in 500us
[ 1491.471542] ath: phy1: Failed to wakeup in 500us
[ 1491.854637] xhci_hcd 0000:05:00.3: xHCI host controller not responding, assume dead
[ 1491.854648] xhci_hcd 0000:05:00.3: HC died; cleaning up
[ 1491.854685] usb 3-1: USB disconnect, device number 2
[ 1492.459854] ata5.00: exception Emask 0x73 SAct 0x8 SErr 0xffffffff action 0xe frozen
[ 1492.459856] ata5.00: irq_stat 0xffbfffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
[ 1492.459858] ata5: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
[ 1492.459860] ata5.00: failed command: WRITE FPDMA QUEUED
[ 1492.459863] ata5.00: cmd 61/38:18:10:a1:eb/00:00:0b:00:00/40 tag 3 ncq dma 28672 out
                        res 40/00:18:10:a1:eb/00:00:0b:00:00/40 Emask 0x72 (host bus error)
[ 1492.459864] ata5.00: status: { DRDY }
[ 1492.459867] ata5: hard resetting link
[ 1492.661583] ahci 0000:06:00.0: AHCI controller unavailable!
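
To see whether the failures are spread across several PCI devices rather than just the wireless card, the log can be reduced to the first message per PCI address. This is a rough sketch that assumes the usual "[timestamp] driver dddd:bb:dd.f" layout of these dmesg lines; first_seen_per_device is a hypothetical helper name.

```python
import re

# Matches lines such as "[ 1490.240887] igb 0000:04:00.0 enp4s0: ...":
# timestamp, driver name, then a PCI address (domain:bus:device.function).
PCI_LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+(\S+)\s+([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b"
)


def first_seen_per_device(lines):
    """Return sorted (timestamp, driver, pci_address) for each device's first message."""
    seen = {}
    for line in lines:
        match = PCI_LINE.match(line)
        if match and match.group(3) not in seen:
            seen[match.group(3)] = (float(match.group(1)), match.group(2), match.group(3))
    return sorted(seen.values())
```

Fed the lines above, this lists igb at 0000:04:00.0 first, then xhci_hcd at 0000:05:00.1 and 0000:05:00.3, then ahci at 0000:06:00.0. Lines without a PCI address, such as the ath and ata messages, are skipped.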

I’ve now also tried disabling 802.11n altogether with a modprobe config of options iwlwifi 11n_disable=$num, trying 1, 8 and 12 for $num. The values 8 and 12 don’t solve the problem. 1 seems to be more stable, but IPv4 stopped working.

I’ll now try 1 while also disabling IPv6 in sysctl.d.
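
An options line in a modprobe config only affects the module it names, so it can be worth confirming which module actually drives the card. A minimal sketch, assuming the standard sysfs layout and the interface name wlp3s0 from the log above; bound_module is a hypothetical helper, and it returns None for a missing interface or a driver built into the kernel.

```python
import os


def bound_module(interface, sysfs_root="/sys"):
    """Return the kernel module behind a network interface's driver, or None."""
    # /sys/class/net/<iface>/device/driver/module is a symlink into /sys/module/<name>.
    link = os.path.join(sysfs_root, "class", "net", interface, "device", "driver", "module")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))
```

For example, print(bound_module("wlp3s0")) shows the module name to use on the options line.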

I’ve been able to narrow the issue down to the 5 GHz mode of the driver, at least I think. Unfortunately, the drivers on the official ASUS website only work up to kernel v4.14.

With 11n completely disabled and IPv6 disabled, it is somewhat stable, but when booting I still get MCE hardware errors, and sometimes it panics during boot. However, if the boot goes through, everything seems to work fine.