All kernels after 5.4 crash on me after suspend/sleep

I tried again on the newest kernel by switching to tty2, logging in as root and running systemctl suspend, then did the same with the current working kernel and compared the logs in a diff tool.

I think I found the issue.

sedutil-cli[1566]: One or more header fields have 0 length
sedutil-cli[1566]: EndSession Failed
sedutil-cli[1566]: Unable to authenticate with the given password
sedutil-cli[1614]: You do not have permission to access the raw disk in write mode
sedutil-cli[1614]: Perhaps you might try sudo to run as root
sedutil-cli[1614]: Invalid or unsupported disk /dev/nvme1n1
systemd[1]: Finished Permit User Sessions.
sedutil-cli[1616]: You do not have permission to access the raw disk in write mode
sedutil-cli[1616]: Perhaps you might try sudo to run as root
sedutil-cli[1616]: Invalid or unsupported disk /dev/nvme1n1

As I mentioned, I am using self-encrypting OPAL drives (Samsung EVO NVMe), and for S3 to work I need to tell the kernel the password to unlock them after suspend. It seems something is broken there with newer kernels. I guess I will have to dig into it again and update the way I set this password :-/

Update: Good that I wrote a tutorial on it back then; it should help me now :slight_smile: Enable S3 sleep mode for OPAL encrypted NVMe drives - Tutorials - Manjaro Linux Forum

Unfortunately this was not it. I found the same messages in the old kernel log, just at a different place, and I tested that writing to the disk after suspend works on the new kernel. So back to square one :disappointed:

I tried booting the new kernel in single-user mode by adding “single” to the kernel command line in GRUB, and there I could suspend and resume multiple times without triggering this error.

Then I tried the same when booting to multi-user.target. At first it seemed to work, but on the second resume from suspend I again got the same kernel error as stated above…

Also, in my comparison between the old and new kernel, only the sudden kernel error stands out to me now…

Reading the error stack, I feel like this is related to networking. This is reinforced by the fact that after resume all networking is dead (but soon after the whole system is dead, so…). Unfortunately I cannot disable the onboard network ports. Detaching the cable did not help.

Maybe VirtualBox is the problem: The VirtualBox Kernel Driver Is Tainted Crap - Phoronix

What is the best way to disable it temporarily to check? I need it in the end for work, but I would like to know if this is the issue.

Run lsmod | grep vbox, and then unload them with sudo rmmod <name>. If they are not loaded, then they shouldn’t have any effect. Can you also try unloading the atlantic kernel module to see if that makes any difference?

Thank you, I’ll definitely try that in the evening.

A few questions about your suggestions, however:

  1. Is atlantic the networking module? Is the idea to see whether the network stack is the cause?
  2. Will rmmod persist across reboots? Or should I do this once after boot, before suspending?
  3. If it persists, should I simply run sudo insmod <name> again afterwards?

Yes, atlantic is “Marvell (Aquantia) Corporation® Network Driver” (modinfo atlantic).

It will not persist. You should do it before suspending.


Disabling vbox modules did not help.

But disabling the atlantic module indeed did fix it. I could suspend and resume multiple times without issues.

Question is, what now?

UPDATE: Here is the kernel error in question again. Unfortunately the kernel is tainted, so I guess I can’t file a bug report yet.

When does this bug happen? Right before suspending? After wakeup?

After wakeup, a few seconds in.

Do not upgrade/change the kernel, stay on 5.9.10 for now.
Do the following:

# create a new directory, and enter it
mkdir ~/temp
cd ~/temp
# download kernel 5.9.10 source
wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.9.10.tar.xz
# extract it
tar -xf linux-5.9.10.tar.xz
# create a new directory, enter it
mkdir atlantic
cd atlantic
# copy the source code of the module
cp -r ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/* .
# now the "Makefile" needs to be modified
sed -i 's/-I$(srctree)\/$(src)/-I$(PWD)/' Makefile
# the following is a single command ˇˇˇ
cat >> Makefile <<EOF
all:
\tmake -C /lib/modules/\$(shell uname -r)/build M=\$(PWD) modules

clean:
\tmake -C /lib/modules/\$(shell uname -r)/build M=\$(PWD) clean
EOF
# ^^^ ends here; paste the whole thing into your terminal
sed -i 's/\\t/\t/' Makefile
# now build the module
make CFLAGS_MODULE="-ggdb3 -Og" -j
# unload the original module
sudo modprobe -r atlantic
# load a dependency of the newly built module
sudo modprobe macsec
# load the just compiled one
sudo insmod atlantic.ko

If all of the above succeeds, try suspending, then resume, and then post the kernel error you get after wake-up.

Thank you for all your effort! :star_struck:

I did what you instructed. This time the crash happened only after the second suspend.

Nov 23 23:57:05 **** kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
Nov 23 23:57:05 **** kernel: #PF: supervisor write access in kernel mode
Nov 23 23:57:05 **** kernel: #PF: error_code(0x0002) - not-present page
Nov 23 23:57:05 **** kernel: PGD 0 P4D 0 
Nov 23 23:57:05 **** kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 23 23:57:05 **** kernel: CPU: 2 PID: 1598 Comm: NetworkManager Tainted: P           OE     5.9.10-1-MANJARO #1
Nov 23 23:57:05 **** kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Professional Gaming, BIOS P3.30 01/15/2018
Nov 23 23:57:05 **** kernel: RIP: 0010:aq_ring_rx_fill+0x66/0xb2 [atlantic]
Nov 23 23:57:05 **** kernel: Code: 00 00 00 00 eb 0d 29 d0 83 e8 01 eb dc 89 45 24 44 89 e0 44 8d 60 ff 85 c0 74 52 8b 45 24 48 8d 1c 40 48 c1 e3 04 48 03 5d 00 <48> c7 43 28 00 00 00 00 66 c7 43 28 00 08 44 89 ea 48 89 de 48 89
Nov 23 23:57:05 **** kernel: RSP: 0018:ffffad32a5ea73b0 EFLAGS: 00010246
Nov 23 23:57:05 **** kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000020
Nov 23 23:57:05 **** kernel: RDX: 0000000000000000 RSI: ffffad32a4946100 RDI: ffff997923d8f3b8
Nov 23 23:57:05 **** kernel: RBP: ffff997923d8f3b8 R08: 0000000000000000 R09: ffff99794b4d0720
Nov 23 23:57:05 **** kernel: R10: ffff997b52547088 R11: ffff997b57164070 R12: 00000000fffffffe
Nov 23 23:57:05 **** kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
Nov 23 23:57:05 **** kernel: FS:  00007f67819b38c0(0000) GS:ffff997b5ee80000(0000) knlGS:0000000000000000
Nov 23 23:57:05 **** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 23:57:05 **** kernel: CR2: 0000000000000028 CR3: 0000000fa4a2a000 CR4: 00000000003506e0
Nov 23 23:57:05 **** kernel: Call Trace:
Nov 23 23:57:05 **** kernel:  aq_vec_init+0x9e/0xe1 [atlantic]
Nov 23 23:57:05 **** kernel:  aq_nic_init+0xf1/0x191 [atlantic]
Nov 23 23:57:05 **** kernel:  aq_ndev_open+0x16/0x5a [atlantic]
Nov 23 23:57:05 **** kernel:  __dev_open+0xfb/0x1b0
Nov 23 23:57:05 **** kernel:  __dev_change_flags+0x1a5/0x210
Nov 23 23:57:05 **** kernel:  dev_change_flags+0x21/0x60
Nov 23 23:57:05 **** kernel:  do_setlink+0x2bc/0x1160
Nov 23 23:57:05 **** kernel:  ? __nla_validate_parse+0x5f/0x910
Nov 23 23:57:05 **** kernel:  __rtnl_newlink+0x65f/0x9e0
Nov 23 23:57:05 **** kernel:  rtnl_newlink+0x44/0x70
Nov 23 23:57:05 **** kernel:  rtnetlink_rcv_msg+0x13e/0x390
Nov 23 23:57:05 **** kernel:  ? rtnl_calcit.isra.0+0x120/0x120
Nov 23 23:57:05 **** kernel:  netlink_rcv_skb+0x75/0x140
Nov 23 23:57:05 **** kernel:  netlink_unicast+0x242/0x340
Nov 23 23:57:05 **** kernel:  netlink_sendmsg+0x243/0x480
Nov 23 23:57:05 **** kernel:  sock_sendmsg+0x5e/0x60
Nov 23 23:57:05 **** kernel:  ____sys_sendmsg+0x25a/0x2a0
Nov 23 23:57:05 **** kernel:  ? copy_msghdr_from_user+0x6e/0xa0
Nov 23 23:57:05 **** kernel:  ___sys_sendmsg+0x97/0xe0
Nov 23 23:57:05 **** kernel:  __sys_sendmsg+0x81/0xd0
Nov 23 23:57:05 **** kernel:  do_syscall_64+0x33/0x40
Nov 23 23:57:05 **** kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 23 23:57:05 **** kernel: RIP: 0033:0x7f67826bfddd
Nov 23 23:57:05 **** kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 4a ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e ee ff ff 48
Nov 23 23:57:05 **** kernel: RSP: 002b:00007ffc47500d20 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Nov 23 23:57:05 **** kernel: RAX: ffffffffffffffda RBX: 0000563cc28d6050 RCX: 00007f67826bfddd
Nov 23 23:57:05 **** kernel: RDX: 0000000000000000 RSI: 00007ffc47500d60 RDI: 000000000000000c
Nov 23 23:57:05 **** kernel: RBP: 000000000000017b R08: 0000000000000000 R09: 0000000000000000
Nov 23 23:57:05 **** kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Nov 23 23:57:05 **** kernel: R13: 00007ffc47500eb0 R14: 00007ffc47500eac R15: 0000000000000000
Nov 23 23:57:05 **** kernel: Modules linked in: atlantic(OE) macsec rfcomm snd_seq_dummy snd_hrtimer snd_seq fuse cmac algif_hash algif_skcipher af_alg bnep nct6775 hwmon_vid dm_crypt cbc encrypted_keys trusted tpm btusb btrtl btbcm btintel bluetooth snd_usb_audio s>
Nov 23 23:57:05 **** kernel:  pcspkr rng_core rfkill wmi pinctrl_amd gpio_amdpt evdev mac_hid acpi_cpufreq zcommon(POE) znvpair(POE) spl(OE) uinput vboxnetflt(OE) vboxnetadp(OE) nfsd auth_rpcgss vboxdrv(OE) nfs_acl lockd grace videodev drm sunrpc mc sg crypto_user a>
Nov 23 23:57:05 **** kernel: CR2: 0000000000000028
Nov 23 23:57:05 **** kernel: ---[ end trace 71753c3b496c2743 ]---
Nov 23 23:57:05 **** kernel: RIP: 0010:aq_ring_rx_fill+0x66/0xb2 [atlantic]
Nov 23 23:57:05 **** kernel: Code: 00 00 00 00 eb 0d 29 d0 83 e8 01 eb dc 89 45 24 44 89 e0 44 8d 60 ff 85 c0 74 52 8b 45 24 48 8d 1c 40 48 c1 e3 04 48 03 5d 00 <48> c7 43 28 00 00 00 00 66 c7 43 28 00 08 44 89 ea 48 89 de 48 89
Nov 23 23:57:05 **** kernel: RSP: 0018:ffffad32a5ea73b0 EFLAGS: 00010246
Nov 23 23:57:05 **** kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000020
Nov 23 23:57:05 **** kernel: RDX: 0000000000000000 RSI: ffffad32a4946100 RDI: ffff997923d8f3b8
Nov 23 23:57:05 **** kernel: RBP: ffff997923d8f3b8 R08: 0000000000000000 R09: ffff99794b4d0720
Nov 23 23:57:05 **** kernel: R10: ffff997b52547088 R11: ffff997b57164070 R12: 00000000fffffffe
Nov 23 23:57:05 **** kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
Nov 23 23:57:05 **** kernel: FS:  00007f67819b38c0(0000) GS:ffff997b5ee80000(0000) knlGS:0000000000000000
Nov 23 23:57:05 **** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 23:57:05 **** kernel: CR2: 0000000000000028 CR3: 0000000fa4a2a000 CR4: 00000000003506e0

Thanks, please save the following into the file p1.patch in the atlantic directory:

diff -ruN ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_common.h ./aq_common.h
--- ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_common.h	2020-11-22 10:15:33.000000000 +0100
+++ ./aq_common.h	2020-11-24 11:14:30.684185705 +0100
@@ -10,6 +10,14 @@
 #ifndef AQ_COMMON_H
 #define AQ_COMMON_H
 
+#define W(x) #x
+#define V(x) W(x)
+#define S(x) V(x)
+#define WARN_IF(cond) WARN((cond), __FILE__ ":" S(__LINE__) " : `" __stringify(cond) "` triggered warning\n");
+#define pr_fmt(fmt) KBUILD_MODNAME ": " __FILE__ ":" S(__LINE__) " : " fmt
+
+#include <linux/printk.h>
+#include <linux/bug.h>
 #include <linux/etherdevice.h>
 #include <linux/pci.h>
 #include <linux/if_vlan.h>
diff -ruN ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_nic.c ./aq_nic.c
--- ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_nic.c	2020-11-22 10:15:33.000000000 +0100
+++ ./aq_nic.c	2020-11-24 13:35:34.071074242 +0100
@@ -433,6 +433,7 @@
 		if (err)
 			goto err_exit;
 
+		pr_info("i = %u\n", i);
 		aq_vec_init(aq_vec, self->aq_hw_ops, self->aq_hw);
 	}
 
diff -ruN ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_ring.c ./aq_ring.c
--- ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_ring.c	2020-11-22 10:15:33.000000000 +0100
+++ ./aq_ring.c	2020-11-24 13:29:50.687634131 +0100
@@ -547,14 +547,20 @@
 	int err = 0;
 	int i = 0;
 
+	WARN_IF(!self);
+	WARN_IF(!self->buff_ring);
+
 	if (aq_ring_avail_dx(self) < min_t(unsigned int, AQ_CFG_RX_REFILL_THRES,
 					   self->size / 2))
 		return err;
 
 	for (i = aq_ring_avail_dx(self); i--;
 		self->sw_tail = aq_ring_next_dx(self, self->sw_tail)) {
+		pr_info("i = %d, self->sw_tail = %u\n", i, self->sw_tail);
 		buff = &self->buff_ring[self->sw_tail];
 
+		WARN_IF(!buff);
+
 		buff->flags = 0U;
 		buff->len = AQ_CFG_RX_FRAME_MAX;
 
diff -ruN ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_vec.c ./aq_vec.c
--- ../linux-5.9.10/drivers/net/ethernet/aquantia/atlantic/./aq_vec.c	2020-11-22 10:15:33.000000000 +0100
+++ ./aq_vec.c	2020-11-24 13:35:59.717500452 +0100
@@ -179,11 +179,21 @@
 	unsigned int i = 0U;
 	int err = 0;
 
+	WARN_IF(!self);
+	WARN_IF(!aq_hw_ops);
+	WARN_IF(!aq_hw);
+
 	self->aq_hw_ops = aq_hw_ops;
 	self->aq_hw = aq_hw;
 
+	WARN_IF(!self->ring);
+
+	pr_info("self->tx_rings = %u\n", self->tx_rings);
+
 	for (i = 0U, ring = self->ring[0];
 		self->tx_rings > i; ++i, ring = self->ring[i]) {
+
+		WARN_IF(!ring);
 		err = aq_ring_init(&ring[AQ_VEC_TX_ID], ATL_RING_TX);
 		if (err < 0)
 			goto err_exit;
@@ -204,6 +214,7 @@
 		if (err < 0)
 			goto err_exit;
 
+		pr_info("i = %u\n", i);
 		err = aq_ring_rx_fill(&ring[AQ_VEC_RX_ID]);
 		if (err < 0)
 			goto err_exit;

then

# go into directory
cd ~/temp/atlantic
# patch the code
patch -p0 < p1.patch
# recompile
make CFLAGS_MODULE="-ggdb3 -Og" -j
# remove the old one
sudo modprobe -r atlantic
# insert the new one
sudo modprobe macsec
sudo insmod atlantic.ko

and then try suspend-resume again, and watch out for warnings/errors in the kernel log.

Thanks, I’ll try this after work in a few hours. BTW, once I knew which module caused the problem, I also filed a bug report on their project site: Kernel panic after resume from suspend · Issue #22 · Aquantia/AQtion · GitHub

The !self->buff_ring warning was triggered; here is the log as text:

Nov 24 21:16:50 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:559 : i = 0, self->sw_tail = 2046
Nov 24 21:16:50 **** kernel: atlantic: /root/temp/atlantic/aq_vec.c:217 : i = 1
Nov 24 21:16:50 **** kernel: ------------[ cut here ]------------
Nov 24 21:16:50 **** kernel: /root/temp/atlantic/aq_ring.c:551 : `!self->buff_ring` triggered warning
Nov 24 21:16:50 **** kernel: WARNING: CPU: 4 PID: 1596 at /root/temp/atlantic/aq_ring.c:551 aq_ring_rx_fill+0x7b/0x106 [atlantic]
Nov 24 21:16:50 **** kernel: Modules linked in: atlantic(OE) macsec snd_seq_dummy snd_hrtimer snd_seq rfcomm fuse cmac algif_hash algif_skcipher af_alg bnep nct6775 hwmon_vid dm_crypt cbc encrypted_keys trusted tpm squashfs btusb btrtl nls_iso8859_1 btbcm nls_cp437 btintel vfat bluetooth fat loop ecdh_generic ecc iwlmvm mac80211 libarc4 snd_usb_audio iwlwifi snd_usbmidi_lib snd_rawmidi hid_plantronics snd_seq_device input_leds joydev mousedev cfg80211 igb>
Nov 24 21:16:50 **** kernel:  evdev pinctrl_amd gpio_amdpt mac_hid acpi_cpufreq hid_steam zcommon(POE) znvpair(POE) spl(OE) uinput vboxnetflt(OE) vboxnetadp(OE) nfsd auth_rpcgss vboxdrv(OE) nfs_acl lockd grace videodev drm sunrpc mc sg crypto_user agpgart nfs_ssc ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage crc32c_intel xhci_pci sr_mod xhci_hcd cdrom [last unloaded: macsec]
Nov 24 21:16:50 **** kernel: CPU: 4 PID: 1596 Comm: NetworkManager Tainted: P           OE     5.9.10-1-MANJARO #1
Nov 24 21:16:50 **** kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Professional Gaming, BIOS P3.30 01/15/2018
Nov 24 21:16:50 **** kernel: RIP: 0010:aq_ring_rx_fill+0x7b/0x106 [atlantic]
Nov 24 21:16:50 **** kernel: Code: ff 85 c0 75 30 89 d0 5b 5d 41 5c 41 5d c3 48 c7 c7 38 f3 83 c1 e8 50 56 fc f4 0f 0b eb a7 48 c7 c7 78 f3 83 c1 e8 40 56 fc f4 <0f> 0b eb 9d 29 d0 83 e8 01 eb a9 8b 53 24 44 89 e6 48 c7 c7 c8 f3
Nov 24 21:16:50 **** kernel: RSP: 0018:ffffb1f2657df3b0 EFLAGS: 00010282
Nov 24 21:16:50 **** kernel: RAX: 0000000000000000 RBX: ffff90c95c81a3b8 RCX: 0000000000000000
Nov 24 21:16:50 **** kernel: RDX: 0000000000000001 RSI: ffffffffb71894c2 RDI: 00000000ffffffff
Nov 24 21:16:50 **** kernel: RBP: 0000000000000000 R08: 000000000000d68e R09: 0000000000000004
Nov 24 21:16:50 **** kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Nov 24 21:16:50 **** kernel: R13: ffff90c95c81a020 R14: 0000000000000001 R15: 0000000000000000
Nov 24 21:16:50 **** kernel: FS:  00007f1191fae8c0(0000) GS:ffff90cb9ef00000(0000) knlGS:0000000000000000
Nov 24 21:16:50 **** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 21:16:50 **** kernel: CR2: 00007f789ee67000 CR3: 0000000fa6ad6000 CR4: 00000000003506e0
Nov 24 21:16:50 **** kernel: Call Trace:
Nov 24 21:16:50 **** kernel:  aq_vec_init+0x132/0x175 [atlantic]
Nov 24 21:16:50 **** kernel:  aq_nic_init+0x13e/0x1a5 [atlantic]
Nov 24 21:16:50 **** kernel:  aq_ndev_open+0x16/0x5a [atlantic]
Nov 24 21:16:50 **** kernel:  __dev_open+0xfb/0x1b0
Nov 24 21:16:50 **** kernel:  __dev_change_flags+0x1a5/0x210
Nov 24 21:16:50 **** kernel:  dev_change_flags+0x21/0x60
Nov 24 21:16:50 **** kernel:  do_setlink+0x2bc/0x1160
Nov 24 21:16:50 **** kernel:  ? __nla_validate_parse+0x5f/0x910
Nov 24 21:16:50 **** kernel:  __rtnl_newlink+0x65f/0x9e0
Nov 24 21:16:50 **** kernel:  rtnl_newlink+0x44/0x70
Nov 24 21:16:50 **** kernel:  rtnetlink_rcv_msg+0x13e/0x390
Nov 24 21:16:50 **** kernel:  ? rtnl_calcit.isra.0+0x120/0x120
Nov 24 21:16:50 **** kernel:  netlink_rcv_skb+0x75/0x140
Nov 24 21:16:50 **** kernel:  netlink_unicast+0x242/0x340
Nov 24 21:16:50 **** kernel:  netlink_sendmsg+0x243/0x480
Nov 24 21:16:50 **** kernel:  sock_sendmsg+0x5e/0x60
Nov 24 21:16:50 **** kernel:  ____sys_sendmsg+0x25a/0x2a0
Nov 24 21:16:50 **** kernel:  ? copy_msghdr_from_user+0x6e/0xa0
Nov 24 21:16:50 **** kernel:  ___sys_sendmsg+0x97/0xe0
Nov 24 21:16:50 **** kernel:  __sys_sendmsg+0x81/0xd0
Nov 24 21:16:50 **** kernel:  do_syscall_64+0x33/0x40
Nov 24 21:16:50 **** kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 24 21:16:50 **** kernel: RIP: 0033:0x7f1192cbaddd
Nov 24 21:16:50 **** kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 4a ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e ee ff ff 48
Nov 24 21:16:50 **** kernel: RSP: 002b:00007fff9abfaed0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Nov 24 21:16:50 **** kernel: RAX: ffffffffffffffda RBX: 0000562cdad72030 RCX: 00007f1192cbaddd
Nov 24 21:16:50 **** kernel: RDX: 0000000000000000 RSI: 00007fff9abfaf10 RDI: 000000000000000c
Nov 24 21:16:50 **** kernel: RBP: 0000000000000131 R08: 0000000000000000 R09: 0000000000000000
Nov 24 21:16:50 **** kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Nov 24 21:16:50 **** kernel: R13: 00007fff9abfb060 R14: 00007fff9abfb05c R15: 0000000000000000
Nov 24 21:16:50 **** kernel: ---[ end trace c00e3fdc05f15a55 ]---
Nov 24 21:16:50 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:559 : i = -2, self->sw_tail = 0
Nov 24 21:16:50 **** kernel: ------------[ cut here ]------------
Nov 24 21:16:50 **** kernel: /root/temp/atlantic/aq_ring.c:562 : `!buff` triggered warning
Nov 24 21:16:50 **** kernel: WARNING: CPU: 4 PID: 1596 at /root/temp/atlantic/aq_ring.c:562 aq_ring_rx_fill+0xb1/0x106 [atlantic]
Nov 24 21:16:50 **** kernel: Modules linked in: atlantic(OE) macsec snd_seq_dummy snd_hrtimer snd_seq rfcomm fuse cmac algif_hash algif_skcipher af_alg bnep nct6775 hwmon_vid dm_crypt cbc encrypted_keys trusted tpm squashfs btusb btrtl nls_iso8859_1 btbcm nls_cp437 btintel vfat bluetooth fat loop ecdh_generic ecc iwlmvm mac80211 libarc4 snd_usb_audio iwlwifi snd_usbmidi_lib snd_rawmidi hid_plantronics snd_seq_device input_leds joydev mousedev cfg80211 igb>
Nov 24 21:16:50 **** kernel:  evdev pinctrl_amd gpio_amdpt mac_hid acpi_cpufreq hid_steam zcommon(POE) znvpair(POE) spl(OE) uinput vboxnetflt(OE) vboxnetadp(OE) nfsd auth_rpcgss vboxdrv(OE) nfs_acl lockd grace videodev drm sunrpc mc sg crypto_user agpgart nfs_ssc ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage crc32c_intel xhci_pci sr_mod xhci_hcd cdrom [last unloaded: macsec]
Nov 24 21:16:50 **** kernel: CPU: 4 PID: 1596 Comm: NetworkManager Tainted: P        W  OE     5.9.10-1-MANJARO #1
Nov 24 21:16:50 **** kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Professional Gaming, BIOS P3.30 01/15/2018
Nov 24 21:16:50 **** kernel: RIP: 0010:aq_ring_rx_fill+0xb1/0x106 [atlantic]
Nov 24 21:16:50 **** kernel: Code: 53 24 44 89 e6 48 c7 c7 c8 f3 83 c1 e8 db b0 fc f4 44 8b 6b 24 4d 6b ed 30 4c 03 2b 75 0e 48 c7 c7 18 f4 83 c1 e8 0a 56 fc f4 <0f> 0b 49 c7 45 28 00 00 00 00 66 41 c7 45 28 00 08 89 ea 4c 89 ee
Nov 24 21:16:50 **** kernel: RSP: 0018:ffffb1f2657df3b0 EFLAGS: 00010282
Nov 24 21:16:50 **** kernel: RAX: 0000000000000000 RBX: ffff90c95c81a3b8 RCX: 0000000000000000
Nov 24 21:16:50 **** kernel: RDX: 0000000000000001 RSI: ffffffffb71894c2 RDI: 00000000ffffffff
Nov 24 21:16:50 **** kernel: RBP: 0000000000000000 R08: 000000000000d6c1 R09: 0000000000000004
Nov 24 21:16:50 **** kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 00000000fffffffe
Nov 24 21:16:50 **** kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
Nov 24 21:16:50 **** kernel: FS:  00007f1191fae8c0(0000) GS:ffff90cb9ef00000(0000) knlGS:0000000000000000
Nov 24 21:16:50 **** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 21:16:50 **** kernel: CR2: 00007f789ee67000 CR3: 0000000fa6ad6000 CR4: 00000000003506e0
Nov 24 21:16:50 **** kernel: Call Trace:
Nov 24 21:16:50 **** kernel:  aq_vec_init+0x132/0x175 [atlantic]
Nov 24 21:16:50 **** kernel:  aq_nic_init+0x13e/0x1a5 [atlantic]
Nov 24 21:16:50 **** kernel:  aq_ndev_open+0x16/0x5a [atlantic]
Nov 24 21:16:50 **** kernel:  __dev_open+0xfb/0x1b0
Nov 24 21:16:50 **** kernel:  __dev_change_flags+0x1a5/0x210
Nov 24 21:16:50 **** kernel:  dev_change_flags+0x21/0x60
Nov 24 21:16:50 **** kernel:  do_setlink+0x2bc/0x1160
Nov 24 21:16:50 **** kernel:  ? __nla_validate_parse+0x5f/0x910
Nov 24 21:16:50 **** kernel:  __rtnl_newlink+0x65f/0x9e0
Nov 24 21:16:50 **** kernel:  rtnl_newlink+0x44/0x70
Nov 24 21:16:50 **** kernel:  rtnetlink_rcv_msg+0x13e/0x390
Nov 24 21:16:50 **** kernel:  ? rtnl_calcit.isra.0+0x120/0x120
Nov 24 21:16:50 **** kernel:  netlink_rcv_skb+0x75/0x140
Nov 24 21:16:50 **** kernel:  netlink_unicast+0x242/0x340
Nov 24 21:16:50 **** kernel:  netlink_sendmsg+0x243/0x480
Nov 24 21:16:50 **** kernel:  sock_sendmsg+0x5e/0x60
Nov 24 21:16:50 **** kernel:  ____sys_sendmsg+0x25a/0x2a0
Nov 24 21:16:50 **** kernel:  ? copy_msghdr_from_user+0x6e/0xa0
Nov 24 21:16:50 **** kernel:  ___sys_sendmsg+0x97/0xe0
Nov 24 21:16:50 **** kernel:  __sys_sendmsg+0x81/0xd0
Nov 24 21:16:50 **** kernel:  do_syscall_64+0x33/0x40
Nov 24 21:16:50 **** kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 24 21:16:50 **** kernel: RIP: 0033:0x7f1192cbaddd
Nov 24 21:16:50 **** kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 4a ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e ee ff ff 48
Nov 24 21:16:50 **** kernel: RSP: 002b:00007fff9abfaed0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Nov 24 21:16:50 **** kernel: RAX: ffffffffffffffda RBX: 0000562cdad72030 RCX: 00007f1192cbaddd
Nov 24 21:16:50 **** kernel: RDX: 0000000000000000 RSI: 00007fff9abfaf10 RDI: 000000000000000c
Nov 24 21:16:50 **** kernel: RBP: 0000000000000131 R08: 0000000000000000 R09: 0000000000000000
Nov 24 21:16:50 **** kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Nov 24 21:16:50 **** kernel: R13: 00007fff9abfb060 R14: 00007fff9abfb05c R15: 0000000000000000
Nov 24 21:16:50 **** kernel: ---[ end trace c00e3fdc05f15a56 ]---
Nov 24 21:16:50 **** kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
Nov 24 21:16:50 **** kernel: #PF: supervisor write access in kernel mode
Nov 24 21:16:50 **** kernel: #PF: error_code(0x0002) - not-present page
Nov 24 21:16:50 **** kernel: PGD 0 P4D 0 
Nov 24 21:16:50 **** kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 24 21:16:50 **** kernel: CPU: 4 PID: 1596 Comm: NetworkManager Tainted: P        W  OE     5.9.10-1-MANJARO #1

Thanks, could you edit your comment and add all the ".... kernel: atlantic: /root/temp/atlantic/... : ..." entries from the kernel log, with timestamps?

It’s too big. Here is the pastebin: Untitled - Pastebin

To me it seems like the suspend triggers aq_nic_deinit, which ultimately ends up calling aq_ring_free, which executes kfree(self->buff_ring) and possibly causes this.

In case it helps, I added a bit more logging:

Before sleep: 16 kfree() calls on buff_ring

Nov 24 22:19:42 **** kernel: audit: type=1130 audit(1606252782.972:597): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 132
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 160
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 460
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 63
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 77
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 2047
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 62
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 415
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 39
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 2047
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 19
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 2047
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 37
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 31
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 70
Nov 24 22:19:43 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:597 : Freeing buff_ring, self->sw_tail = 31
Nov 24 22:19:43 **** kernel: audit: type=1334 audit(1606252783.012:598): prog-id=24 op=UNLOAD
Nov 24 22:19:43 **** kernel: ksystemstats[26059]: segfault at 55beb4009e36 ip 00007f39972c900b sp 00007ffef05060e0 error 4 in libksgrdbackend.so[7f39972c8000+5000]
Nov 24 22:19:43 **** kernel: Code: d0 eb 0d 0f 1f 40 00 48 83 c0 08 48 39 c8 74 0b 48 8b 10 48 39 d5 74 ef 48 89 d5 48 8b 75 18 4c 8d 7c 24 10 4c 89 ff 48 8b 06 <ff> 50 70 48 89 ef ff 15 71 7f 00 00 48 89 c5 49 39 86 88 00 00 00
Nov 24 22:19:43 **** kernel: audit: type=1701 audit(1606252783.116:599): auid=1001 uid=1001 gid=1001 ses=3 pid=26059 comm="ksystemstats" exe="/usr/bin/ksystemstats" sig=11 res=1
Nov 24 22:19:43 **** kernel: audit: type=1334 audit(1606252783.129:600): prog-id=25 op=LOAD
Nov 24 22:19:43 **** kernel: audit: type=1334 audit(1606252783.129:601): prog-id=26 op=LOAD
Nov 24 22:19:43 **** kernel: audit: type=1130 audit(1606252783.132:602): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-28302-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 24 22:19:43 **** kernel: audit: type=1131 audit(1606252783.486:603): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-28302-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 24 22:19:43 **** kernel: audit: type=1334 audit(1606252783.563:604): prog-id=26 op=UNLOAD
Nov 24 22:19:43 **** kernel: audit: type=1334 audit(1606252783.563:605): prog-id=25 op=UNLOAD
Nov 24 22:19:43 **** kernel: PM: suspend entry (deep)

After wakeup: Two allocs of buff_ring

Nov 24 22:20:04 **** kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 24 22:20:04 **** kernel: ata5.00: configured for UDMA/133
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:112 : alloc buff_ring
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:112 : alloc buff_ring
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_nic.c:436 : i = 0
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_vec.c:191 : self->tx_rings = 2
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_vec.c:217 : i = 0
Nov 24 22:20:04 **** kernel: atlantic: /root/temp/atlantic/aq_ring.c:559 : i = 2046, self->sw_tail = 0

Full log of that test: Untitled - Pastebin

By the way, with your approach the crash always seems to happen on the second suspend, never on the first.

Thanks, can you save this as p2.patch:

diff --git a/aq_nic.c b/aq_nic.c
index e9cbaf4..5782687 100644
--- a/aq_nic.c
+++ b/aq_nic.c
@@ -426,8 +426,11 @@ int aq_nic_init(struct aq_nic_s *self)
 				aq_phy_disable_ptp(self->aq_hw);
 	}
 
+	pr_info("self->aq_vecs = %d\n", (int) self->aq_vecs);
+
 	for (i = 0U; i < self->aq_vecs; i++) {
 		aq_vec = self->aq_vec[i];
+		pr_info("aq_vec_ring_alloc(i=%u)\n", i);
 		err = aq_vec_ring_alloc(aq_vec, self, i,
 					aq_nic_get_cfg(self));
 		if (err)
diff --git a/aq_ring.c b/aq_ring.c
index 6a226b4..3f23583 100644
--- a/aq_ring.c
+++ b/aq_ring.c
@@ -556,7 +556,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self)
 
 	for (i = aq_ring_avail_dx(self); i--;
 		self->sw_tail = aq_ring_next_dx(self, self->sw_tail)) {
-		pr_info("i = %d, self->sw_tail = %u\n", i, self->sw_tail);
+		pr_debug("i = %d, self->sw_tail = %u\n", i, self->sw_tail);
 		buff = &self->buff_ring[self->sw_tail];
 
 		WARN_IF(!buff);
diff --git a/aq_vec.c b/aq_vec.c
index d94d5ea..64557a3 100644
--- a/aq_vec.c
+++ b/aq_vec.c
@@ -138,6 +138,10 @@ int aq_vec_ring_alloc(struct aq_vec_s *self, struct aq_nic_s *aq_nic,
 	unsigned int i = 0U;
 	int err = 0;
 
+	pr_info("self->tx_rings = %d\n", (int) self->tx_rings);
+	pr_info("self->rx_rings = %d\n", (int) self->rx_rings);
+	pr_info("aq_nic_cfg->tcs = %d\n", (int) aq_nic_cfg->tcs);
+
 	for (i = 0; i < aq_nic_cfg->tcs; ++i) {
 		const unsigned int idx_ring = AQ_NIC_CFG_TCVEC2RING(aq_nic_cfg,
 								    i, idx);
@@ -150,6 +154,7 @@ int aq_vec_ring_alloc(struct aq_vec_s *self, struct aq_nic_s *aq_nic,
 		}
 
 		++self->tx_rings;
+		pr_info("self->tx_rings -> %d\n", (int) self->tx_rings);
 
 		aq_nic_set_tx_ring(aq_nic, idx_ring, ring);
 
@@ -161,6 +166,7 @@ int aq_vec_ring_alloc(struct aq_vec_s *self, struct aq_nic_s *aq_nic,
 		}
 
 		++self->rx_rings;
+		pr_info("self->rx_rings -> %d\n", (int) self->rx_rings);
 	}
 
 err_exit:
@@ -189,11 +195,16 @@ int aq_vec_init(struct aq_vec_s *self, const struct aq_hw_ops *aq_hw_ops,
 	WARN_IF(!self->ring);
 
 	pr_info("self->tx_rings = %u\n", self->tx_rings);
+	pr_info("self->rx_rings = %u\n", self->rx_rings);
+
+	WARN_IF(self->tx_rings != self->rx_rings);
 
 	for (i = 0U, ring = self->ring[0];
 		self->tx_rings > i; ++i, ring = self->ring[i]) {
 
 		WARN_IF(!ring);
+
+		pr_info("aq_ring_init(self->ring[i=%u][TX])\n", i);
 		err = aq_ring_init(&ring[AQ_VEC_TX_ID], ATL_RING_TX);
 		if (err < 0)
 			goto err_exit;
@@ -204,6 +215,8 @@ int aq_vec_init(struct aq_vec_s *self, const struct aq_hw_ops *aq_hw_ops,
 		if (err < 0)
 			goto err_exit;
 
+
+		pr_info("aq_ring_init(self->ring[i=%u][RX])\n", i);
 		err = aq_ring_init(&ring[AQ_VEC_RX_ID], ATL_RING_RX);
 		if (err < 0)
 			goto err_exit;
@@ -308,11 +321,18 @@ void aq_vec_ring_free(struct aq_vec_s *self)
 	if (!self)
 		goto err_exit;
 
+	pr_info("self->tx_rings = %u\n", self->tx_rings);
+	pr_info("self->rx_rings = %u\n", self->rx_rings);
+
 	for (i = 0U, ring = self->ring[0];
 		self->tx_rings > i; ++i, ring = self->ring[i]) {
+
+		pr_info("aq_ring_free(self->ring[i=%u][TX])\n", i);
 		aq_ring_free(&ring[AQ_VEC_TX_ID]);
-		if (i < self->rx_rings)
+		if (i < self->rx_rings) {
+			pr_info("aq_ring_free(self->ring[i=%u][RX])\n", i);
 			aq_ring_free(&ring[AQ_VEC_RX_ID]);
+		}
 	}
 
 	self->tx_rings = 0;

and apply it using patch -p1 < p2.patch, then do another test?

Hi, here are the logs: https://pastebin.pl/view/beb3899f