Kernel list corruption after OOM

I was running a memory-intensive computation, which was killed by the OOM killer. A few minutes later the following message showed up in the kernel logs:

list_del corruption. next->prev should be ffff8e91174a1000, but was 3fea82ccce8eaddd. (next=ffff8e90dbed4000)
WARNING: CPU: 0 PID: 5940 at lib/list_debug.c:54 __list_del_entry_valid+0xbc/0xd0
Modules linked in: udp_diag tcp_diag inet_diag ccm rfcomm qrtr squashfs cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm uvcvideo btintel videobuf2_vmalloc btmtk videobuf2_memops videobuf2_v4l2 bluetooth videobuf2_common videodev mc ecdh_generic loop snd_ctl_led snd_hda_codec_realtek intel_rapl_msr joydev mousedev intel_rapl_common psmouse snd_hda_codec_generic uinput serio_raw ledtrig_audio snd_hda_codec_hdmi iwlmvm intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg atkbd nvidia_uvm(POE) iTCO_wdt intel_pmc_bxt intel_powerclamp iTCO_vendor_support hid_multitouch ee1004 libps2 mac80211 snd_intel_sdw_acpi coretemp mei_hdcp snd_hda_codec vfat libarc4 fat iwlwifi kvm_intel asus_nb_wmi snd_hda_core nvidia_drm(POE) kvm snd_hwdep i2c_i801 i8042 nvidia_modeset(POE) irqbypass rapl intel_cstate intel_uncore snd_pcm cfg80211 r8169 pcspkr serio mxm_wmi snd_timer i2c_smbus intel_lpss_pci snd realtek mei_me intel_lpss i2c_hid_acpi mei soundcore nvidia(POE)
 idma64 intel_pch_thermal i2c_hid tpm_crb mac_hid tpm_tis asus_wireless tpm_tis_core acpi_pad acpi_call(OE) ipmi_devintf ipmi_msghandler sg fuse crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_asus asus_wmi sparse_keymap platform_profile rfkill usbhid dm_crypt cbc encrypted_keys dm_mod trusted asn1_encoder tee tpm rng_core rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd xhci_pci rtsx_pci xhci_pci_renesas wmi video
CPU: 0 PID: 5940 Comm: kworker/u16:0 Tainted: P           OE     5.17.1-3-MANJARO #1 c9242a592d494bee9c6cffbc288adf97c7b0452a
Hardware name: ASUSTeK COMPUTER INC. GL503VM/GL503VM, BIOS GL503VM.316 07/16/2020
Workqueue: zswap1 compact_page_work
RIP: 0010:__list_del_entry_valid+0xbc/0xd0
Code: cb 0e b3 e8 cf dc 67 00 0f 0b 31 c0 31 d2 89 d1 89 d6 89 d7 41 89 d1 c3 48 89 d1 48 c7 c7 98 cb 0e b3 4c 89 ca e8 ad dc 67 00 <0f> 0b 31 c0 31 d2 89 d1 89 d6 89 d7 41 89 d1 c3 cc cc cc cc 48 85
RSP: 0018:ffff9a6c438a3dd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8e9040e8a3c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8e9040e8a3c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e91174a1010
R13: 0000000000000013 R14: 0000000000000013 R15: ffff8e91174a1000
FS:  0000000000000000(0000) GS:ffff8e91a6c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa564878008 CR3: 000000025b410001 CR4: 00000000003706f0
Call Trace:
 ? rescuer_thread+0x3a0/0x3a0
 ? rescuer_thread+0x3a0/0x3a0
 ? kthread_complete_and_exit+0x20/0x20

Additional information:

  • The system was still working after this error. There was no indication of the problem, even though I ran some more computations. I found out about the problem during shutdown, when it started outputting errors.
  • According to the logs, this was the only time this error happened. I was unable to reproduce it, even with the same program.
  • I have uploaded the first 3000 lines of the kernel log here. I can upload the full logs if needed.

System information
  Kernel: 5.17.1-3-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 11.2.0
    parameters: BOOT_IMAGE=/vmlinuz-5.17-x86_64
    root=UUID=dd5f460a-c6c0-4776-b2ca-0d41b4276976 rw quiet
    root=/dev/mapper/luks-3585502a-3e84-419c-a755-361793fd7ed4 apparmor=1
    security=apparmor udev.log_priority=3
  Console: pty pts/1 wm: kwin_x11 DM: SDDM Distro: Manjaro Linux
    base: Arch Linux
  Type: Laptop System: ASUSTeK product: GL503VM v: 1.0 serial: <filter>
  Mobo: ASUSTeK model: GL503VM v: 1.0 serial: <filter>
    UEFI: American Megatrends v: GL503VM.316 date: 07/16/2020
  ID-1: BAT1 charge: 46.9 Wh (100.0%) condition: 46.9/64.4 Wh (72.8%)
    volts: 4.7 min: 15.2 model: ASUS A32-K55 type: Li-ion serial: N/A
    status: full
  RAM: total: 7.72 GiB used: 3.78 GiB (49.0%)
  Array-1: capacity: 64 GiB slots: 4 EC: None max-module-size: 16 GiB
    note: est.
  Device-1: ChannelA-DIMM0 type: DDR4
    detail: synchronous unbuffered (unregistered) size: 8 GiB speed: 2400 MT/s
    volts: curr: 1.2 min: 1.2 max: 1.2 width (bits): data: 64 total: 64
    manufacturer: SK Hynix part-no: HMA81GS6AFR8N-UH serial: <filter>
  Device-2: ChannelA-DIMM1 type: no module installed
  Device-3: ChannelB-DIMM0 type: no module installed
  Device-4: ChannelB-DIMM1 type: no module installed
  Info: model: Intel Core i7-7700HQ socket: BGA1440 (U3E1) note: check
    bits: 64 type: MT MCP arch: Kaby Lake family: 6 model-id: 0x9E (158)
    stepping: 9 microcode: 0xEC
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 3604 high: 3690 min/max: 800/3800 base/boost: 2700/2800
    scaling: driver: intel_pstate governor: powersave volts: 0.9 V
    ext-clock: 100 MHz cores: 1: 3602 2: 3519 3: 3619 4: 3690 5: 3640 6: 3600
    7: 3582 8: 3585 bogomips: 44817
  Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat
    arch_capabilities arch_perfmon art avx avx2 bmi1 bmi2 bts clflush clflushopt
    cmov constant_tsc cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts
    epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase fxsr
    ht hwp hwp_act_window hwp_epp hwp_notify ibpb ibrs ida intel_pt invpcid
    invpcid_single lahf_lm lm mca mce md_clear mmx monitor movbe mpx msr mtrr
    nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln
    pni popcnt pse pse36 pti pts rdrand rdseed rdtscp rep_good sdbg sep smap
    smep ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow
    tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave
    xsavec xsaveopt xsaves xtopology xtpr
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf
    mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
    STIBP: conditional, RSB filling
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
  Device-1: NVIDIA GP106M [GeForce GTX 1060 Mobile] vendor: ASUSTeK
    driver: nvidia v: 510.60.02 alternate: nouveau,nvidia_drm pcie: gen: 1
    speed: 2.5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.0
    chip-ID: 10de:1c20 class-ID: 0300
  Device-2: IMC Networks USB2.0 HD UVC WebCam type: USB driver: uvcvideo
    bus-ID: 1-7:4 chip-ID: 13d3:5666 class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: compositor: kwin_x11 driver: X:
    loaded: nvidia gpu: nvidia display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 143 s-size: 341x191mm (13.43x7.52")
    s-diag: 391mm (15.39")
  Monitor-1: DP-0 res: 1920x1080 hz: 120 dpi: 142
    size: 344x193mm (13.54x7.6") diag: 394mm (15.53") modes: N/A
  OpenGL: renderer: NVIDIA GeForce GTX 1060/PCIe/SSE2
    v: 4.6.0 NVIDIA 510.60.02 direct render: Yes
  Device-1: Intel CM238 HD Audio vendor: ASUSTeK driver: snd_hda_intel
    v: kernel bus-ID: 00:1f.3 chip-ID: 8086:a171 class-ID: 0403
  Device-2: NVIDIA GP106 High Definition Audio driver: snd_hda_intel
    v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 01:00.1
    chip-ID: 10de:10f1 class-ID: 0403
  Sound Server-1: ALSA v: k5.17.1-3-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.20 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
  Sound Server-4: PipeWire v: 0.3.49 running: yes
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: ASUSTeK driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
    lanes: 1 port: d000 bus-ID: 02:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp2s0 state: down mac: <filter>
  Device-2: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel pcie:
    gen: 1 speed: 2.5 GT/s lanes: 1 bus-ID: 04:00.0 chip-ID: 8086:24fd
    class-ID: 0280
  IF: wlp4s0 state: up mac: <filter>
  IP v4: <filter> type: noprefixroute scope: global broadcast: <filter>
  IP v6: <filter> type: dynamic noprefixroute scope: global
  IP v6: <filter> type: noprefixroute scope: link
  WAN IP: <filter>
  Device-1: Intel Bluetooth wireless interface type: USB driver: btusb v: 0.8
    bus-ID: 1-6:3 chip-ID: 8087:0a2b class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: see --recommends
  Message: No logical block device data found.
  Device-1: luks-3585502a-3e84-419c-a755-361793fd7ed4 maj-min: 254:0
    type: LUKS dm: dm-0 size: 47.69 GiB
  p-1: sda4 maj-min: 8:4 size: 47.69 GiB
  Device-2: luks-hdd2 maj-min: 254:1 type: LUKS dm: dm-1 size: 100 GiB
  p-1: sdb2 maj-min: 8:18 size: 100 GiB
  Hardware-1: Intel 82801 Mobile SATA Controller [RAID mode] driver: ahci
    v: 3.0 port: f020 bus-ID: 00:17.0 chip-ID: 8086:282a rev: N/A class-ID: 0104
  Local Storage: total: 1.14 TiB used: 49.14 GiB (4.2%)
  ID-1: /dev/sda maj-min: 8:0 vendor: SK Hynix model: HFS256G39TND-N210A
    family: SATA SSDs size: 238.47 GiB block-size: physical: 4096 B
    logical: 512 B sata: 3.1 speed: 6.0 Gb/s type: SSD serial: <filter>
    rev: 0P10 temp: 42 C scheme: GPT
  SMART: yes state: enabled health: PASSED on: 320d 18h cycles: 1565
  ID-2: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST1000LX015-1U7172
    family: FireCuda 2.5 size: 931.51 GiB block-size: physical: 4096 B
    logical: 512 B sata: 3.1 speed: 6.0 Gb/s type: HDD rpm: 5400
    serial: <filter> rev: SDM1 temp: 31 C scheme: GPT
  SMART: yes state: enabled health: PASSED on: 118d 19h cycles: 1566
    read: 728.77 GiB written: 1008.05 GiB Pre-Fail: attribute: Spin_Retry_Count
    value: 100 worst: 100 threshold: 97
  Message: No optical or floppy data found.
  ID-1: / raw-size: 47.69 GiB size: 46.81 GiB (98.17%) used: 37.73 GiB (80.6%)
    fs: ext4 block-size: 4096 B dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-3585502a-3e84-419c-a755-361793fd7ed4 label: N/A
    uuid: dd5f460a-c6c0-4776-b2ca-0d41b4276976
  ID-2: /boot raw-size: 320 MiB size: 289.9 MiB (90.59%)
    used: 198.2 MiB (68.4%) fs: ext4 block-size: 1024 B dev: /dev/sda7
    maj-min: 8:7 label: N/A uuid: 9532c92e-91b3-41f4-b64f-f2ada1131c6c
  ID-3: /boot/efi raw-size: 260 MiB size: 256 MiB (98.46%)
    used: 26.6 MiB (10.4%) fs: vfat block-size: 512 B dev: /dev/sda1
    maj-min: 8:1 label: SYSTEM uuid: 6814-4214
  ID-4: /mnt/hdd2 raw-size: 100 GiB size: 97.87 GiB (97.87%)
    used: 11.18 GiB (11.4%) fs: ext4 block-size: 4096 B dev: /dev/dm-1
    maj-min: 254:1 mapped: luks-hdd2 label: N/A uuid: N/A
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: file size: 8 GiB used: 0 KiB (0.0%) priority: -2
    file: /swapfile
  ID-1: /dev/sda2 maj-min: 8:2 size: 16 MiB fs: N/A label: N/A uuid: N/A
  ID-2: /dev/sda3 maj-min: 8:3 size: 188.9 GiB fs: bitlocker label: N/A
    uuid: N/A
  ID-3: /dev/sda5 maj-min: 8:5 size: 531 MiB fs: ntfs label: N/A
    uuid: A0325CCC325CA954
  ID-4: /dev/sda6 maj-min: 8:6 size: 800 MiB fs: ntfs label: RECOVERY
    uuid: 72CE1E7DCE1E3A35
  ID-5: /dev/sdb1 maj-min: 8:17 size: 731.51 GiB fs: bitlocker label: N/A
    uuid: N/A
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 16 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 1-2:2 info: Logitech M105 Optical Mouse type: Mouse
    driver: hid-generic,usbhid interfaces: 1 rev: 2.0 speed: 1.5 Mb/s
    power: 100mA chip-ID: 046d:c077 class-ID: 0301
  Device-2: 1-6:3 info: Intel Bluetooth wireless interface type: Bluetooth
    driver: btusb interfaces: 2 rev: 2.0 speed: 12 Mb/s power: 100mA
    chip-ID: 8087:0a2b class-ID: e001
  Device-3: 1-7:4 info: IMC Networks USB2.0 HD UVC WebCam type: Video
    driver: uvcvideo interfaces: 2 rev: 2.0 speed: 480 Mb/s power: 500mA
    chip-ID: 13d3:5666 class-ID: 0e02 serial: <filter>
  Device-4: 1-8:5 info: ASUSTek ITE Device(8910) type: Keyboard
    driver: asus,usbhid interfaces: 1 rev: 2.0 speed: 12 Mb/s power: 100mA
    chip-ID: 0b05:1869 class-ID: 0301
  Hub-2: 2-0:1 info: Super-speed hub ports: 8 rev: 3.0 speed: 5 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
  System Temperatures: cpu: 70.0 C pch: 56.5 C mobo: N/A gpu: nvidia
    temp: 63 C
  Fan Speeds (RPM): cpu: 0
  Processes: 299 Uptime: 4h 38m wakeups: 1 Init: systemd v: 250
  tool: systemctl Compilers: gcc: 11.2.0 clang: 13.0.1 Packages: pacman: 1475
  lib: 349 flatpak: 0 Shell: Zsh (sudo) v: 5.8.1 default: Bash v: 5.1.16
  running-in: yakuake inxi: 3.3.15

What is a “kernel list corruption”?
What you showed is part of a crash dump; the crash likely happened because you ran out of memory.
The only thing I can glean from it is that you appear to use zswap.
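
For context on the message itself: "list_del corruption" comes from a sanity check the kernel runs (when list debugging is compiled in) before unlinking an entry from a doubly linked list. The entry's neighbours must still point back at it; if they do not, the kernel warns and skips the unlink. Below is a minimal user-space sketch of that idea in Python. It is only a model, not the kernel's code: the names `Node`, `list_add` and `list_del` are my own, loosely borrowed from the kernel's naming, and node names stand in for the addresses printed in the real message.

```python
class Node:
    """A list entry with prev/next links, loosely like struct list_head."""

    def __init__(self, name):
        self.name = name
        # An empty circular list points at itself in both directions.
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` directly after `head`."""
    nxt = head.next
    new.prev = head
    new.next = nxt
    nxt.prev = new
    head.next = new


def list_del_entry_valid(entry):
    """Return (ok, message). Both neighbours must point back at `entry`."""
    if entry.prev.next is not entry:
        return False, (
            f"list_del corruption. prev->next should be {entry.name}, "
            f"but was {entry.prev.next.name}"
        )
    if entry.next.prev is not entry:
        return False, (
            f"list_del corruption. next->prev should be {entry.name}, "
            f"but was {entry.next.prev.name}"
        )
    return True, ""


def list_del(entry):
    """Unlink `entry` if the list looks sane; otherwise return the warning."""
    ok, message = list_del_entry_valid(entry)
    if not ok:
        return message  # refuse to touch a list that already looks broken
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    return ""


if __name__ == "__main__":
    head, a, b = Node("head"), Node("a"), Node("b")
    list_add(a, head)
    list_add(b, head)           # list is now head <-> b <-> a <-> head
    a.prev = Node("garbage")    # simulate a stray write over a's back pointer
    print(list_del(b))
```

In the posted warning, the value found in `next->prev` (3fea82ccce8eaddd) does not resemble the ffff8e... pointers around it, which would be consistent with that field having been overwritten by unrelated data rather than by another list operation. That is only a reading of the one line, not a diagnosis.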

Please check the log file I uploaded. Most warnings are about some corrupted list. I did run out of memory, but after the memory-hogging process was killed, everything should have continued normally. Also, the system did not crash immediately, only about 45 minutes later during shutdown.
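
To back up a claim like "most warnings are about a corrupted list", one option is to tally the relevant lines in the saved log. The sketch below counts "list_del corruption" and "list_add corruption" reports and the `Workqueue:` lines that appear in the traces. It assumes the log is a plain text file in the same format as the excerpt above; the function name `tally_warnings` and the command-line usage are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches e.g. "Workqueue: zswap1 compact_page_work" anywhere in a line,
# so a journal prefix such as "kernel: " in front of it does not matter.
WORKQUEUE_RE = re.compile(r"Workqueue:\s+(\S+)\s+(\S+)")


def tally_warnings(lines):
    """Count list-corruption reports and the workqueues named in traces."""
    corruption_reports = 0
    workqueues = Counter()
    for line in lines:
        if "list_del corruption" in line or "list_add corruption" in line:
            corruption_reports += 1
        match = WORKQUEUE_RE.search(line)
        if match:
            workqueues[(match.group(1), match.group(2))] += 1
    return corruption_reports, workqueues


if __name__ == "__main__":
    # Usage (hypothetical file name): python tally.py kernel.log
    with open(sys.argv[1], errors="replace") as log:
        reports, queues = tally_warnings(log)
    print(f"list corruption reports: {reports}")
    for (queue, function), count in queues.most_common():
        print(f"{count:5d}  {queue} {function}")
```

If nearly every trace names the same workqueue, as the `zswap1 compact_page_work` line in the excerpt does, that narrows down where to look, although it does not by itself show which code corrupted the list.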

What I see is multiple crash dumps related to zswap.
Perhaps you should disable it or reconfigure it.
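
Before changing anything, it may help to see how zswap is currently configured. On kernels built with zswap, its settings are exposed as files under /sys/module/zswap/parameters. The sketch below just reads whatever is there and changes nothing; the function name `read_zswap_parameters` is my own, and the `base` argument exists only so the function can be pointed at another directory for testing.

```python
from pathlib import Path


def read_zswap_parameters(base="/sys/module/zswap/parameters"):
    """Return {parameter name: value} for zswap, or {} if it is not present."""
    root = Path(base)
    if not root.is_dir():
        return {}
    parameters = {}
    for entry in sorted(root.iterdir()):
        try:
            parameters[entry.name] = entry.read_text().strip()
        except OSError:
            # Some entries may not be readable by an unprivileged user.
            parameters[entry.name] = None
    return parameters


if __name__ == "__main__":
    settings = read_zswap_parameters()
    if not settings:
        print("zswap parameters not found (not built in or not loaded)")
    for name, value in settings.items():
        print(f"{name}: {value}")
```

To disable zswap for a test, the usual approaches are writing 0 to the `enabled` file as root, or booting with `zswap.enabled=0` on the kernel command line. Whether that makes the warnings go away on this machine is something only a test can show.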