Kernel: BUG: KFENCE: out-of-bounds write

Problem:

Ever since the 2024-03-13 [Stable Update], my journal has been registering nvidia-related kernel BUG errors every day the system is used.

I scanned the logs for the text "out-of-bounds write in _nv" over the time range from 2024-02-02 to now (2024-03-17). Here is the output:

Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 15 11:47:10 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:32:16 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:33:52 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:39:20 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
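
For anyone who wants to repeat this kind of scan, here is a minimal Python sketch that counts matching reports per day. It assumes the journal has been exported as text in the default short format (for example with `journalctl --since 2024-02-02 > journal.txt`); the function name `count_kfence_hits` and the file name `journal.txt` are made up for illustration, and this is not necessarily how the list above was produced.

```python
import re
from collections import Counter

# Matches lines in the default short journal format, e.g.
# "Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]"
LINE_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"BUG: KFENCE: out-of-bounds write in (?P<symbol>_nv\w+)"
)


def count_kfence_hits(lines):
    """Return a Counter mapping 'Mon DD' to the number of matching reports."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            hits[match.group("day")] += 1
    return hits


if __name__ == "__main__":
    # Two lines copied from the output above, used as sample input.
    sample = [
        "Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
        "Mar 15 11:47:10 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    ]
    for day, count in count_kfence_hits(sample).items():
        print(day, count)
```

To run it on real data, pass the lines of a saved export instead of the sample, e.g. `count_kfence_hits(open("journal.txt"))`.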

I don’t know what impact this has, if any. I merely came across these logs whilst investigating system freezes that started after I installed the 2024-03-06 and 2024-02-21 [Stable Updates] on 2024-03-10 (a couple of days prior to this latest stable release).

Logs

Here are the full logs for the nvidia-related kernel bug:

Mar 17 13:30:35 user1 pamac-tray-plas[955]: updates_checker.vala:70: check updates
Mar 17 13:30:38 user1 pamac-tray-plas[955]: updates_checker.vala:100: 8 updates found
Mar 17 13:32:35 user1 wpa_supplicant[510]: Removed BSSID 48:d3:43:f1:11:49 from ignore list (expired)
Mar 17 13:32:35 user1 kernel: ==================================================================
Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel: Out-of-bounds write at 0x0000000093d9fa21 (24B left of kfence-#16):
Mar 17 13:32:35 user1 kernel:  _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel:  _nv014559rm+0x4d/0x90 [nvidia]
Mar 17 13:32:35 user1 kernel:  _nv049696rm+0x18/0x60 [nvidia]
Mar 17 13:32:35 user1 kernel:  _nv026805rm+0x61/0x90 [nvidia]
Mar 17 13:32:35 user1 kernel:  rm_power_source_change_event+0x21/0x174 [nvidia]
Mar 17 13:32:35 user1 kernel:  acpi_ev_notify_dispatch+0x4b/0x70
Mar 17 13:32:35 user1 kernel:  acpi_os_execute_deferred+0x17/0x30
Mar 17 13:32:35 user1 kernel:  process_one_work+0x171/0x340
Mar 17 13:32:35 user1 kernel:  worker_thread+0x27b/0x3a0
Mar 17 13:32:35 user1 kernel:  kthread+0xe5/0x120
Mar 17 13:32:35 user1 kernel:  ret_from_fork+0x31/0x50
Mar 17 13:32:35 user1 kernel:  ret_from_fork_asm+0x1b/0x30
Mar 17 13:32:35 user1 kernel: 
Mar 17 13:32:35 user1 kernel: kfence-#16: 0x00000000f7fe8553-0x000000001c93b2ff, size=80, cache=Acpi-State
Mar 17 13:32:35 user1 kernel: allocated by task 342 on cpu 5 at 3746.799092s:
Mar 17 13:32:35 user1 kernel:  acpi_ut_create_generic_state+0x37/0x50
Mar 17 13:32:35 user1 kernel:  acpi_ev_queue_notify_request+0x72/0x1e0
Mar 17 13:32:35 user1 kernel:  acpi_ex_opcode_2A_0T_0R+0xb0/0xe0
Mar 17 13:32:35 user1 kernel:  acpi_ds_exec_end_op+0x1f6/0x860
Mar 17 13:32:35 user1 kernel:  acpi_ps_parse_loop+0x265/0xa30
Mar 17 13:32:35 user1 kernel:  acpi_ps_parse_aml+0x221/0x5e0
Mar 17 13:32:35 user1 kernel:  acpi_ps_execute_method+0x171/0x3e0
Mar 17 13:32:35 user1 kernel:  acpi_ns_evaluate+0x174/0x5d0
Mar 17 13:32:35 user1 kernel:  acpi_evaluate_object+0x16f/0x450
Mar 17 13:32:35 user1 kernel:  acpi_ec_event_processor+0xa8/0x100
Mar 17 13:32:35 user1 kernel:  process_one_work+0x171/0x340
Mar 17 13:32:35 user1 kernel:  worker_thread+0x27b/0x3a0
Mar 17 13:32:35 user1 kernel:  kthread+0xe5/0x120
Mar 17 13:32:35 user1 kernel:  ret_from_fork+0x31/0x50
Mar 17 13:32:35 user1 kernel:  ret_from_fork_asm+0x1b/0x30
Mar 17 13:32:35 user1 kernel: 
Mar 17 13:32:35 user1 kernel: CPU: 0 PID: 2646 Comm: kworker/0:2 Tainted: P    B      OE      6.6.19-1-MANJARO #1 76c482e512047110118a77981ac42e42c9746e1c
Mar 17 13:32:35 user1 kernel: Hardware name: LENOVO 20EQS0VV07/20EQS0VV07, BIOS N1EET76W (1.49 ) 02/21/2018
Mar 17 13:32:35 user1 kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Mar 17 13:32:35 user1 kernel: ==================================================================
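
The report above states its key facts in three lines: the kind of access and the faulting symbol, how far outside the object the write landed ("24B left of kfence-#16"), and the object's size and slab cache ("size=80, cache=Acpi-State"). As a rough sketch for comparing several such reports, the following Python pulls those fields out of the journal text. The function name `summarize_kfence_report` and the regular expressions are assumptions written against this one report, not a general KFENCE parser.

```python
import re

# "BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]"
FAULT_RE = re.compile(
    r"BUG: KFENCE: (?P<kind>[\w-]+ \w+) in (?P<symbol>\S+) \[(?P<module>\w+)\]"
)
# "(24B left of kfence-#16)"
WHERE_RE = re.compile(
    r"\((?P<distance>\d+)B (?P<side>left|right) of kfence-#(?P<obj>\d+)\)"
)
# "kfence-#16: 0x...-0x..., size=80, cache=Acpi-State"
OBJECT_RE = re.compile(
    r"kfence-#(?P<obj>\d+): \S+, size=(?P<size>\d+), cache=(?P<cache>\S+)"
)


def summarize_kfence_report(lines):
    """Collect the fields matched by the three patterns into one dict."""
    info = {}
    for line in lines:
        for pattern in (FAULT_RE, WHERE_RE, OBJECT_RE):
            match = pattern.search(line)
            if match:
                for key, value in match.groupdict().items():
                    # Store purely numeric fields as integers.
                    info[key] = int(value) if value.isdigit() else value
    return info


if __name__ == "__main__":
    report = [
        "Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
        "Mar 17 13:32:35 user1 kernel: Out-of-bounds write at 0x0000000093d9fa21 (24B left of kfence-#16):",
        "Mar 17 13:32:35 user1 kernel: kfence-#16: 0x00000000f7fe8553-0x000000001c93b2ff, size=80, cache=Acpi-State",
    ]
    print(summarize_kfence_report(report))
```

The pattern for the first line expects a module name in square brackets, so it would skip reports where the faulting function is built into the kernel.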

Other info:

System information:

System:
  Kernel: 6.6.19-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
    root=UUID=cbdf1a8b-406b-433c-9528-a87b111401b3 rw quiet
    resume=UUID=43412e1e-1f33-4bf6-a039-ea2d38607166 udev.log_priority=3
  Desktop: KDE Plasma v: 5.27.11 tk: Qt v: 5.15.12 info: frameworks
    v: 5.115.0 wm: kwin_x11 vt: 2 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
  Type: Laptop System: LENOVO product: 20EQS0VV07 v: ThinkPad P50
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: LENOVO model: 20EQS0VV07 v: SDK0J40705 WIN
    serial: <superuser required> part-nu: LENOVO_MT_20EQ_BU_Think_FM_ThinkPad P50
    uuid: <superuser required> UEFI: LENOVO v: N1EET76W (1.49 )
    date: 02/21/2018
Battery:
  ID-1: BAT0 charge: 48.0 Wh (100.0%) condition: 48.0/90.1 Wh (53.3%)
    volts: 12.7 min: 11.4 model: LGC 00NY492 type: Li-poly serial: <filter>
    status: not charging
CPU:
  Info: model: Intel Core i7-6820HQ bits: 64 type: MT MCP arch: Skylake-S
    gen: core 6 level: v3 note: check built: 2015 process: Intel 14nm family: 6
    model-id: 0x5E (94) stepping: 3 microcode: 0xF0
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 8 MiB desc: 1x8 MiB
  Speed (MHz): avg: 800 min/max: 800/3600 scaling: driver: intel_pstate
    governor: powersave cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800
    8: 800 bogomips: 43214
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
  Vulnerabilities:
  Type: gather_data_sampling status: Vulnerable: No microcode
  Type: itlb_multihit status: KVM: VMX unsupported
  Type: l1tf mitigation: PTE Inversion
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
  Type: retbleed mitigation: IBRS
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: IBRS, IBPB: conditional, STIBP: conditional,
    RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort mitigation: TSX disabled
Graphics:
  Device-1: Intel HD Graphics 530 vendor: Lenovo driver: i915 v: kernel
    arch: Gen-9 process: Intel 14n built: 2015-16 ports: active: eDP-1
    empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:191b
    class-ID: 0300
  Device-2: NVIDIA GM107GLM [Quadro M1000M] vendor: Lenovo driver: nvidia
    v: 550.54.14 alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current
    (as of 2024-02; EOL~2026-12-xx) arch: Maxwell code: GMxxx
    process: TSMC 28nm built: 2014-2019 pcie: gen: 1 speed: 2.5 GT/s lanes: 16
    link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.0 chip-ID: 10de:13b1
    class-ID: 0300
  Device-3: Chicony Integrated Camera driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-8:3 chip-ID: 04f2:b52c
    class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 21.1.11 compositor: kwin_x11 driver: X:
    loaded: modesetting,nvidia alternate: fbdev,nouveau,nv,vesa dri: iris
    gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: eDP-1 model: LG Display 0x04a7 built: 2015 res: 1920x1080 hz: 60
    dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64") diag: 395mm (15.5")
    ratio: 16:9 modes: 1920x1080
  API: EGL v: 1.5 hw: drv: intel iris drv: nvidia platforms: device: 0
    drv: nvidia device: 2 drv: iris device: 3 drv: swrast gbm: drv: kms_swrast
    surfaceless: drv: nvidia x11: drv: iris inactive: wayland,device-1
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: intel mesa v: 24.0.2-manjaro1.1
    glx-v: 1.4 direct-render: yes renderer: Mesa Intel HD Graphics 530 (SKL GT2)
    device-ID: 8086:191b memory: 7.41 GiB unified: yes
  API: Vulkan v: 1.3.279 layers: 5 device: 0 type: discrete-gpu
    name: Quadro M1000M driver: nvidia v: 550.54.14 device-ID: 10de:13b1
    surfaces: xcb,xlib
Audio:
  Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel alternate: snd_soc_avs bus-ID: 00:1f.3
    chip-ID: 8086:a170 class-ID: 0403
  Device-2: NVIDIA GM107 High Definition Audio [GeForce 940MX]
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:0fbc class-ID: 0403
  API: ALSA v: k6.6.19-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: JACK v: 1.9.22 status: off tools: N/A
  Server-2: PipeWire v: 1.0.3 status: off with: pipewire-media-session
    status: active tools: pw-cli
  Server-3: PulseAudio v: 17.0 status: active with: pulseaudio-alsa
    type: plugin tools: pacat,pactl
Network:
  Device-1: Intel Ethernet I219-LM vendor: Lenovo driver: e1000e v: kernel
    port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b7 class-ID: 0200
  IF: enp0s31f6 state: down mac: <filter>
  Device-2: Intel Wireless 8260 driver: iwlwifi v: kernel pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 bus-ID: 04:00.0 chip-ID: 8086:24f3 class-ID: 0280
  IF: wlp4s0 state: up mac: <filter>
  Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
  Device-1: Intel Bluetooth wireless interface driver: btusb v: 0.8 type: USB
    rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:5 chip-ID: 8087:0a2b
    class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 238.47 GiB used: 179.07 GiB (75.1%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Toshiba model: THNSFJ256GDNU A
    size: 238.47 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 1102 scheme: GPT
Partition:
  ID-1: / raw-size: 229.37 GiB size: 224.71 GiB (97.97%)
    used: 178.81 GiB (79.6%) fs: ext4 dev: /dev/sda2 maj-min: 8:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 312 KiB (0.1%) fs: vfat dev: /dev/sda1 maj-min: 8:1
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: partition size: 8.8 GiB used: 264.2 MiB (2.9%)
    priority: -2 dev: /dev/sda3 maj-min: 8:3
Sensors:
  System Temperatures: cpu: 40.0 C pch: 45.0 C mobo: N/A
  Fan Speeds (rpm): fan-1: 0 fan-2: 0
Info:
  Memory: total: 8 GiB note: est. available: 7.59 GiB used: 5.41 GiB (71.3%)
  Processes: 281 Power: uptime: 1h 3m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 3.03 GiB services: org_kde_powerdevil,upowerd
    Init: systemd v: 255 default: graphical tool: systemctl
  Packages: 1573 pm: pacman pkgs: 1491 libs: 400 tools: pamac pm: flatpak
    pkgs: 82 Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Zsh v: 5.9 default: Bash
    v: 5.2.26 running-in: konsole inxi: 3.3.33

So far I’m merely reporting the issue. If anyone knows why it’s happening, it would be cool to hear.

Sorry, can’t reproduce. I also have a hybrid Intel + NVIDIA laptop. :man_shrugging:

I know that just saying so isn’t very helpful; however, if no one else can reproduce it, they most likely won’t respond at all.

No worries. I suspected this might not be reproducible. It may be related to my system state: the installation is nearly 2 years old. Thanks for the response regardless!

I was mostly hoping to report it for statistics and dev awareness in case it’s a legitimate issue. Not sure if I’m reporting in the right place, though.

That said, if any dev wants to dig more on this, I’d be happy to help with any logs or tests I can provide. I’m quite curious as to what happened :smile:

Otherwise, I’m going to try changing my kernel version, as the updates announcer suggested on one of my other posts. If that doesn’t help, then it’s probably time for a reinstall. But I won’t get to any of this for a couple of days at least.

Consider updating the BIOS/UEFI: ThinkPad P50 - Lenovo Support US