Problem:
Ever since the 2024-03-13 [Stable Update], journalctl has been logging nvidia-related kernel BUG errors on every day the system is used.
I searched the logs for the text: out-of-bounds write in _nv
over the time range from 2024-02-02 to now (2024-03-17). Here is the output:
Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 15 11:47:10 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:32:16 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:33:52 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 12:39:20 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
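For anyone who wants to repeat the tally on their own machine, here is a minimal Python sketch that counts the matching lines per day. It assumes the journal has already been exported as plain text in the short format shown above (journalctl has --since, --until and --grep options that can narrow the export; the exact command used for this post is not stated). The helper name count_kfence_hits and the regular expression are only an illustration, not part of any tool.

```python
import re
from collections import Counter

# Assumed line shape (journalctl short output), e.g.:
# Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
HIT = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"BUG: KFENCE: out-of-bounds write in (?P<symbol>_nv\w+)\+"
)

# The six lines quoted in this post, used as sample input.
SAMPLE = [
    "Mar 13 14:42:03 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 15 11:47:10 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 12:32:16 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 12:33:52 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 12:39:20 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
]


def count_kfence_hits(lines):
    """Count matching BUG lines per (day, symbol). Hypothetical helper."""
    hits = Counter()
    for line in lines:
        match = HIT.match(line)
        if match:
            hits[(match["day"], match["symbol"])] += 1
    return hits


if __name__ == "__main__":
    for (day, symbol), count in count_kfence_hits(SAMPLE).items():
        print(f"{day}  {symbol}  {count}")
```

For the six lines quoted above it reports one hit on Mar 13, one on Mar 15 and four on Mar 17, all in the same symbol.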
I don’t know what impact this has, if any. I only came across these logs while investigating system freezes that started after I installed the 2024-02-21 and 2024-03-06 [Stable Updates] on 2024-03-10 (a couple of days prior to this latest stable release).
Logs
Here are the full logs for the nvidia-related kernel bug (the Mar 17 13:32:35 occurrence):
Mar 17 13:30:35 user1 pamac-tray-plas[955]: updates_checker.vala:70: check updates
Mar 17 13:30:38 user1 pamac-tray-plas[955]: updates_checker.vala:100: 8 updates found
Mar 17 13:32:35 user1 wpa_supplicant[510]: Removed BSSID 48:d3:43:f1:11:49 from ignore list (expired)
Mar 17 13:32:35 user1 kernel: ==================================================================
Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel: Out-of-bounds write at 0x0000000093d9fa21 (24B left of kfence-#16):
Mar 17 13:32:35 user1 kernel: _nv044009rm+0x10/0x30 [nvidia]
Mar 17 13:32:35 user1 kernel: _nv014559rm+0x4d/0x90 [nvidia]
Mar 17 13:32:35 user1 kernel: _nv049696rm+0x18/0x60 [nvidia]
Mar 17 13:32:35 user1 kernel: _nv026805rm+0x61/0x90 [nvidia]
Mar 17 13:32:35 user1 kernel: rm_power_source_change_event+0x21/0x174 [nvidia]
Mar 17 13:32:35 user1 kernel: acpi_ev_notify_dispatch+0x4b/0x70
Mar 17 13:32:35 user1 kernel: acpi_os_execute_deferred+0x17/0x30
Mar 17 13:32:35 user1 kernel: process_one_work+0x171/0x340
Mar 17 13:32:35 user1 kernel: worker_thread+0x27b/0x3a0
Mar 17 13:32:35 user1 kernel: kthread+0xe5/0x120
Mar 17 13:32:35 user1 kernel: ret_from_fork+0x31/0x50
Mar 17 13:32:35 user1 kernel: ret_from_fork_asm+0x1b/0x30
Mar 17 13:32:35 user1 kernel:
Mar 17 13:32:35 user1 kernel: kfence-#16: 0x00000000f7fe8553-0x000000001c93b2ff, size=80, cache=Acpi-State
Mar 17 13:32:35 user1 kernel: allocated by task 342 on cpu 5 at 3746.799092s:
Mar 17 13:32:35 user1 kernel: acpi_ut_create_generic_state+0x37/0x50
Mar 17 13:32:35 user1 kernel: acpi_ev_queue_notify_request+0x72/0x1e0
Mar 17 13:32:35 user1 kernel: acpi_ex_opcode_2A_0T_0R+0xb0/0xe0
Mar 17 13:32:35 user1 kernel: acpi_ds_exec_end_op+0x1f6/0x860
Mar 17 13:32:35 user1 kernel: acpi_ps_parse_loop+0x265/0xa30
Mar 17 13:32:35 user1 kernel: acpi_ps_parse_aml+0x221/0x5e0
Mar 17 13:32:35 user1 kernel: acpi_ps_execute_method+0x171/0x3e0
Mar 17 13:32:35 user1 kernel: acpi_ns_evaluate+0x174/0x5d0
Mar 17 13:32:35 user1 kernel: acpi_evaluate_object+0x16f/0x450
Mar 17 13:32:35 user1 kernel: acpi_ec_event_processor+0xa8/0x100
Mar 17 13:32:35 user1 kernel: process_one_work+0x171/0x340
Mar 17 13:32:35 user1 kernel: worker_thread+0x27b/0x3a0
Mar 17 13:32:35 user1 kernel: kthread+0xe5/0x120
Mar 17 13:32:35 user1 kernel: ret_from_fork+0x31/0x50
Mar 17 13:32:35 user1 kernel: ret_from_fork_asm+0x1b/0x30
Mar 17 13:32:35 user1 kernel:
Mar 17 13:32:35 user1 kernel: CPU: 0 PID: 2646 Comm: kworker/0:2 Tainted: P B OE 6.6.19-1-MANJARO #1 76c482e512047110118a77981ac42e42c9746e1c
Mar 17 13:32:35 user1 kernel: Hardware name: LENOVO 20EQS0VV07/20EQS0VV07, BIOS N1EET76W (1.49 ) 02/21/2018
Mar 17 13:32:35 user1 kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Mar 17 13:32:35 user1 kernel: ==================================================================
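The report above can be boiled down to a few fields. The Python sketch below pulls out the faulting symbol, how far outside the object the write landed, the object's size and cache, and the chain of [nvidia] frames. The helper name summarize_kfence_report and the regular expressions are assumptions based only on the line shapes in this one report; other KFENCE reports may be laid out differently.

```python
import re

# Patterns inferred from the report in this post (an assumption, not a spec).
BUG = re.compile(r"^BUG: KFENCE: (?P<kind>.+?) in (?P<symbol>\S+?)\+0x")
WHERE = re.compile(r"\((?P<bytes>\d+)B (?P<side>left|right) of kfence-#\d+\)")
OBJECT = re.compile(r"^kfence-#\d+: \S+, size=(?P<size>\d+), cache=(?P<cache>\S+)")
NV_FRAME = re.compile(r"^(?P<func>\S+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[nvidia\]$")

# A few lines copied from the report above, used as sample input.
REPORT = [
    "Mar 17 13:32:35 user1 kernel: BUG: KFENCE: out-of-bounds write in _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: Out-of-bounds write at 0x0000000093d9fa21 (24B left of kfence-#16):",
    "Mar 17 13:32:35 user1 kernel: _nv044009rm+0x10/0x30 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: _nv014559rm+0x4d/0x90 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: _nv049696rm+0x18/0x60 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: _nv026805rm+0x61/0x90 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: rm_power_source_change_event+0x21/0x174 [nvidia]",
    "Mar 17 13:32:35 user1 kernel: acpi_ev_notify_dispatch+0x4b/0x70",
    "Mar 17 13:32:35 user1 kernel: kfence-#16: 0x00000000f7fe8553-0x000000001c93b2ff, size=80, cache=Acpi-State",
]


def summarize_kfence_report(lines):
    """Extract headline fields from one KFENCE report. Hypothetical helper."""
    summary = {
        "kind": None, "symbol": None, "bytes_outside": None, "side": None,
        "object_size": None, "cache": None, "nvidia_frames": [],
    }
    for line in lines:
        # Drop the "Mar 17 13:32:35 user1 kernel: " prefix if present.
        msg = line.split("kernel: ", 1)[-1].strip()
        bug = BUG.match(msg)
        where = WHERE.search(msg)
        obj = OBJECT.match(msg)
        frame = NV_FRAME.match(msg)
        if bug:
            summary["kind"] = bug["kind"]
            summary["symbol"] = bug["symbol"]
        elif where:
            summary["bytes_outside"] = int(where["bytes"])
            summary["side"] = where["side"]
        elif obj:
            summary["object_size"] = int(obj["size"])
            summary["cache"] = obj["cache"]
        elif frame:
            summary["nvidia_frames"].append(frame["func"])
    return summary


if __name__ == "__main__":
    for key, value in summarize_kfence_report(REPORT).items():
        print(f"{key}: {value}")
```

Applied to the report above, this gives an out-of-bounds write 24 bytes to the left of an 80-byte object from the Acpi-State cache, with the [nvidia] frames running from _nv044009rm back to rm_power_source_change_event.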
Other info:
System information:
System:
Kernel: 6.6.19-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
root=UUID=cbdf1a8b-406b-433c-9528-a87b111401b3 rw quiet
resume=UUID=43412e1e-1f33-4bf6-a039-ea2d38607166 udev.log_priority=3
Desktop: KDE Plasma v: 5.27.11 tk: Qt v: 5.15.12 info: frameworks
v: 5.115.0 wm: kwin_x11 vt: 2 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
Type: Laptop System: LENOVO product: 20EQS0VV07 v: ThinkPad P50
serial: <superuser required> Chassis: type: 10 serial: <superuser required>
Mobo: LENOVO model: 20EQS0VV07 v: SDK0J40705 WIN
serial: <superuser required> part-nu: LENOVO_MT_20EQ_BU_Think_FM_ThinkPad P50
uuid: <superuser required> UEFI: LENOVO v: N1EET76W (1.49 )
date: 02/21/2018
Battery:
ID-1: BAT0 charge: 48.0 Wh (100.0%) condition: 48.0/90.1 Wh (53.3%)
volts: 12.7 min: 11.4 model: LGC 00NY492 type: Li-poly serial: <filter>
status: not charging
CPU:
Info: model: Intel Core i7-6820HQ bits: 64 type: MT MCP arch: Skylake-S
gen: core 6 level: v3 note: check built: 2015 process: Intel 14nm family: 6
model-id: 0x5E (94) stepping: 3 microcode: 0xF0
Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
L3: 8 MiB desc: 1x8 MiB
Speed (MHz): avg: 800 min/max: 800/3600 scaling: driver: intel_pstate
governor: powersave cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800
8: 800 bogomips: 43214
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Vulnerabilities:
Type: gather_data_sampling status: Vulnerable: No microcode
Type: itlb_multihit status: KVM: VMX unsupported
Type: l1tf mitigation: PTE Inversion
Type: mds mitigation: Clear CPU buffers; SMT vulnerable
Type: meltdown mitigation: PTI
Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
Type: retbleed mitigation: IBRS
Type: spec_rstack_overflow status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: IBRS, IBPB: conditional, STIBP: conditional,
RSB filling, PBRSB-eIBRS: Not affected
Type: srbds mitigation: Microcode
Type: tsx_async_abort mitigation: TSX disabled
Graphics:
Device-1: Intel HD Graphics 530 vendor: Lenovo driver: i915 v: kernel
arch: Gen-9 process: Intel 14n built: 2015-16 ports: active: eDP-1
empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:191b
class-ID: 0300
Device-2: NVIDIA GM107GLM [Quadro M1000M] vendor: Lenovo driver: nvidia
v: 550.54.14 alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current
(as of 2024-02; EOL~2026-12-xx) arch: Maxwell code: GMxxx
process: TSMC 28nm built: 2014-2019 pcie: gen: 1 speed: 2.5 GT/s lanes: 16
link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.0 chip-ID: 10de:13b1
class-ID: 0300
Device-3: Chicony Integrated Camera driver: uvcvideo type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-8:3 chip-ID: 04f2:b52c
class-ID: 0e02 serial: <filter>
Display: x11 server: X.Org v: 21.1.11 compositor: kwin_x11 driver: X:
loaded: modesetting,nvidia alternate: fbdev,nouveau,nv,vesa dri: iris
gpu: i915 display-ID: :0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
s-diag: 582mm (22.93")
Monitor-1: eDP-1 model: LG Display 0x04a7 built: 2015 res: 1920x1080 hz: 60
dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64") diag: 395mm (15.5")
ratio: 16:9 modes: 1920x1080
API: EGL v: 1.5 hw: drv: intel iris drv: nvidia platforms: device: 0
drv: nvidia device: 2 drv: iris device: 3 drv: swrast gbm: drv: kms_swrast
surfaceless: drv: nvidia x11: drv: iris inactive: wayland,device-1
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: intel mesa v: 24.0.2-manjaro1.1
glx-v: 1.4 direct-render: yes renderer: Mesa Intel HD Graphics 530 (SKL GT2)
device-ID: 8086:191b memory: 7.41 GiB unified: yes
API: Vulkan v: 1.3.279 layers: 5 device: 0 type: discrete-gpu
name: Quadro M1000M driver: nvidia v: 550.54.14 device-ID: 10de:13b1
surfaces: xcb,xlib
Audio:
Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Lenovo
driver: snd_hda_intel v: kernel alternate: snd_soc_avs bus-ID: 00:1f.3
chip-ID: 8086:a170 class-ID: 0403
Device-2: NVIDIA GM107 High Definition Audio [GeForce 940MX]
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
bus-ID: 01:00.1 chip-ID: 10de:0fbc class-ID: 0403
API: ALSA v: k6.6.19-1-MANJARO status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: JACK v: 1.9.22 status: off tools: N/A
Server-2: PipeWire v: 1.0.3 status: off with: pipewire-media-session
status: active tools: pw-cli
Server-3: PulseAudio v: 17.0 status: active with: pulseaudio-alsa
type: plugin tools: pacat,pactl
Network:
Device-1: Intel Ethernet I219-LM vendor: Lenovo driver: e1000e v: kernel
port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b7 class-ID: 0200
IF: enp0s31f6 state: down mac: <filter>
Device-2: Intel Wireless 8260 driver: iwlwifi v: kernel pcie: gen: 1
speed: 2.5 GT/s lanes: 1 bus-ID: 04:00.0 chip-ID: 8086:24f3 class-ID: 0280
IF: wlp4s0 state: up mac: <filter>
Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: Intel Bluetooth wireless interface driver: btusb v: 0.8 type: USB
rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:5 chip-ID: 8087:0a2b
class-ID: e001
Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
Local Storage: total: 238.47 GiB used: 179.07 GiB (75.1%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/sda maj-min: 8:0 vendor: Toshiba model: THNSFJ256GDNU A
size: 238.47 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 1102 scheme: GPT
Partition:
ID-1: / raw-size: 229.37 GiB size: 224.71 GiB (97.97%)
used: 178.81 GiB (79.6%) fs: ext4 dev: /dev/sda2 maj-min: 8:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 312 KiB (0.1%) fs: vfat dev: /dev/sda1 maj-min: 8:1
Swap:
Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
compressor: zstd max-pool: 20%
ID-1: swap-1 type: partition size: 8.8 GiB used: 264.2 MiB (2.9%)
priority: -2 dev: /dev/sda3 maj-min: 8:3
Sensors:
System Temperatures: cpu: 40.0 C pch: 45.0 C mobo: N/A
Fan Speeds (rpm): fan-1: 0 fan-2: 0
Info:
Memory: total: 8 GiB note: est. available: 7.59 GiB used: 5.41 GiB (71.3%)
Processes: 281 Power: uptime: 1h 3m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 3.03 GiB services: org_kde_powerdevil,upowerd
Init: systemd v: 255 default: graphical tool: systemctl
Packages: 1573 pm: pacman pkgs: 1491 libs: 400 tools: pamac pm: flatpak
pkgs: 82 Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Zsh v: 5.9 default: Bash
v: 5.2.26 running-in: konsole inxi: 3.3.33
So far I am merely reporting the issue. If anyone knows why it’s happening, it would be cool to hear.