NetworkManager Stability Issues since latest update

Hi all, I’ve been a happy Manjaro user for many years now, running on a ThinkPad X1 Carbon Gen 6. Many thanks to all those who work on this distro!

However, since the most recent kernel updates (I have tried both 6.8.5-1 and 6.6.26-1 LTS), I have been having unstable behavior. I have experienced several kernel panics and other kernel issues that eventually require a reboot or hard shutdown to resume normal system operation. I have not changed anything on the system recently and have not experienced a single such issue over the years.

Issues in the log are usually along the lines of:
BUG: scheduling while atomic: kworker/6:0/171277/0x00000002
or
watchdog: BUG: soft lockup - CPU#6 stuck for 160s! [kworker/6:1:168392] (multiple repeated)
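
A minimal sketch for pulling just these lines out of the journal, assuming a shell and systemd's journalctl are available. The function name filter_kernel_bugs is made up for illustration; it only filters text on stdin, so it can also be run against a saved log file.

```shell
# Keep only the kernel "BUG:" lines of the kinds seen in this thread
# (scheduling while atomic, soft lockup, workqueue leaked lock).
# Reads log text on stdin, writes matching lines to stdout.
filter_kernel_bugs() {
    grep -E 'BUG: (scheduling while atomic|soft lockup|workqueue leaked)'
}

# Typical use, kernel messages from the previous boot:
#   journalctl -k -b -1 --no-pager | filter_kernel_bugs
# Or from the current boot:
#   journalctl -k -b 0 --no-pager | filter_kernel_bugs
```

Looking at the previous boot (-b -1) is useful after a hard shutdown, since the interesting messages are usually the last ones written before the machine locked up.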

Other strange symptoms I have experienced since this update are NetworkManager segfaulting (dumping core), and Bluetooth connections that sometimes cannot be managed via the GUI (neither Plasma nor blueman).

My first thought was a hardware failure. I have run all of Lenovo’s hardware tests and they all pass. I have also updated the BIOS to the latest version.

I realise this is all quite vague. However, the issues appear random. I guess I’m wondering whether:

  • there have been any recent changes that could lead to such issues; and
  • anyone can offer suggestions on how to troubleshoot further? (Since I don’t see other topics about this in the forum, it’s more likely something specific on my end.)

Thanks!

Maybe begin with an

inxi -Farz

Also included is a guide for things like formatting in case you need it

Thanks for the reply!

inxi -Farz output is:

System:
  Kernel: 6.6.26-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc avail: acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
    root=UUID=9c4bef01-d0e8-4736-819d-d44fc11a9434 rw quiet
    msr.allow_writes=on
    cryptdevice=UUID=453a0272-7f94-469c-b655-f58eee4a1ce3:luks-453a0272-7f94-469c-b655-f58eee4a1ce3
    root=/dev/mapper/luks-453a0272-7f94-469c-b655-f58eee4a1ce3
    resume=/dev/mapper/luks-453a0272-7f94-469c-b655-f58eee4a1ce3
    resume=UUID=955e1562-7d18-4778-97d4-b148a14a03ad intel_iommu=on
  Desktop: KDE Plasma v: 5.27.11 tk: Qt v: 5.15.12 info: frameworks
    v: 5.115.0 wm: kwin_wayland with: krunner vt: 2 dm: SDDM Distro: Manjaro
    base: Arch Linux
Machine:
  Type: Laptop System: LENOVO product: 20KHCTO1WW v: ThinkPad X1 Carbon 6th
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: LENOVO model: 20KHCTO1WW v: SDK0J40709 WIN
    serial: <superuser required> part-nu: LENOVO_MT_20KH_BU_Think_FM_ThinkPad
    X1 Carbon 6th uuid: <superuser required> UEFI: LENOVO v: N23ET88W (1.63 )
    date: 02/28/2024
Battery:
  ID-1: BAT0 charge: 26.7 Wh (68.1%) condition: 39.2/57.0 Wh (68.7%)
    power: 39.5 W volts: 12.8 min: 11.6 model: LGC 01AV494 type: Li-poly
    serial: <filter> status: charging cycles: 1568
CPU:
  Info: model: Intel Core i7-8550U bits: 64 type: MT MCP arch: Coffee Lake
    gen: core 8 level: v3 note: check built: 2017 process: Intel 14nm family: 6
    model-id: 0x8E (142) stepping: 0xA (10) microcode: 0xF4
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 8 MiB desc: 1x8 MiB
  Speed (MHz): avg: 550 high: 700 min/max: 400/4000 scaling:
    driver: intel_pstate governor: powersave cores: 1: 400 2: 700 3: 400 4: 400
    5: 700 6: 700 7: 400 8: 700 bogomips: 32012
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: gather_data_sampling mitigation: Microcode
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT
    vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed mitigation: IBRS
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: IBRS; IBPB: conditional; STIBP: conditional;
    RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel UHD Graphics 620 vendor: Lenovo driver: i915 v: kernel
    arch: Gen-9.5 process: Intel 14nm built: 2016-20 ports: active: eDP-1
    off: DP-1 empty: DP-2,HDMI-A-1,HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:5917
    class-ID: 0300
  Device-2: IMC Networks Integrated Camera driver: uvcvideo type: USB
    rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-8:3 chip-ID: 13d3:56b2
    class-ID: 0e02
  Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 23.2.6
    compositor: kwin_wayland driver: X: loaded: intel dri: i965 gpu: i915
    display-ID: 0
  Monitor-1: eDP-1 res: 1920x1080 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
    device: 1 drv: swrast surfaceless: drv: iris wayland: drv: iris x11:
    drv: iris inactive: gbm
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.0.2-manjaro1.1
    glx-v: 1.4 direct-render: yes renderer: Mesa Intel UHD Graphics 620 (KBL
    GT2) device-ID: 8086:5917 memory: 15.03 GiB unified: yes display-ID: :0.0
  API: Vulkan v: 1.3.279 layers: 4 device: 0 type: integrated-gpu name: Intel
    UHD Graphics 620 (KBL GT2) driver: mesa intel v: 24.0.2-manjaro1.1
    device-ID: 8086:5917 surfaces: xcb,xlib,wayland
Audio:
  Device-1: Intel Sunrise Point-LP HD Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel alternate: snd_soc_skl,snd_soc_avs
    bus-ID: 00:1f.3 chip-ID: 8086:9d71 class-ID: 0403
  API: ALSA v: k6.6.26-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: PipeWire v: 1.0.3 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Intel Ethernet I219-V vendor: Lenovo driver: e1000e v: kernel
    port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15d8 class-ID: 0200
  IF: enp0s31f6 state: down mac: <filter>
  Device-2: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel pcie:
    gen: 1 speed: 2.5 GT/s lanes: 1 bus-ID: 02:00.0 chip-ID: 8086:24fd
    class-ID: 0280
  IF: wlp2s0 state: up mac: <filter>
  Info: services: NetworkManager, sshd, systemd-timesyncd, wpa_supplicant
Bluetooth:
  Device-1: Intel Bluetooth wireless interface driver: btusb v: 0.8 type: USB
    rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-7:2 chip-ID: 8087:0a2b
    class-ID: e001
  Report: btmgmt ID: hci0 rfk-id: 4 state: up address: <filter> bt-v: 4.2
    lmp-v: 8 status: discoverable: no pairing: no class-ID: 6c010c
Drives:
  Local Storage: total: 476.94 GiB used: 268.9 GiB (56.4%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung
    model: MZVLB512HAJQ-000L7 size: 476.94 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 5L2QEXA7 temp: 41.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 453.23 GiB size: 445.05 GiB (98.19%)
    used: 268.7 GiB (60.4%) fs: ext4 dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-453a0272-7f94-469c-b655-f58eee4a1ce3
  ID-2: /boot/efi raw-size: 260 MiB size: 256 MiB (98.46%)
    used: 29.6 MiB (11.6%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: partition size: 23.44 GiB used: 167.7 MiB (0.7%)
    priority: -2 dev: /dev/nvme0n1p5 maj-min: 259:3
Sensors:
  System Temperatures: cpu: 47.0 C pch: 48.0 C mobo: N/A
  Fan Speeds (rpm): fan-1: 0
Repos:
  Packages: pm: pacman pkgs: 2775 libs: 563 tools: octopi,pamac,yay
    pm: flatpak pkgs: 0
  Active pacman repo servers in: /etc/pacman.d/mirrorlist
    1: https://mirror.aarnet.edu.au/pub/manjaro/stable/$repo/$arch
Info:
  Memory: total: 16 GiB note: est. available: 15.39 GiB used: 9.8 GiB (63.7%)
  Processes: 353 Power: uptime: 18h 38m states: freeze,mem,disk
    suspend: deep avail: s2idle wakeups: 2 hibernate: platform avail: shutdown,
    reboot, suspend, test_resume image: 6.15 GiB
    services: org_kde_powerdevil,upowerd Init: systemd v: 255
    default: graphical tool: systemctl
  Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Bash v: 5.2.26
    running-in: tmux: inxi: 3.3.34

I’ve had NetworkManager crash multiple times today, but the system itself has stayed functional. I’ve now tried disabling the associated systemd service and running it directly from a terminal via NetworkManager --no-daemon --debug in case that leads to more useful information.

Update: This setup has remained stable for the entire day. No kernel errors such as those in the original post.

However, trying to restart NetworkManager.service via systemd resulted in multiple errors as shown, and finally a kernel panic (system completely unresponsive and flashing Caps-Lock key):

Apr 16 18:47:04 kernel: BUG: scheduling while atomic: NetworkManager/201980/0x00000002
Apr 16 18:47:04 kernel: BUG: scheduling while atomic: NetworkManager/201980/0x00000000
Apr 16 18:47:05 systemd-coredump[202009]: Process 201980 (NetworkManager) of user 0 dumped core.
Apr 16 18:47:05 systemd[1]: Failed to start Network Manager.
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: NetworkManager/202045/0x00000002
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: NetworkManager/202045/0x00000000
Apr 16 18:47:05 NetworkManager[202045]: <error> [1713257225.3343] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:05 NetworkManager[202045]: <error> [1713257225.3344] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:05 systemd-coredump[202052]: Process 202045 (NetworkManager) of user 0 dumped core.
Apr 16 18:47:05 systemd[1]: Failed to start Network Manager.
Apr 16 18:47:05 NetworkManager[202061]: <error> [1713257225.8595] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:05 NetworkManager[202061]: <error> [1713257225.8596] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: NetworkManager/202061/0x00000002
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: NetworkManager/202061/0x00000000
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: Link Monitor/1903/0x00000002
Apr 16 18:47:05 kernel: BUG: scheduling while atomic: Link Monitor/1903/0x00000000
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: conky/111712/0x00000002
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: conky/111712/0x00000000
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1934/0x00000002
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1877/0x00000002
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1934/0x00000000
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1934/0x00000002
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1877/0x00000000
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: Qt bearer threa/1934/0x00000000
Apr 16 18:47:06 systemd-coredump[202066]: Process 202061 (NetworkManager) of user 0 dumped core.
Apr 16 18:47:06 systemd[1]: Failed to start Network Manager.
Apr 16 18:47:06 systemd-coredump[202073]: Process 111704 (conky) of user 1000 dumped core.
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: NetworkManager/202083/0x00000002
Apr 16 18:47:06 kernel: BUG: scheduling while atomic: NetworkManager/202083/0x00000000
Apr 16 18:47:06 NetworkManager[202083]: <error> [1713257226.5571] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:06 NetworkManager[202083]: <error> [1713257226.5571] platform-linux: netlink[rtnl]: read: failed to retrieve incoming events: Bad address (-14)
Apr 16 18:47:06 systemd-coredump[202094]: Process 202083 (NetworkManager) of user 0 dumped core.
Apr 16 18:47:06 systemd[1]: Failed to start Network Manager.
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: kworker/1:4/194948/0x00000002
Apr 16 18:47:08 kernel: BUG: workqueue leaked lock or atomic: kworker/1:4/0x7fffffff/194948
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: kworker/1:4/194948/0x00000000
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: Link Monitor/1903/0x00000002
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: NetworkManager/202109/0x00000002
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: Link Monitor/1903/0x00000000
Apr 16 18:47:08 kernel: BUG: scheduling while atomic: NetworkManager/202109/0x00000000
Apr 16 18:47:09 kernel: BUG: scheduling while atomic: Qt bearer threa/1522/0x00000002
Apr 16 18:47:09 kernel: BUG: scheduling while atomic: Qt bearer threa/1522/0x00000000
Apr 16 18:47:09 systemd-coredump[202144]: Process 202109 (NetworkManager) of user 0 dumped core.
Apr 16 18:47:09 systemd[1]: Failed to start Network Manager.
Apr 16 18:47:09 systemd[1]: Failed to start Network Manager.
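
To see which tasks tripped the check most often, a log like the one above can be summarised per task name. This is only a sketch: count_atomic_bugs is a hypothetical helper, and it assumes the message layout shown above (task name, then PID, then a hex value, separated by slashes).

```shell
# Count "scheduling while atomic" reports per task name.
# Reads log text on stdin; prints "<count> <task name>", most frequent first.
# Task names may contain spaces or slashes (e.g. "Qt bearer threa",
# "kworker/1:4"), so only the trailing "/<pid>/0x<hex>" part is stripped.
count_atomic_bugs() {
    sed -n 's|.*BUG: scheduling while atomic: \(.*\)/[0-9][0-9]*/0x[0-9a-fA-F]*$|\1|p' \
        | sort | uniq -c | sort -rn
}

# Typical use:
#   journalctl -k -b -1 --no-pager | count_atomic_bugs
```

If many unrelated tasks show up (here NetworkManager, conky, Qt threads and kworkers all do), that hints at a problem in the kernel rather than in any single program.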

Today’s experience leads me to believe that this is not a kernel topic. It is something systemd and/or NetworkManager.service related. I’ve edited the original title to reflect this.

Any ideas why NetworkManager would behave so badly when run via systemd but totally fine when run via NetworkManager --no-daemon --debug?
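
One way to narrow this down is to compare what systemd actually runs with the manual command, and to look at the recorded core dumps. A rough sketch: unit_exec_lines is a hypothetical helper, while systemctl cat and coredumpctl are standard systemd tools. The PID in the last comment is taken from the log above.

```shell
# Print only the Exec* lines (ExecStart, ExecReload, ...) of a unit file
# given on stdin, so they can be compared with the manual command line.
unit_exec_lines() {
    grep -E '^Exec[A-Za-z]*='
}

# Typical use:
#   systemctl cat NetworkManager.service | unit_exec_lines
#
# Core dumps recorded by systemd-coredump can be listed and inspected with:
#   coredumpctl list NetworkManager
#   coredumpctl info 201980
```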

Question: Could this kind of strange intermittent behaviour be explained by intermittent connection to the wifi card? I am always on wifi (not ethernet), and it always seems to be NetworkManager crashing that leads to the subsequent kernel errors.

I’ve had the laptop since new (approx. 6 years) and, whilst I look after it, it has experienced a few accidental bumps in its time. I’m wondering whether it is worth opening it and re-seating the card? (I would rather not open it up if it’s completely implausible for this to be the cause.)

IMHO it’s the same issue:

https://bbs.archlinux.org/viewtopic.php?id=294828

6.8.5 and 6.8.6 (now in testing) are affected by it.

I had to downgrade to 6.8.4.

Thanks for that! I had not seen that thread when searching.

So 6.1.85 (LTS) is not affected? I have this installed also. Perhaps easier to just reboot into this than downgrade? How would I go about downgrading?

Sorry, I have not tried 6.1.x.

Ok, I’m back to 6.8.4-1. Fingers crossed!

Thanks again for chiming in on this and pointing me in what I suspect is the right direction!
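
For anyone else wondering how to downgrade: one possible route, assuming the older kernel package is still in the local pacman cache, is to reinstall it from there. The sketch below is not necessarily what was done in this thread. find_cached_pkg is a hypothetical helper, and the package name linux68 is an assumption based on Manjaro's usual kernel package naming.

```shell
# List cached package files for a given package name and version.
#   $1 = package name (e.g. linux68)
#   $2 = version      (e.g. 6.8.4-1)
#   $3 = cache directory (defaults to pacman's usual cache location)
# Signature files (*.sig) are skipped. Runs in a subshell so the
# variables do not leak into the calling shell.
find_cached_pkg() (
    name=$1
    version=$2
    cache=${3:-/var/cache/pacman/pkg}
    ls "$cache"/"$name"-"$version"-*.pkg.tar.* 2>/dev/null | grep -v '\.sig$'
)

# Typical use (check the path it prints before installing):
#   find_cached_pkg linux68 6.8.4-1
#   sudo pacman -U "$(find_cached_pkg linux68 6.8.4-1)"
#
# Kernels that are already installed can be listed with:
#   mhwd-kernel -li
# and an installed one (such as the 6.1 LTS mentioned above) can simply
# be picked from the boot menu instead of downgrading.
```

The next system update will pull the newer kernel build back in unless the package is held, for example with an IgnorePkg entry in /etc/pacman.conf, so a hold like that is only sensible until a fixed kernel arrives.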
