CPU core(s) at 100%, no programs running, traced to kworker and USB weirdness

Hi, I’m posting to see if I can get any ideas. I’ve spent many hours doing experiments that I will document below. Some of it is kind of weird and I haven’t really seen anything like it. But I’m certainly no expert, and maybe I missed something obvious?

TL;DR

  • If I cold boot, the computer works fine. None of these issues.
  • If I restart, either with the whisker menu or “shutdown -r now”, the computer will have 1 or more CPU cores pegged at 100% from a kworker process when it reloads.
  • If I plug in an empty USB thumb drive, CPU cores go back to normal but ONLY if I use a USB 3.0 port. If I remove the USB keyboard receiver, another CPU core will go to 100%. If I plug it back in, it goes back to zero.
  • I’ve reinstalled Manjaro several times, tried 3 different kernels, updated the BIOS, and tested all of the USB port volts/amps.

Background
This is a friend’s computer. It was running Windows 10 and was so slow that it took 10 seconds just to get a right-click context menu. It was painful to watch. I convinced her to let me swap out the hard drive for an SSD and install Manjaro. She uses her computer for Gmail, surfing the web, and sometimes writing a Word document. Over the past couple of years, I’ve installed Manjaro on a couple dozen machines and everyone is happy so far!

This is my first REALLY weird experience. I hope you don’t mind the story I’m about to tell, but I think it’s somewhat necessary and may help some poor soul who has a similar issue in the future…

The Problem
I had just finished installing Manjaro, Chrome, some fonts, and a small list of other things. Most people still want access to Windows for some reason, so I install VirtualBox and use the Windows product key they already own.

Nothing was running but conky was showing 25% CPU use. Huh? I opened htop and it showed one of the CPU cores pegged at 100%. I started this project several days ago but here’s a screenshot from just now because I’ve learned how to replicate it…
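For anyone who wants to check the same thing without htop, here is a rough Python sketch (an illustration, not something from the original troubleshooting). It parses the output of `ps -eo pid,pcpu,comm` and lists kworker threads above a CPU threshold. The function names and the 50% threshold are made up for this example, and it assumes the procps version of `ps`.

```python
import subprocess


def parse_ps(ps_text):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, name) tuples."""
    rows = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore anything that does not look like a process row
    return rows


def busy_kworkers(rows, threshold=50.0):
    """Return kworker threads at or above `threshold` percent CPU, busiest first."""
    hits = [r for r in rows if r[2].startswith("kworker") and r[1] >= threshold]
    return sorted(hits, key=lambda r: r[1], reverse=True)


def read_ps():
    """Run ps once and return its text output (assumes procps on Linux)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `busy_kworkers(parse_ps(read_ps()))` would list the suspects. One caveat: `ps` reports CPU averaged over the thread's lifetime, so a thread that only recently started spinning can show a lower figure than htop does.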

Before I continue my saga, here is the machine info:

Wed Feb  2 03:48:22 AM EST 2022


# inxi --admin --verbosity=7 --filter --width
System:
  Kernel: 5.15.16-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.15-x86_64
    root=UUID=ba158eec-2ab7-4dff-b04d-a301777a1a16 rw quiet apparmor=1
    security=apparmor udev.log_priority=3
  Desktop: Xfce 4.16.0 tk: Gtk 3.24.29 info: xfce4-panel wm: xfwm 4.16.1
    vt: 7 dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Desktop System: LENOVO product: 90ED0009US v: ideacentre 700-25ISH
    serial: <filter> Chassis: type: 3 serial: <filter>
  Mobo: LENOVO model: SKYBAY v: SDK0J40700 WIN 3258033222747 serial: N/A
    UEFI-[Legacy]: LENOVO v: FWKT47A date: 04/20/2016
Battery:
  Device-1: hidpp_battery_1 model: Logitech Wireless Keyboard serial: <filter>
    charge: 55% (should be ignored) rechargeable: yes status: Discharging
Memory:
  RAM: total: 7.71 GiB used: 977.5 MiB (12.4%)
  Array-1: capacity: 64 GiB slots: 4 EC: None max-module-size: 16 GiB
    note: est.
  Device-1: ChannelA-DIMM0 size: No Module Installed
  Device-2: ChannelA-DIMM1 size: 8 GiB speed: 2133 MT/s type: DDR4
    detail: synchronous bus-width: 64 bits total: 64 bits manufacturer: SK Hynix
    part-no: HMA41GU6AFR8N-TF serial: <filter>
  Device-3: ChannelB-DIMM0 size: No Module Installed
  Device-4: ChannelB-DIMM1 size: No Module Installed
CPU:
  Info: model: Intel Core i5-6400 socket: U3E1 bits: 64 type: MCP
    arch: Skylake-S family: 6 model-id: 0x5E (94) stepping: 3 microcode: 0xEA
  Topology: cpus: 1x cores: 4 smt: <unsupported> cache: L1: 256 KiB
    desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB L3: 6 MiB
    desc: 1x6 MiB
  Speed (MHz): avg: 900 min/max: 800/3300 base/boost: 2700/4200 scaling:
    driver: intel_pstate governor: powersave volts: 1.1 V ext-clock: 100 MHz
    cores: 1: 900 2: 900 3: 900 4: 900 bogomips: 21607
  Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_perfmon
    art avx avx2 bmi1 bmi2 bts clflush clflushopt cmov constant_tsc cpuid
    cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est
    f16c flexpriority flush_l1d fma fpu fsgsbase fxsr ht hwp hwp_act_window
    hwp_epp hwp_notify ibpb ibrs ida intel_pt invpcid invpcid_single lahf_lm
    lm mca mce md_clear mmx monitor movbe mpx msr mtrr nonstop_tsc nopl nx pae
    pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pti
    pts rdrand rdseed rdtscp rep_good sdbg sep smap smep ss ssbd sse sse2
    sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust
    tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec xsaveopt
    xsaves xtopology xtpr
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf
    mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
  Type: mds mitigation: Clear CPU buffers; SMT disabled
  Type: meltdown mitigation: PTI
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
    IBRS_FW, STIBP: disabled, RSB filling
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GK208B [GeForce GT 730] vendor: Bitland Information
    driver: nvidia v: 470.94 alternate: nouveau,nvidia_drm bus-ID: 01:00.0
    chip-ID: 10de:1287 class-ID: 0300
  Display: x11 server: X.Org 1.21.1.3 compositor: xfwm4 v: 4.16.1 driver:
    loaded: nvidia display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 92 s-size: 530x301mm (20.9x11.9")
    s-diag: 610mm (24")
  Monitor-1: HDMI-0 res: 1920x1080 hz: 60 dpi: 93
    size: 527x296mm (20.7x11.7") diag: 604mm (23.8")
  Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
  Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel bus-ID: 00:1f.3 chip-ID: 8086:a170
    class-ID: 0403
  Device-2: NVIDIA GK208 HDMI/DP Audio vendor: Bitland Information
    driver: snd_hda_intel v: kernel bus-ID: 01:00.1 chip-ID: 10de:0e0f
    class-ID: 0403
  Sound Server-1: ALSA v: k5.15.16-1-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.20 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
  Sound Server-4: PipeWire v: 0.3.43 running: no
Network:
  Device-1: Intel Ethernet I219-LM vendor: Lenovo driver: e1000e v: kernel
    port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b7 class-ID: 0200
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: noprefixroute scope: link
  Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
    vendor: Lenovo driver: ath10k_pci v: kernel bus-ID: 02:00.0
    chip-ID: 168c:003e class-ID: 0280
  IF: wlp2s0 state: down mac: <filter>
  WAN IP: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 type: USB driver: btusb
    v: 0.8 bus-ID: 1-4:2 chip-ID: 0cf3:e300 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: see --recommends
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 931.51 GiB used: 93.14 GiB (10.0%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 EVO 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 2B6Q scheme: MBR
  Optical-1: /dev/sr0 vendor: MATSHITA model: DVD-RAM SW840 rev: V801
    dev-links: cdrom
  Features: speed: 40 multisession: yes audio: yes dvd: yes
    rw: cd-r,cd-rw,dvd-r,dvd-ram state: running
Partition:
  ID-1: / raw-size: 931.51 GiB size: 915.81 GiB (98.31%)
    used: 93.14 GiB (10.2%) fs: ext4 block-size: 4096 B dev: /dev/sda1
    maj-min: 8:1 label: N/A uuid: ba158eec-2ab7-4dff-b04d-a301777a1a16
Swap:
  Alert: No swap data was found.
Unmounted:
  Message: No unmounted partitions found.
USB:
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 16 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 1-4:2 info: Qualcomm Atheros QCA61x4 Bluetooth 4.0
    type: Bluetooth driver: btusb interfaces: 2 rev: 1.1 speed: 12 Mb/s
    power: 100mA chip-ID: 0cf3:e300 class-ID: e001
  Device-2: 1-5:4 info: Logitech Unifying Receiver type: Keyboard,Mouse
    driver: logitech-djreceiver,usbhid interfaces: 2 rev: 2.0 speed: 12 Mb/s
    power: 98mA chip-ID: 046d:c534 class-ID: 0301
  Hub-2: 2-0:1 info: Super-speed hub ports: 8 rev: 3.0 speed: 5 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
  System Temperatures: cpu: 29.8 C mobo: 27.8 C gpu: nvidia temp: 33 C
  Fan Speeds (RPM): N/A gpu: nvidia fan: 40%
Info:
  Processes: 196 Uptime: 2h 34m wakeups: 4 Init: systemd v: 250
  tool: systemctl Compilers: gcc: 11.1.0 clang: 13.0.0 Packages: pacman: 1115
  lib: 317 flatpak: 0 Shell: Bash (su) v: 5.1.16 running-in: xfce4-terminal
  inxi: 3.3.12

Next, I tried Googling my issue and I didn’t find anything helpful so I convinced myself that I must have done something wrong. I reformatted the drive and installed Manjaro, all of the updates, software, and settings all over again. After all of that… same thing.

Next, I tried different kernels. Nope. Didn’t help.

I read somewhere that a defective USB keyboard had caused similar issues for one user in another forum. I unplugged the friend’s keyboard from the back and plugged in one of my keyboards in the front (the front has 2 USB 3.0 ports; the back has both USB 2 and USB 3 ports, which becomes relevant in a minute, lol). OMG the CPU came back down to ZERO!! THE PROBLEM WAS SOLVED???

No. It wasn’t fixed at all.

On the next restart, it was back to that one CPU core pegged at 100%. Okay, maybe the front USB ports are bad or have something spilled in them. I disconnected them from the motherboard.

No change.

I looked in dmesg and didn’t see anything useful, except there was one line that said, “usb usb1-port14: over-current condition”
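If that over-current message shows up more than once, it can help to count it per port. Below is a minimal sketch that scans saved kernel log lines for the message quoted above and tallies hits by root hub and port number. The regular expression is based only on that one quoted line, and the function name is hypothetical.

```python
import re

# Matches lines like: "usb usb1-port14: over-current condition"
OVER_CURRENT = re.compile(r"usb (usb\d+)-port(\d+): over-current condition")


def over_current_ports(log_lines):
    """Count over-current reports per (root hub, port number) in kernel log lines."""
    counts = {}
    for line in log_lines:
        match = OVER_CURRENT.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

The input could be the lines of a file created with `dmesg > dmesg.txt`. A port that is reported many times, or only after a restart, would be the one to look at first.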

Maybe this computer just has a messed up motherboard. I got out my USB tester and checked all of the ports, both with no load and with a load from various USB devices. All voltages were perfect. I turned off the computer and came back later.

I started it up and IT WAS FINE!!! CPU 0% at idle. WTF???

I’m going to end the story here because the problem still exists BUT ONLY at certain times and this is what I hope one of you can help me figure out. I’ve done a BUNCH of tests and here is what I’ve figured out so far:

Lenovo, TEST SET 2

  • Manjaro OS, logi keyboard/mouse receiver in left front USB3 (ONLY USB device)
    • no CPU issues, 0% CPU at idle, 1% with htop running.

    • I’m gonna do a software RESTART instead of a shutdown and power on.

    • NOW, it’s showing ~25% CPU, htop shows core 3 pegged at 100%.

      • put Samsung thumb drive in right front USB 3, CPU goes to 1% at idle (htop is running).
      • removed thumb drive, htop shows core 2 pegged at 100%.
    • INTERESTING…

      • putting the thumb drive in the right front USB, immediately brings the CPU down to 1%.

      • putting the USB tester in WITH the thumb drive attached, DOES NOT bring the CPU down!!

      • I tried putting the tester in first, waiting a few seconds, then adding the thumb drive, NOPE!

      • Only plugging in the thumb drive directly, brings the CPU down.

      • I tested the thumb drive while it was plugged into the tester and I could read and write to it!

        • What’s the difference between the thumb drive being plugged in directly and through the tester??
        • the voltage and current shown on the tester are always the same: 5.05 V, 0.04 A.
      • I repeated these experiments a dozen times:

        • take the thumb drive out, one of the CPU cores pegs at 100%.
        • put the thumb drive back in, CPU settles back to 1%.
      • Rear USB ports:

        • putting the thumb drive into either of the USB 2 ports DOES NOT HELP!
        • putting the thumb drive into EITHER of the USB 3 ports brings the CPU down to 1%.
        • using the tester with the thumb drive plugged into it, DOES NOT HELP (same as front port)
        • using the tester in a DIFFERENT port does not affect the experiment.
          • the tester shows no change in voltage for either 100% or 1% events.
    • fully shut down the computer for the next test set
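Since the results above depend on which physical port is used, it may be worth recording which bus and port each device lands on before and after plugging things in. The sketch below walks the kernel's USB device directory in sysfs and reports bus number, port path, speed, and product name for each device. The function names are made up for this example. The default path and the `speed` and `product` attribute files are standard sysfs entries, but treat the rest as an untested illustration.

```python
import os
import re

# Device directories are named "<bus>-<port>[.<port>...]", e.g. "1-5" or "1-4.2".
DEVICE_NAME = re.compile(r"^(\d+)-(\d+(?:\.\d+)*)$")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute file, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def list_usb_devices(root="/sys/bus/usb/devices"):
    """List attached USB devices with their bus, port path, speed (Mb/s), and product."""
    devices = []
    for name in sorted(os.listdir(root)):
        match = DEVICE_NAME.match(name)
        if not match:
            continue  # skip root hubs ("usb1") and interfaces ("1-5:1.0")
        dev_dir = os.path.join(root, name)
        devices.append({
            "bus": int(match.group(1)),
            "port_path": match.group(2),
            "speed_mbps": read_attr(os.path.join(dev_dir, "speed")),
            "product": read_attr(os.path.join(dev_dir, "product")),
        })
    return devices
```

Running `list_usb_devices()` once with the thumb drive in a USB 3 port and once with it in a USB 2 port would show whether the drive ends up on a different bus or port number in each case, which could then be compared against the port number in the over-current message.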

Lenovo, TEST SET 3

  • Manjaro OS, logi keyboard/mouse receiver in left front USB3 (ONLY USB device)
    • 0% CPU at idle, as I predicted.

    • What’s the difference between software restart and shutdown/manual power on??

    • I did 10 cycles of restart using whisker menu, or a terminal “shutdown -r now”, mixed with shutting down completely, then turning the machine back on with the power button.

      • software restarts always result in 1 CPU core pegged at 100%
      • full power cycles always result in normal operation!
    • When 1 CPU core is pegged at 100%, if I remove the keyboard/mouse receiver, A SECOND core will go to 100%

      • as soon as I plug the receiver back in, the second CPU core comes right back down to 0.
      • I can plug the receiver back into a DIFFERENT port, as long as it’s a USB 3 port.
    • When the computer is running “normally” (after a full power cycle) and no cores are pegged at 100%, I can unplug the keyboard/mouse receiver as many times as I want and it causes no issues.
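One way to dig into the restart vs. cold boot difference would be to save the kernel log after each kind of boot and compare them. Here is a minimal sketch that strips the dmesg-style timestamps and returns the messages that appear only in the warm-restart log. The function names are hypothetical, and it assumes both logs were saved as plain `dmesg` output.

```python
import re

# Leading dmesg-style timestamp, e.g. "[    1.234567] "
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(lines):
    """Strip timestamps and drop blank lines so two boots can be compared."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines if line.strip()]


def only_in_warm(cold_lines, warm_lines):
    """Return messages found in the warm-restart log but not in the cold-boot log."""
    cold = set(normalize(cold_lines))
    seen = set()
    extra = []
    for message in normalize(warm_lines):
        if message not in cold and message not in seen:
            seen.add(message)
            extra.append(message)  # keep the first occurrence, in log order
    return extra
```

For example, run `dmesg > cold.txt` after a full power cycle and `dmesg > warm.txt` after a restart, then pass the lines of each file to `only_in_warm`. Messages containing addresses or counters will differ between any two boots, so expect some noise in the result.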

Tell me that’s not weird. Could this be a kernel bug? Something else? I can’t begin to guess what’s causing this behavior.

Thanks

Well, I saw there were some new updates earlier today and after installing them, I no longer see the issue. I’ll be returning the machine to its owner and I’ll post again if this issue reappears.

I can’t really mark this solved because I don’t know what fixed it. If anyone has any ideas, I’d love to hear them. I want this thread to help others who might encounter a similar problem.

Finally, I’ve noticed a couple other people having problems with Manjaro on Lenovo machines, both desktops and laptops. Is there something Lenovo does differently that makes things more difficult for Manjaro users?