Hi, I’m posting to see if I can get any ideas. I’ve spent many hours on experiments, which I document below. Some of the results are kind of weird and I haven’t really seen anything like them. But I’m certainly no expert, and maybe I missed something obvious.
TL;DR
- If I cold boot, the computer works fine. None of these issues.
- If I restart, either with the whisker menu or “shutdown -r now”, the computer will have 1 or more CPU cores pegged at 100% from a kworker process when it reloads.
- If I plug in an empty USB thumb drive, CPU cores go back to normal but ONLY if I use a USB 3.0 port. If I remove the USB keyboard receiver, another CPU core will go to 100%. If I plug it back in, it goes back to zero.
- I’ve reinstalled Manjaro several times, tried 3 different kernels, updated the BIOS, and measured the voltage and current on all of the USB ports.
Background
This is a friend’s computer. It was running Windows 10 and was so slow, it would take 10 seconds just to get a right-click context menu. It was painful to watch. I convinced her to let me swap out the hard drive with an SSD and install Manjaro. This person uses their computer for Gmail, surfing the web, and sometimes writing a Word document. Over the past couple of years, I’ve installed Manjaro on a couple dozen machines and everyone is happy so far!
This is my first REALLY weird experience and I hope you don’t mind the story I’m about to tell but I think it’s somewhat necessary and may help some poor soul who has a similar issue in the future…
The Problem
I had just finished installing Manjaro, Chrome, some fonts, and a small list of other things. Most people still want access to Windows for some reason, so I install VirtualBox and use the Windows license key they already own.
Nothing was running but conky was showing 25% CPU use. Huh? I opened htop and it showed one of the CPU cores pegged at 100%. I started this project several days ago but here’s a screenshot from just now because I’ve learned how to replicate it…
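For anyone who wants to reproduce the observation without htop, here is a rough sketch that finds which threads are burning CPU by reading `/proc/<pid>/stat` twice and comparing the tick counters. The function names (`parse_stat`, `snapshot`, `busiest`) are my own for illustration, and the field positions are taken from the proc(5) man page (utime is field 14, stime is field 15). Treat it as a starting point, not a polished tool.

```python
import os


def parse_stat(stat_line):
    """Return (comm, cpu_ticks) parsed from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split at the LAST ')' in the line.
    """
    left, _, right = stat_line.rpartition(")")
    comm = left.partition("(")[2]
    fields = right.split()
    # fields[0] is the state (field 3 in proc(5)), so field N is fields[N - 3].
    utime = int(fields[11])  # field 14: ticks spent in user mode
    stime = int(fields[12])  # field 15: ticks spent in kernel mode
    return comm, utime + stime


def snapshot(proc_root="/proc"):
    """Map pid -> (comm, cpu_ticks) for every process visible right now."""
    ticks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                ticks[int(entry)] = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparsable
    return ticks


def busiest(before, after, top=5):
    """Return up to `top` (delta_ticks, pid, comm) tuples, largest first."""
    rows = []
    for pid, (comm, cpu) in after.items():
        if pid in before:
            delta = cpu - before[pid][1]
            if delta > 0:
                rows.append((delta, pid, comm))
    rows.sort(reverse=True)
    return rows[:top]
```

To use it, call `snapshot()`, wait a few seconds with `time.sleep(5)`, call `snapshot()` again, and pass both results to `busiest()`. The tick rate is reported by `os.sysconf("SC_CLK_TCK")`; if that is 100, a thread that is pegged for the whole 5 seconds should show a delta of roughly 500.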
Before I continue my saga, here is the machine info:
Wed Feb 2 03:48:22 AM EST 2022
# inxi --admin --verbosity=7 --filter --width
System:
Kernel: 5.15.16-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
parameters: BOOT_IMAGE=/boot/vmlinuz-5.15-x86_64
root=UUID=ba158eec-2ab7-4dff-b04d-a301777a1a16 rw quiet apparmor=1
security=apparmor udev.log_priority=3
Desktop: Xfce 4.16.0 tk: Gtk 3.24.29 info: xfce4-panel wm: xfwm 4.16.1
vt: 7 dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux
Machine:
Type: Desktop System: LENOVO product: 90ED0009US v: ideacentre 700-25ISH
serial: <filter> Chassis: type: 3 serial: <filter>
Mobo: LENOVO model: SKYBAY v: SDK0J40700 WIN 3258033222747 serial: N/A
UEFI-[Legacy]: LENOVO v: FWKT47A date: 04/20/2016
Battery:
Device-1: hidpp_battery_1 model: Logitech Wireless Keyboard serial: <filter>
charge: 55% (should be ignored) rechargeable: yes status: Discharging
Memory:
RAM: total: 7.71 GiB used: 977.5 MiB (12.4%)
Array-1: capacity: 64 GiB slots: 4 EC: None max-module-size: 16 GiB
note: est.
Device-1: ChannelA-DIMM0 size: No Module Installed
Device-2: ChannelA-DIMM1 size: 8 GiB speed: 2133 MT/s type: DDR4
detail: synchronous bus-width: 64 bits total: 64 bits manufacturer: SK Hynix
part-no: HMA41GU6AFR8N-TF serial: <filter>
Device-3: ChannelB-DIMM0 size: No Module Installed
Device-4: ChannelB-DIMM1 size: No Module Installed
CPU:
Info: model: Intel Core i5-6400 socket: U3E1 bits: 64 type: MCP
arch: Skylake-S family: 6 model-id: 0x5E (94) stepping: 3 microcode: 0xEA
Topology: cpus: 1x cores: 4 smt: <unsupported> cache: L1: 256 KiB
desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB L3: 6 MiB
desc: 1x6 MiB
Speed (MHz): avg: 900 min/max: 800/3300 base/boost: 2700/4200 scaling:
driver: intel_pstate governor: powersave volts: 1.1 V ext-clock: 100 MHz
cores: 1: 900 2: 900 3: 900 4: 900 bogomips: 21607
Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_perfmon
art avx avx2 bmi1 bmi2 bts clflush clflushopt cmov constant_tsc cpuid
cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est
f16c flexpriority flush_l1d fma fpu fsgsbase fxsr ht hwp hwp_act_window
hwp_epp hwp_notify ibpb ibrs ida intel_pt invpcid invpcid_single lahf_lm
lm mca mce md_clear mmx monitor movbe mpx msr mtrr nonstop_tsc nopl nx pae
pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pti
pts rdrand rdseed rdtscp rep_good sdbg sep smap smep ss ssbd sse sse2
sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust
tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec xsaveopt
xsaves xtopology xtpr
Vulnerabilities:
Type: itlb_multihit status: KVM: VMX disabled
Type: l1tf
mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
Type: mds mitigation: Clear CPU buffers; SMT disabled
Type: meltdown mitigation: PTI
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1
mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
IBRS_FW, STIBP: disabled, RSB filling
Type: srbds mitigation: Microcode
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: NVIDIA GK208B [GeForce GT 730] vendor: Bitland Information
driver: nvidia v: 470.94 alternate: nouveau,nvidia_drm bus-ID: 01:00.0
chip-ID: 10de:1287 class-ID: 0300
Display: x11 server: X.Org 1.21.1.3 compositor: xfwm4 v: 4.16.1 driver:
loaded: nvidia display-ID: :0.0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 92 s-size: 530x301mm (20.9x11.9")
s-diag: 610mm (24")
Monitor-1: HDMI-0 res: 1920x1080 hz: 60 dpi: 93
size: 527x296mm (20.7x11.7") diag: 604mm (23.8")
Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Lenovo
driver: snd_hda_intel v: kernel bus-ID: 00:1f.3 chip-ID: 8086:a170
class-ID: 0403
Device-2: NVIDIA GK208 HDMI/DP Audio vendor: Bitland Information
driver: snd_hda_intel v: kernel bus-ID: 01:00.1 chip-ID: 10de:0e0f
class-ID: 0403
Sound Server-1: ALSA v: k5.15.16-1-MANJARO running: yes
Sound Server-2: JACK v: 1.9.20 running: no
Sound Server-3: PulseAudio v: 15.0 running: yes
Sound Server-4: PipeWire v: 0.3.43 running: no
Network:
Device-1: Intel Ethernet I219-LM vendor: Lenovo driver: e1000e v: kernel
port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b7 class-ID: 0200
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
IP v4: <filter> type: dynamic noprefixroute scope: global
broadcast: <filter>
IP v6: <filter> type: noprefixroute scope: link
Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
vendor: Lenovo driver: ath10k_pci v: kernel bus-ID: 02:00.0
chip-ID: 168c:003e class-ID: 0280
IF: wlp2s0 state: down mac: <filter>
WAN IP: <filter>
Bluetooth:
Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 type: USB driver: btusb
v: 0.8 bus-ID: 1-4:2 chip-ID: 0cf3:e300 class-ID: e001
Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
rfk-block: hardware: no software: yes address: see --recommends
Logical:
Message: No logical block device data found.
RAID:
Message: No RAID data found.
Drives:
Local Storage: total: 931.51 GiB used: 93.14 GiB (10.0%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 EVO 1TB
size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
type: SSD serial: <filter> rev: 2B6Q scheme: MBR
Optical-1: /dev/sr0 vendor: ■■■■■■■■ model: DVD-RAM SW840 rev: V801
dev-links: cdrom
Features: speed: 40 multisession: yes audio: yes dvd: yes
rw: cd-r,cd-rw,dvd-r,dvd-ram state: running
Partition:
ID-1: / raw-size: 931.51 GiB size: 915.81 GiB (98.31%)
used: 93.14 GiB (10.2%) fs: ext4 block-size: 4096 B dev: /dev/sda1
maj-min: 8:1 label: N/A uuid: ba158eec-2ab7-4dff-b04d-a301777a1a16
Swap:
Alert: No swap data was found.
Unmounted:
Message: No unmounted partitions found.
USB:
Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 16 rev: 2.0
speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
Device-1: 1-4:2 info: Qualcomm Atheros QCA61x4 Bluetooth 4.0
type: Bluetooth driver: btusb interfaces: 2 rev: 1.1 speed: 12 Mb/s
power: 100mA chip-ID: 0cf3:e300 class-ID: e001
Device-2: 1-5:4 info: Logitech Unifying Receiver type: Keyboard,Mouse
driver: logitech-djreceiver,usbhid interfaces: 2 rev: 2.0 speed: 12 Mb/s
power: 98mA chip-ID: 046d:c534 class-ID: 0301
Hub-2: 2-0:1 info: Super-speed hub ports: 8 rev: 3.0 speed: 5 Gb/s
chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
System Temperatures: cpu: 29.8 C mobo: 27.8 C gpu: nvidia temp: 33 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 40%
Info:
Processes: 196 Uptime: 2h 34m wakeups: 4 Init: systemd v: 250
tool: systemctl Compilers: gcc: 11.1.0 clang: 13.0.0 Packages: pacman: 1115
lib: 317 flatpak: 0 Shell: Bash (su) v: 5.1.16 running-in: xfce4-terminal
inxi: 3.3.12
Next, I tried Googling my issue and I didn’t find anything helpful so I convinced myself that I must have done something wrong. I reformatted the drive and installed Manjaro, all of the updates, software, and settings all over again. After all of that… same thing.
Next, I tried different kernels. Nope. Didn’t help.
I read somewhere that a defective USB keyboard had caused similar issues for one user in another forum. I unplugged the friend’s keyboard from the back and plugged in one of my keyboards in the front (the front has 2 USB 3.0 ports, the back has both USB 2 and USB 3 ports; this becomes relevant in a minute, lol). OMG the CPU came back down to ZERO!! THE PROBLEM WAS SOLVED???
No. It wasn’t fixed at all.
On the next restart, it was back to that one CPU core pegged at 100%. Okay, maybe the front USB ports are bad or have something spilled in them. I disconnected them from the motherboard.
No change.
I looked in dmesg and didn’t see anything useful, except for one line that said: “usb usb1-port14: over-current condition”
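In case it helps someone compare logs between a cold boot and a restart, here is a small sketch that counts over-current messages per port in kernel log text. It assumes the messages look exactly like the line quoted above; the function name `find_overcurrent_ports` is made up for this example, and the log text has to be supplied by the caller (for example, saved `dmesg` output).

```python
import re

# Matches kernel log lines of the form quoted above:
#   usb usb1-port14: over-current condition
OVERCURRENT_RE = re.compile(r"usb (usb\d+-port\d+): over-current condition")


def find_overcurrent_ports(log_text):
    """Return a dict mapping port name -> number of over-current messages."""
    counts = {}
    for line in log_text.splitlines():
        match = OVERCURRENT_RE.search(line)
        if match:
            port = match.group(1)
            counts[port] = counts.get(port, 0) + 1
    return counts
```

One way to use it is to save the log after each kind of boot (`dmesg > cold.txt`, `dmesg > warm.txt`) and run the function on each file's contents to see whether the same port shows up in both.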
Maybe this computer just has a messed up motherboard. I got out my USB tester and checked all of the ports, both with no load and with a load from various USB devices. All voltages were perfect. I turned off the computer and came back later.
I started it up and IT WAS FINE!!! CPU 0% at idle. WTF???
I’m going to end the story here because the problem still exists BUT ONLY at certain times and this is what I hope one of you can help me figure out. I’ve done a BUNCH of tests and here is what I’ve figured out so far:
Lenovo, TEST SET 2
- Manjaro OS, logi keyboard/mouse receiver in left front USB3 (ONLY USB device)
no CPU issues, 0% CPU at idle, 1% with htop running.
I’m gonna do a software RESTART instead of a shutdown and power on.
NOW, it’s showing ~25% CPU, htop shows core 3 pegged at 100%.
- put Samsung thumb drive in right front USB 3, CPU goes to 1% at idle (htop is running).
- removed thumb drive, htop shows core 2 pegged at 100%.
INTERESTING…
putting the thumb drive in the right front USB, immediately brings the CPU down to 1%.
putting the USB tester in WITH the thumb drive attached, DOES NOT bring the CPU down!!
I tried putting the tester in first, waiting a few seconds, then adding the thumb drive, NOPE!
Only plugging in the thumb drive directly, brings the CPU down.
I tested the thumb drive while it was plugged into the tester and I could read and write to it!
- What’s the difference between the thumb drive being plugged in directly and through the tester??
- the voltage and current shown on the tester are always the same: 5.05 V, 0.04 A.
I repeated these experiments a dozen times:
- take the thumb drive out, one of the CPU cores pegs at 100%.
- put the thumb drive back in, CPU settles back to 1%.
Rear USB ports:
- putting the thumb drive into either of the USB 2 ports DOES NOT HELP!
- putting the thumb drive into EITHER of the USB 3 ports brings the CPU down to 1%.
- using the tester with the thumb drive plugged into it, DOES NOT HELP (same as front port)
- using the tester in a DIFFERENT port does not affect the experiment.
- the tester shows no change in voltage for either 100% or 1% events.
fully shut down the computer for the next test set
Lenovo, TEST SET 3
- Manjaro OS, logi keyboard/mouse receiver in left front USB3 (ONLY USB device)
0% CPU at idle, as I predicted.
What’s the difference between software restart and shutdown/manual power on??
I did 10 cycles of restart using whisker menu, or a terminal “shutdown -r now”, mixed with shutting down completely, then turning the machine back on with the power button.
- software restarts always result in 1 CPU core pegged at 100%
- full power cycles always result in normal operation!
When 1 CPU core is pegged at 100%, if I remove the keyboard/mouse receiver, A SECOND core will go to 100%
- as soon as I plug the receiver back in, the second CPU core comes right back down to 0.
- I can plug the receiver back into a DIFFERENT port, as long as it’s a USB 3 port.
When the computer is running “normally” (after a full power cycle) and no cores are pegged at 100%, I can unplug the keyboard/mouse receiver as many times as I want and it causes no issues.
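Since the only variable seems to be restart versus full power cycle, one thing that might be worth comparing between the two states is which interrupt counters are climbing. I have not confirmed that interrupts are involved here; this is only a sketch under that assumption. It parses two copies of the text from `/proc/interrupts` and lists the counters that grew the most. The function names `parse_interrupts` and `fastest_growing` are hypothetical.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description).

    Expects the layout of /proc/interrupts: a header row naming the CPUs,
    then one row per interrupt with one count column per CPU.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header looks like: CPU0 CPU1 CPU2 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR/MIS have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def fastest_growing(before, after, top=5):
    """Return up to `top` (delta, label, description) tuples, largest first."""
    deltas = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, ""))[0]
        delta = count - previous
        if delta > 0:
            deltas.append((delta, label, description))
    deltas.sort(reverse=True)
    return deltas[:top]
```

The idea would be to read `/proc/interrupts` once, wait a few seconds, read it again, and pass both parsed results to `fastest_growing()`, first after a cold boot and then after a restart. If one counter climbs much faster only after a restart, its description column may point at the device or subsystem involved.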
Tell me that’s not weird! Could this be a kernel bug, or something else? I can’t begin to guess what’s causing this behavior.
Thanks