CPU cores overload and get stuck at 100% one by one when cloning a big git repo

his_dudeness · 21 June 2023 14:48

Wow man thanks for the commitment to solving the issue!

I have reinstalled manjaro 22.1.3 and before doing anything I cloned the repo. And… it’s working fine. Currently the download has finished and it’s now processing the files. I believe that something I did between first boot and git clone was causing my issues. Here’s a list of what I did:

Enabled AUR at /etc/pamac.conf.
Installed editors:

sudo pamac install vscodium-bin
sudo pamac install vscodium-bin-marketplace
sudo pamac install sublime-text-4

Updated pacman:

sudo pacman -Syyu

Installed pacman package

sudo pacman -S diffuse

Changed manjaro theme to Breeze Dark.
Opened firefox and downloaded all extensions I use.
Got user token from github.
Cloned the repo.

May be a little too thorough but you never know what little things can cause huge issues.

So I’m gonna snapshot the partition and start making some tests. The curiosity is getting the best of me.

If you think you know which step bricks Manjaro, let me know; otherwise I’ll let you know when I find it.

linux-aarhus · 21 June 2023 15:02

My test pulled the above 7 ISOs without any hiccups.

And you managed to clone the repo - fantastic.

I did run into some issues, but those are entirely down to my lack of experience and knowledge of qemu/kvm and virt-manager.

In the end the tests did not reveal any issues with Manjaro as such - really not expected either - but equally good to confirm.

My host also runs Manjaro - not a surprise really - but worth mentioning

his_dudeness · 21 June 2023 15:59

So I just tried again without changing anything, just booted and cloned the repo again, on the off chance that the previous successful clone was a one-off. And… the problems returned. So it must not be any of the actions between first boot and repo cloning.

I have no idea what the problem is and I don’t even know where to start looking. I’m either going to restore the snapshot with the repo intact, use it like that and hope nothing like this happens in the future, or I’m just gonna install Arch and get this over with.

I’m beat.

Thank you very much for the help @linux-aarhus and @Aragorn.

linux-aarhus · 22 June 2023 06:05

Judging anything by the result of cloning a 25GB repo is - in my opinion - edgy.

Whether you use Arch or Manjaro - the same thing can happen - as you have a working clone - you shouldn’t have the need to clone again.

I think what happens is a combination of different factors.

One factor is the size of the repo - it is huge - I have a hard time imagining what could create a repo of that size - perhaps the lack of swap is making your system choke on the size - remember /tmp is allocated from RAM.

Another factor is the hosting of such a repo - a repo of that size is likely self-hosted - and the database behind it may need maintenance. Also - from experience - gitea is great - I have set up an instance on the company server (Win2019) to host the code I work on.

As you have no swap, you need to set up swap. Secondly, I suggest you see whether tweaking zswap (zswap - ArchWiki) brings any change.
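
A minimal sketch of how the swap and zswap state could be checked before changing anything. The helper names `swap_total_kib` and `zswap_state` are made up for illustration, and the 16 GiB size in the commented-out part is only an assumption. The swap file commands need root, so they are left as comments.

```shell
# Print SwapTotal in KiB from a meminfo-style file (defaults to /proc/meminfo).
swap_total_kib() {
    awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Print the zswap "enabled" parameter (Y or N) if the kernel exposes it.
zswap_state() {
    param=/sys/module/zswap/parameters/enabled
    if [ -r "$param" ]; then cat "$param"; else echo unknown; fi
}

echo "SwapTotal: $(swap_total_kib) KiB, zswap enabled: $(zswap_state)"

# Creating a swap file on ext4 (root required, size is an assumption):
#   sudo dd if=/dev/zero of=/swapfile bs=1M count=16384 status=progress
#   sudo chmod 600 /swapfile
#   sudo mkswap /swapfile
#   sudo swapon /swapfile
```

If SwapTotal still reads 0 after `swapon`, the swap file was not activated.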

his_dudeness · 22 June 2023 19:05

New developments!

I’ve tried the same workflow with swap (16GB) and with xfce (separately) and both failed the same way as before: Install, boot, clone, reboot, clone again, error.

What I noticed was that all the times it worked was on the first boot. I tried booting after an install and then instantly rebooting and wasn’t even able to get it to clone once. I installed again and have cloned the repo 3 times in the first boot without any issues. So it seems that something happens when manjaro is shutting off that compromises subsequent boots. May be something in manjaro itself or some VM configuration.

Any guesses?

I’m hopeful again lol
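
One way to narrow down what differs between the working first boot and the failing later boots is to save a few read-only facts on each boot and diff them. This is only a sketch: `snapshot_boot` and the directory names are hypothetical, and it may well miss the actual cause.

```shell
# Save a few read-only facts about the current boot into a directory.
snapshot_boot() {
    out=$1
    mkdir -p "$out"
    uname -r > "$out/kernel"
    # Kernel command line and loaded module names, when procfs provides them.
    [ -r /proc/cmdline ] && cat /proc/cmdline > "$out/cmdline"
    [ -r /proc/modules ] && cut -d' ' -f1 /proc/modules | sort > "$out/modules"
    return 0
}

# Usage (directory names are placeholders):
#   snapshot_boot "$HOME/boot-first"     # on the first boot after install
#   snapshot_boot "$HOME/boot-second"    # after the reboot
#   diff -r "$HOME/boot-first" "$HOME/boot-second"
#   journalctl -b -1 -p warning          # messages from the previous boot, incl. shutdown
```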

linux-aarhus · 23 June 2023 05:33

I have been doing some experimenting also.

My goal is to be able to replicate your issue - not much success - I admit. My vm has been rebooted several times and is also updated to

I have been thinking - what repo could possibly be a challenge - then it struck me that the kernel sources could be a worthy test.

The linux-source repo clone went without any issues.

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Cloning into 'linux-stable'...
remote: Enumerating objects: 11487511, done.
remote: Counting objects: 100% (3953/3953), done.
remote: Compressing objects: 100% (2263/2263), done.
remote: Total 11487511 (delta 2912), reused 2100 (delta 1686), pack-reused 11483558
Receiving objects: 100% (11487511/11487511), 4.47 GiB | 13.47 MiB/s, done.
Resolving deltas: 100% (9170170/9170170), done.
Updating files: 100% (80340/80340), done.

So I considered another option - android sources is also a huge repo - let’s see how that goes.

While going over the steps to possibly test cloning android sources - I saw this note on network - I wonder if that could be part of your issue?

More rarely, Linux clients experience connectivity issues, getting stuck in the middle of downloads (typically during receiving objects). Adjusting the settings of the TCP/IP stack and using non-parallel commands can improve the situation. You must have root access to modify the TCP setting:
sudo sysctl -w net.ipv4.tcp_window_scaling=0
– Downloading the Source | Android Open Source Project
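
Before trying the setting from the note above, it may be worth recording the current value so it can be restored afterwards. A small sketch, where `sysctl_value` is a made-up helper and its second argument (an alternative procfs root) exists only to make it easy to try out:

```shell
# Print the current value of a sysctl key by reading it from procfs.
# $1: key such as net.ipv4.tcp_window_scaling
# $2: optional procfs root (hypothetical hook, defaults to /proc/sys)
sysctl_value() {
    key_path=$(printf '%s' "$1" | tr '.' '/')
    cat "${2:-/proc/sys}/$key_path"
}

# Usage (the sysctl -w lines need root):
#   old=$(sysctl_value net.ipv4.tcp_window_scaling)
#   sudo sysctl -w net.ipv4.tcp_window_scaling=0
#   ... retry the download ...
#   sudo sysctl -w net.ipv4.tcp_window_scaling="$old"
```

A value set with `sysctl -w` does not survive a reboot, so rebooting also undoes it.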

I am in the process of creating a local mirror of android by following the instructions found using the above link. I am creating this inside the virtual machine I created yesterday (I did build and install the custom bochs package from AUR). I have not tweaked my net settings as it is rarely necessary.

inxi -F @ http://ix.io/4ySQ

Test is still running - I think I will run out of disk space before it is done - at the time of writing it has been running for more than 50 minutes and pulled more than 39GiB.

his_dudeness · 23 June 2023 15:58

The problem happens before any network issues. Actually, the processor overload causes the network issues because it causes every application to grind to a halt.

I didn’t mention it before, but xfce’s problem is different. It doesn’t brick the system completely. The cores still get overloaded but I can open other apps. But when I reboot, it undoes the cloning fragments, just like on plasma.
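
If the overload shows up while git is unpacking and resolving deltas, one experiment is to cap git at a single thread for that phase. This is a sketch under that assumption, not a known fix. `limited_clone` is a made-up wrapper; `pack.threads` is a regular git config key which, as far as I know, index-pack also uses as its default thread count.

```shell
# Clone with git limited to one thread for pack and delta work.
# All arguments are passed straight through to "git clone".
limited_clone() {
    git -c pack.threads=1 clone "$@"
}

# Usage (URL and directory are placeholders):
#   limited_clone <repo-url> <target-dir>
```

If the cores still go to 100% one by one with this, git's own threading is probably not the cause.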

linux-aarhus · 23 June 2023 16:11

I did run out of disk space - I had 70G. As I have not been able to reproduce the issue, I have no idea what is causing it - I am leaning towards something local to your system.

I noted that your inxi for your vm was very different from mine.

Because of this major difference it must be local but as to what - I am clueless …

Your inxi -Fazy

his_dudeness:

$ sudo inxi -Fazy
  System:
    Kernel: 6.1.30-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 12.2.1
      parameters: BOOT_IMAGE=/boot/vmlinuz-6.1-x86_64
      root=UUID=74c20d02-d0eb-47b1-bdff-46aeca800c85 rw quiet splash
      udev.log_priority=3
    Desktop: KDE Plasma v: 5.27.4 tk: Qt v: 5.15.9 wm: kwin_x11 dm: SDDM
      Distro: Manjaro Linux base: Arch Linux
  Machine:
    Type: Vm-other System: Dell product: XPS 8500 v: pc-q35-5.2 serial: N/A
      Chassis: QEMU type: 1 v: pc-q35-5.2 serial: N/A
    Mobo: N/A model: N/A serial: N/A UEFI: Dell v: Default System date: N/A
  CPU:
    Info: model: Intel Core i9-10900K bits: 64 type: MT MCP arch: Comet Lake
      gen: core 10 level: v3 note: check built: 2020 process: Intel 14nm family: 6
      model-id: 0xA5 (165) stepping: 5 microcode: 0xF6
    Topology: cpus: 1x cores: 10 tpc: 2 threads: 20 smt: enabled cache:
      L1: 1.2 MiB desc: d-20x32 KiB; i-20x32 KiB L2: 40 MiB desc: 10x4 MiB
      L3: 16 MiB desc: 1x16 MiB
    Speed (MHz): avg: 3696 min/max: N/A base/boost: 2000/2000 cores: 1: 3696
      2: 3696 3: 3696 4: 3696 5: 3696 6: 3696 7: 3696 8: 3696 9: 3696 10: 3696
      11: 3696 12: 3696 13: 3696 14: 3696 15: 3696 16: 3696 17: 3696 18: 3696
      19: 3696 20: 3696 bogomips: 147896
    Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
    Vulnerabilities:
    Type: itlb_multihit status: Not affected
    Type: l1tf status: Not affected
    Type: mds status: Not affected
    Type: meltdown status: Not affected
    Type: mmio_stale_data status: Vulnerable: Clear CPU buffers attempted, no
      microcode; SMT Host state unknown
    Type: retbleed mitigation: Enhanced IBRS
    Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
      prctl
    Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
      sanitization
    Type: spectre_v2 mitigation: Enhanced IBRS, IBPB: conditional, RSB
      filling, PBRSB-eIBRS: SW sequence
    Type: srbds status: Unknown: Dependent on hypervisor status
    Type: tsx_async_abort status: Not affected
  Graphics:
    Device-1: NVIDIA TU104 [GeForce RTX 2080 SUPER] vendor: ASUSTeK
      driver: nvidia v: 530.41.03 alternate: nouveau,nvidia_drm non-free: 530.xx+
      status: current (as of 2023-05) arch: Turing code: TUxxx
      process: TSMC 12nm FF built: 2018-22 pcie: gen: 3 speed: 8 GT/s lanes: 16
      bus-ID: 04:00.0 chip-ID: 10de:1e81 class-ID: 0300
    Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.1
      compositor: kwin_x11 driver: X: loaded: nvidia gpu: nvidia display-ID: :0
      screens: 1
    Screen-1: 0 s-res: 3840x1080 s-dpi: 81 s-size: 1204x343mm (47.40x13.50")
      s-diag: 1252mm (49.29") monitors: <missing: xrandr>
    API: OpenGL v: 4.6.0 NVIDIA 530.41.03 renderer: NVIDIA GeForce RTX 2080
      SUPER/PCIe/SSE2 direct-render: Yes
  Audio:
    Device-1: NVIDIA TU104 HD Audio vendor: ASUSTeK driver: snd_hda_intel
      v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 05:00.0
      chip-ID: 10de:10f8 class-ID: 0403
    API: ALSA v: k6.1.30-1-MANJARO status: kernel-api with: aoss
      type: oss-emulator tools: alsactl,alsamixer,amixer
    Server-1: JACK v: 1.9.22 status: off tools: N/A
    Server-2: PipeWire v: 0.3.70 status: n/a (root, process) with: wireplumber
      status: active tools: pw-cli,wpctl
    Server-3: PulseAudio v: 16.1 status: active (root, process)
      with: pulseaudio-alsa type: plugin tools: pacat,pactl
  Network:
    Device-1: Intel 82574L Gigabit Network driver: e1000e v: kernel pcie: gen: 1
      speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 01:00.0 chip-ID: 8086:10d3
      class-ID: 0200
    IF: enp1s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Drives:
    Local Storage: total: 2.55 TiB used: 37.43 GiB (1.4%)
    ID-1: /dev/sda maj-min: 8:0 vendor: Seagate model: Expansion size: 1.82 TiB
      block-size: physical: 4096 B logical: 512 B type: USB rev: 2.1 spd: 480 Mb/s
      lanes: 1 mode: 2.0 tech: N/A serial: <filter> fw-rev: 0712 scheme: GPT
    SMART Message: A mandatory SMART command failed. Various possible causes.
    ID-2: /dev/vda maj-min: 254:0 model: N/A size: 750 GiB block-size:
      physical: 512 B logical: 512 B tech: N/A serial: N/A scheme: GPT
    SMART Message: Unknown smartctl error. Unable to generate data.
  Partition:
    ID-1: / raw-size: 749.7 GiB size: 736.87 GiB (98.29%) used: 37.43 GiB (5.1%)
      fs: ext4 block-size: 4096 B dev: /dev/vda2 maj-min: 254:2
    ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
      used: 288 KiB (0.1%) fs: vfat block-size: 512 B dev: /dev/vda1 maj-min: 254:1
  Swap:
    Alert: No swap data was found.
  Sensors:
    System Temperatures: cpu: N/A mobo: N/A gpu: nvidia temp: 36 C
    Fan Speeds (RPM): N/A gpu: nvidia fan: 0%
  Info:
    Processes: 328 Uptime: 10m wakeups: 0 Memory: available: 15.25 GiB
    used: 1.54 GiB (10.1%) Init: systemd v: 252 default: graphical
    tool: systemctl Compilers: gcc: 12.2.1 clang: 15.0.7 Packages: pm: pacman
    pkgs: 1126 libs: 323 tools: pamac pm: flatpak pkgs: 0 Shell: Zsh (sudo)
    v: 5.9 default: Bash v: 5.1.16 running-in: konsole inxi: 3.3.27

My inxi -F

System:
  Host: mjro-qemu Kernel: 6.1.35-1-MANJARO arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.27.6 Distro: Manjaro Linux
Machine:
  Type: Kvm System: QEMU product: Standard PC (Q35 + ICH9, 2009) v: pc-q35-8.0
    serial: <superuser required>
  Mobo: N/A model: N/A serial: N/A UEFI: EDK II v: N/A date: 2/2/2022
CPU:
  Info: 8x 1-core model: AMD Ryzen Threadripper PRO 5945WX s bits: 64
    type: SMP cache: L2: 8x 512 KiB (4 MiB)
  Speed (MHz): avg: 4092 min/max: N/A cores: 1: 4092 2: 4092 3: 4092 4: 4092
    5: 4092 6: 4092 7: 4092 8: 4092
Graphics:
  Device-1: Red Hat Virtio 1.0 GPU driver: virtio-pci v: 1
  Display: x11 server: X.org v: 1.21.1.8 driver: X: loaded: modesetting
    dri: virtio_gpu gpu: virtio-pci resolution: 1920x1080~60Hz
  API: OpenGL Message: Unable to show GL data. Required tool glxinfo
    missing.
Audio:
  Device-1: Intel 82801I HD Audio driver: snd_hda_intel
  API: ALSA v: k6.1.35-1-MANJARO status: kernel-api
Network:
  Device-1: Red Hat Virtio 1.0 network driver: virtio-pci
  IF-ID-1: enp1s0 state: up speed: -1 duplex: unknown mac: 52:54:00:d1:56:2f
Drives:
  Local Storage: total: 70 GiB used: 50.63 GiB (72.3%)
  ID-1: /dev/vda model: N/A size: 70 GiB
Partition:
  ID-1: / size: 68.05 GiB used: 50.63 GiB (74.4%) fs: ext4 dev: /dev/vda2
  ID-2: /boot/efi size: 299.4 MiB used: 288 KiB (0.1%) fs: vfat
    dev: /dev/vda1
Swap:
  ID-1: swap-1 type: file size: 512 MiB used: 51.4 MiB (10.0%) file: /swapfile
Sensors:
  Src: lm-sensors+/sys Message: No sensor data found using /sys/class/hwmon
    or lm-sensors.
Info:
  Processes: 213 Uptime: 1h 12m Memory: available: 15.6 GiB
  used: 1.85 GiB (11.8%) Shell: Bash inxi: 3.3.27

his_dudeness · 23 June 2023 16:19

Are you passing through an nvidia card? Could it be nvidia related?

linux-aarhus · 23 June 2023 16:34

I am not passing anything. It is all virtualized.

I do not own nvidia so I cannot say.

Quite possible