System freezes when merging a git branch

Tygalive · 27 March 2023 10:41

Merging a branch using GitHub Desktop (at least 200 commits) makes the system freeze for at least an hour. Even the clock does not update during that time, nor does the system respond to any mouse movements; even trying to switch to a TTY is a fruitless exercise until it’s done writing to disk. The disk indicator flashes throughout this time to show disk usage. I have 32 GB of RAM, a quad-core CPU with 8 threads and an NVMe SSD for storage; imagine how productive it is to wait at least an hour for the system to respond to just a git merge.

inxi -Fx

Host: richard-hp Kernel: 6.1.19-1-MANJARO arch: x86_64 bits: 64
    compiler: gcc v: 12.2.1 Desktop: KDE Plasma v: 5.26.5 Distro: Manjaro Linux
    base: Arch Linux
Machine:
  Type: Laptop System: HP product: HP Laptop 15-dy2xxx v: N/A
    serial: <superuser required>
  Mobo: HP model: 87FE v: 57.20 serial: <superuser required> UEFI: AMI
    v: F.21 date: 03/21/2022
Battery:
  ID-1: BAT0 charge: 41.8 Wh (100.0%) condition: 41.8/41.0 Wh (101.9%)
    volts: 12.9 min: 11.4 model: HP Primary status: full
CPU:
  Info: quad core model: 11th Gen Intel Core i7-1165G7 bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 320 KiB L2: 5 MiB L3: 12 MiB
  Speed (MHz): avg: 970 high: 1085 min/max: 400/4700 cores: 1: 1085 2: 973
    3: 1081 4: 1033 5: 1072 6: 930 7: 949 8: 643 bogomips: 44864
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] vendor: Hewlett-Packard
    driver: i915 v: kernel arch: Gen-12.1 bus-ID: 0000:00:02.0
  Device-2: Chicony HP TrueVision HD Camera type: USB driver: uvcvideo
    bus-ID: 1-3:3
  Display: x11 server: X.Org v: 21.1.7 driver: X: loaded: modesetting
    dri: iris gpu: i915 resolution: 1: 1920x1080~60Hz 2: 1920x1080~60Hz
  API: OpenGL v: 4.6 Mesa 22.3.5 renderer: Mesa Intel Xe Graphics (TGL GT2)
    direct-render: Yes
Audio:
  Device-1: Intel Tiger Lake-LP Smart Sound Audio vendor: Hewlett-Packard
    driver: sof-audio-pci-intel-tgl bus-ID: 0000:00:1f.3
  Sound API: ALSA v: k6.1.19-1-MANJARO running: yes
  Sound Server-1: JACK v: 1.9.22 running: no
  Sound Server-2: PulseAudio v: 16.1 running: yes
  Sound Server-3: PipeWire v: 0.3.65 running: yes
Network:
  Device-1: Realtek RTL8821CE 802.11ac PCIe Wireless Network Adapter
    vendor: Hewlett-Packard driver: rtw_8821ce v: N/A port: 3000
    bus-ID: 0000:01:00.0
  IF: wlo1 state: up mac: 4c:d5:77:64:20:bf
  Device-2: Realtek RTL8152 Fast Ethernet Adapter type: USB driver: r8152
    bus-ID: 1-4.2:6
  IF: enp0s20f0u4u2 state: down mac: 00:e0:4c:36:06:f1
RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
    v: 0.6 bus-ID: 0000:00:0e.0
Drives:
  Local Storage: total: 3.2 TiB used: 421.36 GiB (12.9%)
  ID-1: /dev/nvme0n1 vendor: Micron model: MTFDHBA512QFD-1AX1AABHA
    size: 476.94 GiB temp: 43.9 C
  ID-2: /dev/sda type: USB vendor: Western Digital model: WD20SPZX-22UA7T0
    size: 1.82 TiB
  ID-3: /dev/sdb type: USB vendor: Seagate model: ST1000LM048-2E7172
    size: 931.51 GiB
  ID-4: /dev/sdc type: USB vendor: Generic model: STORAGE size: 7.4 GiB
Partition:
  ID-1: / size: 442.54 GiB used: 421.36 GiB (95.2%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 299.4 MiB used: 608 KiB (0.2%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-3: /home size: 442.54 GiB used: 421.36 GiB (95.2%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-4: /var/log size: 442.54 GiB used: 421.36 GiB (95.2%) fs: btrfs
    dev: /dev/nvme0n1p2
Swap:
  ID-1: swap-1 type: partition size: 34.1 GiB used: 0 KiB (0.0%)
    dev: /dev/nvme0n1p3
Sensors:
  System Temperatures: cpu: 61.0 C mobo: N/A
  Fan Speeds (RPM): cpu: 0 fan-2: 0
Info:
  Processes: 332 Uptime: 17m Memory: 31 GiB used: 4.78 GiB (15.4%)
  Init: systemd Compilers: gcc: 12.2.1 clang: 15.0.7 Packages: 1558 Shell: Zsh
  v: 5.9 inxi: 3.3.25

Note: This is from when the system was still responding

freggel.doe · 27 March 2023 10:55

Sounds like a disk problem to me.
Did you check your journal for any error messages?
Are there any indications of problems in SMART results?

Also, that btrfs filesystem is almost full (95.2% usage)…

Tygalive · 27 March 2023 11:01

Right now I am not able to check the SMART status because the system is currently frozen. The disk is relatively new, less than 5 months old; it came with the laptop. This usually happens when there is heavy disk writing. If I move the mouse, the pointer responds only after a long time, just jumping to a new position, and during this time the fans spin frantically. There is no way to recover the session besides a hard shutdown, which I am afraid will result in a loss of files.

I still have around 20 GB free on the btrfs filesystem; could the disk filling up be the issue?

megavolt · 27 March 2023 11:06

Correct. Under 5% free space is bad for btrfs. It should have at least 10-15% to breathe. Using compression (zstd:9) would also not be a bad idea when you have a lot of small text files, which is common in git repos; it would improve I/O a lot.
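
To put a number on that, here is a minimal sketch that computes the free percentage from a size and a used figure. `free_pct` is a made-up helper name, and the figures are the ones from the inxi output earlier in the thread.

```shell
# free_pct SIZE USED
# Prints the percentage of free space, given total size and used space
# in the same unit (hypothetical helper, not a btrfs tool).
free_pct() {
    awk -v size="$1" -v used="$2" \
        'BEGIN { printf "%.1f\n", (size - used) / size * 100 }'
}

# Figures from the inxi output above: 442.54 GiB total, 421.36 GiB used.
free_pct 442.54 421.36   # prints 4.8
```

That is well below the 10-15% suggested here. On a live system, `sudo btrfs filesystem usage /` should give a more reliable picture than these rounded figures, since btrfs allocates space in chunks.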

To add here: a large merge needs a lot of I/O on small files. When you are going to do a large merge, it would be better to do it in a tmpfs, i.e. in RAM, which speeds it up.
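
One possible way to try the tmpfs idea without moving the repository for good: clone it into RAM, merge there, then fast-forward the on-disk copy. This is only a sketch. It assumes /dev/shm is a tmpfs mount (true on most Linux systems, but worth checking), that the clone fits in RAM, and that the branch to merge exists as a local branch. `merge_in_ram` and `RAM_DIR` are made-up names for illustration.

```shell
# Assumption: /dev/shm is RAM-backed (tmpfs). Override RAM_DIR if not.
RAM_DIR="${RAM_DIR:-/dev/shm}"

# merge_in_ram REPO BRANCH
# (hypothetical helper) merges BRANCH into REPO's checked-out branch,
# doing the merge work in a temporary clone under RAM_DIR.
merge_in_ram() {
    repo=$1
    branch=$2
    work=$(mktemp -d "$RAM_DIR/merge.XXXXXX") || return 1

    # Clone into RAM; local branches of REPO show up as origin/<name>.
    git clone --quiet "$repo" "$work/repo" &&
    # The merge itself, with all its small-file I/O, runs in RAM.
    git -C "$work/repo" merge --quiet --no-edit "origin/$branch" &&
    # Fast-forward the on-disk repository to the finished merge.
    git -C "$repo" pull --quiet --ff-only "$work/repo" HEAD
    status=$?

    rm -rf "$work"
    return "$status"
}
```

Usage would look like `merge_in_ram ~/projects/myrepo feature`, where both the path and the branch name are placeholders. If the merge hits conflicts, the function returns non-zero and the on-disk repository is left untouched. The final fast-forward still writes the changed files to disk once.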

Tygalive · 27 March 2023 11:19

Thank you. Is there a way to keep one app from freezing the whole system? I have also noticed that copying files using Dolphin makes the system somewhat less responsive. How do I go about ensuring this doesn’t happen again?

Had to do a hard shutdown as I realised my day was going to be wasted waiting for the system to respond. Cleared some more files to get about 32 GB of free space. The merge reports having completed an hour ago, even though the disk activity indicator was still flashing. Also, SMART does not report any issues with the drive.

varikonniemi · 27 March 2023 11:21

Make sure you use the BFQ scheduler. This command checks sda; change it to your disk identifier.

cat /sys/class/block/sda/queue/scheduler

If your problem is the scheduler, that would suggest your disk is behaving more like a memory card/USB stick than an SSD/NVMe drive.

Tygalive · 27 March 2023 11:25

Gives the output
[mq-deadline] kyber bfq none
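
In this kind of output, the entry in square brackets is the scheduler currently in use and the rest are the available alternatives. A small sketch for pulling out just the active one (`active_scheduler` is a made-up helper name):

```shell
# active_scheduler
# Reads a scheduler line on stdin and prints the bracketed (active) entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

echo '[mq-deadline] kyber bfq none' | active_scheduler   # prints mq-deadline
```

On a real system it can read the sysfs file directly, e.g. `active_scheduler < /sys/class/block/sda/queue/scheduler`.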

varikonniemi · 27 March 2023 11:25

So try setting it to bfq. And I hope you changed sda to your NVMe drive identifier? Otherwise you are looking at your first SSD/spinning disk.
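
Setting it means writing the scheduler name into the same sysfs file. A minimal sketch, with `set_scheduler` as a made-up helper that first checks the name is actually offered; it has to run as root when pointed at /sys, and the change does not survive a reboot:

```shell
# set_scheduler FILE NAME
# Writes NAME to the scheduler FILE if the file lists it as available.
# FILE would normally be /sys/class/block/<device>/queue/scheduler.
set_scheduler() {
    file=$1
    name=$2
    # Drop the brackets around the active entry, put one name per line,
    # then look for NAME as an exact match.
    if tr -d '[]' < "$file" | tr ' ' '\n' | grep -qx -- "$name"; then
        printf '%s\n' "$name" > "$file"
    else
        echo "scheduler '$name' is not offered by $file" >&2
        return 1
    fi
}
```

As root, that would be `set_scheduler /sys/class/block/sda/queue/scheduler bfq`, with sda swapped for the right device.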

Tygalive · 27 March 2023 11:27

Thank you, turns out the output was for a different drive. The actual output is [none] mq-deadline kyber bfq

cat /sys/class/block/nvme0n1/queue/scheduler

megavolt · 27 March 2023 11:32

Normally that shouldn’t happen, but in your use case it would be better to do intensive I/O on a drive other than the one holding your root directory (nvme0), so another drive or a RAM disk (tmpfs). Alternatively, you can use ionice to lower the I/O priority of specific apps, so that your DE has higher priority. I guess git and GitHub Desktop have the same priority (20) as your desktop and are therefore scheduled equally, which lets it freeze.
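
A minimal sketch of the ionice idea: a wrapper that starts a command in the idle I/O class. `run_low_io` is a made-up name. One caveat, stated as an assumption rather than a guarantee: I/O classes are honoured by schedulers such as BFQ, so with `none` active this may have little or no effect.

```shell
# run_low_io CMD [ARGS...]
# Runs CMD in the "idle" I/O class, so it should only get disk time
# when nothing else is asking for it. Without ionice, runs CMD as-is.
run_low_io() {
    if command -v ionice >/dev/null 2>&1; then
        # -c 3 = idle class; -t = still run CMD if the class cannot be set
        ionice -c 3 -t "$@"
    else
        "$@"
    fi
}
```

For example, `run_low_io git merge feature`, where the branch name is a placeholder. Child processes inherit the I/O class, so starting the GUI client through the wrapper should cover the git processes it spawns as well.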

It is totally normal to have the none scheduler on NVMe drives, since a scheduler would throttle their I/O, while on other drives (MMC/HDD/SSD) a scheduler can be a speed improvement.

varikonniemi · 27 March 2023 11:39

Yeah, I did not look up the specific NVMe at first, but after looking, it should be perfectly modern enough not to need help, unless there is some firmware bug that is being triggered.

Then an additional cause of lock-ups is CPU scheduling, if some program is really thrashing the CPU and stealing all the time. The only fix for this was previously BFS/MuQSS, but both of them are unmaintained, so I have no idea what to do there. Maybe just increase the nice value for the git process?
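
For processes that are already running, the nice value can be raised with renice. A small sketch, with `deprioritize` as a made-up helper name; note that this only addresses CPU contention and does nothing for disk I/O.

```shell
# deprioritize PID...
# Raises the nice value of the given running processes to 19,
# the lowest CPU priority. Stops at the first PID that fails.
deprioritize() {
    for pid in "$@"; do
        renice -n 19 -p "$pid" >/dev/null || return 1
    done
}
```

With procps installed, something like `deprioritize $(pgrep -x git)` would apply it to every running git process.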

Tygalive · 27 March 2023 13:24

Thank you, I will apply these fixes and update.

andreas85 · 27 March 2023 14:10

Do not ever do this with btrfs! 95.2%

You will run out of btrfs space soon! Then btrfs will block.

Deleting files can help, but it may already be too late!

This one time btrfs escaped after an hour of work. Next time it will block!
Then you will need to repair your btrfs filesystem.

Here is the link for repair !

Zesko · 27 March 2023 16:09

And disable btrfs quota

$ sudo btrfs quota disable /

Olli · 27 March 2023 16:17

And buy more disk space

andreas85 · 27 March 2023 16:33

And clean up your snapshots