Maintaining server SSD disk infrastructure

I am having trouble getting my maintenance right for my server’s SSD storage.

Running a server with ordinary spinning disks poses no real issue, but on a server with SSDs I am having frequent issues with the file system hanging, and I suspect it is due to the rapid change of content on the filesystem.

The server runs primarily as a mirror for Manjaro - and secondly as a private Arch mirror.

I have been experimenting with 2×480 GB Kingston SSDs in RAID10,far2 formatted with ext4 - and it is fast, as expected.

I have enabled fstrim with the default weekly schedule, and this is not enough - the filesystem still gets clogged, and when it does, the RAID fails one of the disks and everything slows down to a crawl.

I am now running the Manjaro part on a single 480 G SSD - the issue still pops up.

I have tried using discard in the mount options - and the issue persists.

I then changed the fstrim.timer to run daily - but the problems remain.

I know several hosting providers brag about their SSD-based hosting, so it should be possible - so I am searching for the tricks to maintain rapidly changing file storage (synced with rsync) without having the mirror break down every two weeks.
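One sanity check worth adding here (my assumption, not something established in the thread): verify that discards actually reach the devices. A SATA controller or enclosure that does not pass TRIM through makes both fstrim and the discard mount option silent no-ops.

```shell
# Show the discard capabilities the kernel sees for each block device.
# Non-zero DISC-GRAN and DISC-MAX mean discards can reach that device;
# all-zero values mean TRIM is not being passed through (e.g. by the
# SATA controller), so neither fstrim nor the discard mount option helps.
lsblk --discard
```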


That’s a lot of information, without a lot of detail :grin:

cat /etc/mdadm.conf
cat /proc/mdstat
inxi --admin --verbosity=7 --filter --no-host --width #drive section only is fine
smartctl --all /dev/XdY #where X and Y denominate your drive letters

And the exact fstrim configuration would be helpful too…

:stuck_out_tongue_winking_eye: :crazy_face:

The system is - as noted - not using mdraid but a single-disk approach. I can supply the specs, but it would accomplish very little, as the issue is - as far as I can deduce - related to the fact that running rsync on SSDs generates a lot - a huge amount - of changes every 5 minutes when resyncing Manjaro’s 4 branches.

These changes appear to generate a lot of discards which - within a short timespan - clog the system down to a crawl - even though, to troubleshoot this issue, I have limited my setup to the bare necessities.

mount unit for Manjaro mirror
~ >>> cat /etc/systemd/system/data-repos-manjaro.mount                                                                   
[Unit]
Description=Mount Manjaro Repo

[Mount]
What=/dev/disk/by-uuid/f4eadaab-80bf-4d4b-8084-90c6c6d04a0f
Where=/data/repos/manjaro
Type=ext4
Options=rw,noatime

[Install]
WantedBy=multi-user.target

For more details see fstrim(8).
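For reference - the stock fstrim.timer shipped by util-linux on Arch-based systems looks roughly like this (exact content may differ by version), which is where the weekly default comes from:

```ini
[Unit]
Description=Discard unused blocks once a week
Documentation=man:fstrim

[Timer]
OnCalendar=weekly
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target
```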
~ >>> cat /etc/fstab                                                           
UUID=C139-819A                            /boot/efi      vfat    umask=0077 0 2
UUID=41e1e1ca-85c2-4461-bf6c-b31ab86350cf /              ext4    defaults,noatime 0 1
UUID=4eab5278-4056-424c-bbd6-1cb9a1abb1c1 swap           swap    defaults,noatime 0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0

#### service bind mounts
/data/repos/manjaro             /srv/http/uex/www/repos/manjaro       none	bind 0 0
inxi sysinfo
~ >>> inxi --admin --verbosity=7 --filter --no-host --width                    
System:
  Kernel: 5.4.99-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.1 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.4-x86_64 
  root=UUID=41e1e1ca-85c2-4461-bf6c-b31ab86350cf rw zswap.enabled=1 
  Console: tty 0 Distro: Manjaro Linux 
Machine:
  Type: Desktop System: LENOVO product: 30A3001FGE v: ThinkStation E32 
  serial: <filter> 
  Mobo: LENOVO model: SHARKBAY v: 0B98401 PRO serial: <filter> UEFI: LENOVO 
  v: FBKTDBAUS date: 12/24/2019 
Memory:
  RAM: total: 31.04 GiB used: 689.4 MiB (2.2%) 
  RAM Report: missing: Required program dmidecode not available 
CPU:
  Info: Quad Core model: Intel Core i5-4570 bits: 64 type: MCP arch: Haswell 
  family: 6 model-id: 3C (60) stepping: 3 microcode: 28 L2 cache: 6 MiB 
  bogomips: 25551 
  Speed: 799 MHz min/max: 800/3600 MHz Core speeds (MHz): 1: 799 2: 799 3: 798 
  4: 798 
  Flags: abm acpi aes aperfmperf apic arat arch_perfmon avx avx2 bmi1 bmi2 bts 
  clflush cmov constant_tsc cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm 
  dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase 
  fxsr ht ibpb ibrs ida invpcid invpcid_single lahf_lm lm mca mce md_clear mmx 
  monitor movbe msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm 
  pdpe1gb pebs pge pln pni popcnt pse pse36 pti pts rdrand rdtscp rep_good 
  sdbg sep smep smx ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 
  tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xsave 
  xsaveopt xtopology xtpr 
  Vulnerabilities: Type: itlb_multihit status: KVM: Split huge pages 
  Type: l1tf 
  mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled 
  Type: mds mitigation: Clear CPU buffers; SMT disabled 
  Type: meltdown mitigation: PTI 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, 
  IBRS_FW, STIBP: disabled, RSB filling 
  Type: srbds mitigation: Microcode 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics 
  vendor: Lenovo driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:0412 
  class ID: 0300 
  Display: server: N/A driver: loaded: intel unloaded: fbdev,modesetting,vesa 
  tty: 80x24 
  Message: Unable to show advanced data. Required tool glxinfo missing. 
Audio:
  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor HD Audio 
  vendor: Lenovo driver: snd_hda_intel v: kernel bus ID: 00:03.0 
  chip ID: 8086:0c0c class ID: 0403 
  Device-2: Intel 8 Series/C220 Series High Definition Audio vendor: Lenovo 
  driver: snd_hda_intel v: kernel bus ID: 00:1b.0 chip ID: 8086:8c20 
  class ID: 0403 
  Sound Server: ALSA v: k5.4.99-1-MANJARO 
Network:
  Device-1: Intel Ethernet I217-LM vendor: Lenovo driver: e1000e v: 3.2.6-k 
  port: f080 bus ID: 00:19.0 chip ID: 8086:153a class ID: 0200 
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter> 
  IP v4: <filter> scope: global broadcast: <filter> 
  IP v6: <filter> scope: link 
  WAN IP: <filter> 
Bluetooth:
  Message: No Bluetooth data was found. 
RAID:
  Message: No RAID data was found. 
Drives:
  Local Storage: total: 2.18 TiB used: 190.54 GiB (8.5%) 
  SMART Message: Required tool smartctl not installed. Check --recommends 
  ID-1: /dev/sda maj-min: 8:0 vendor: Kingston model: SV300S37A240G 
  size: 223.57 GiB block size: physical: 512 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: UA scheme: GPT 
  ID-2: /dev/sdb maj-min: 8:16 vendor: OCZ model: AGILITY3 size: 223.57 GiB 
  block size: physical: 512 B logical: 512 B speed: 6.0 Gb/s rotation: SSD 
  serial: <filter> rev: 2.25 scheme: GPT 
  ID-3: /dev/sdh maj-min: 8:112 vendor: Kingston model: SUV500MS480G 
  size: 447.13 GiB block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: 56RR scheme: GPT 
  ID-4: /dev/sdi maj-min: 8:128 vendor: Kingston model: SUV500MS480G 
  size: 447.13 GiB block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: 56RR scheme: GPT 
  ID-5: /dev/sdj maj-min: 8:144 vendor: Kingston model: SUV500MS480G 
  size: 447.13 GiB block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: 56RR scheme: GPT 
  ID-6: /dev/sdk maj-min: 8:160 vendor: Kingston model: SUV500MS480G 
  size: 447.13 GiB block size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  rotation: SSD serial: <filter> rev: 56RR scheme: GPT 
  Message: No Optical or Floppy data was found. 
Partition:
  ID-1: / raw size: 214.47 GiB size: 210.11 GiB (97.96%) 
  used: 87.94 GiB (41.9%) fs: ext4 block size: 4096 B dev: /dev/sda2 
  maj-min: 8:2 label: N/A uuid: 41e1e1ca-85c2-4461-bf6c-b31ab86350cf 
  ID-2: /boot/efi raw size: 300 MiB size: 299.4 MiB (99.80%) 
  used: 312 KiB (0.1%) fs: vfat block size: 512 B dev: /dev/sda1 maj-min: 8:1 
  label: N/A uuid: C139-819A 
  ID-3: /data/repos/manjaro raw size: 400 GiB size: 392.72 GiB (98.18%) 
  used: 100.76 GiB (25.7%) fs: ext4 block size: 4096 B dev: /dev/sdh1 
  maj-min: 8:113 label: manjaro_repo 
  uuid: f4eadaab-80bf-4d4b-8084-90c6c6d04a0f 
  ID-4: /srv/http raw size: 200 GiB size: 195.86 GiB (97.93%) 
  used: 1.84 GiB (0.9%) fs: ext4 block size: 4096 B dev: /dev/sdb1 
  maj-min: 8:17 label: N/A uuid: 4308413d-6f11-42f0-88f7-ea2a4fa69149 
  ID-5: /srv/http/uex/www/repos/manjaro raw size: 400 GiB 
  size: <superuser required> used: <superuser required> fs: ext4 
  dev: /dev/sdh1 maj-min: 8:113 label: manjaro_repo 
  uuid: f4eadaab-80bf-4d4b-8084-90c6c6d04a0f 
Swap:
  Kernel: swappiness: 60 (default) cache pressure: 100 (default) 
  ID-1: swap-1 type: partition size: 8.8 GiB used: 0 KiB (0.0%) priority: -2 
  dev: /dev/sda3 maj-min: 8:3 label: N/A 
  uuid: 4eab5278-4056-424c-bbd6-1cb9a1abb1c1 
Unmounted:
  ID-1: /dev/sdi1 maj-min: 8:129 size: 400 GiB fs: ext4 label: archlinux_repo 
  uuid: 0d30ff3d-2e01-438c-b4cf-d1958bfd7503 
  ID-2: /dev/sdj1 maj-min: 8:145 size: 400 GiB fs: ext4 label: N/A 
  uuid: b4459a12-1fa6-42bf-ae11-e6414e66bc6c 
  ID-3: /dev/sdk1 maj-min: 8:161 size: 400 GiB fs: ext4 label: N/A 
  uuid: fd3e5579-068a-4c41-9b4e-5942de45e349 
USB:
  Hub-1: 1-0:1 info: Full speed (or root) Hub ports: 3 rev: 2.0 
  speed: 480 Mb/s chip ID: 1d6b:0002 class ID: 0900 
  Hub-2: 1-1:2 info: Intel Integrated Rate Matching Hub ports: 6 rev: 2.0 
  speed: 480 Mb/s chip ID: 8087:8008 class ID: 0900 
  Hub-3: 2-0:1 info: Full speed (or root) Hub ports: 15 rev: 2.0 
  speed: 480 Mb/s chip ID: 1d6b:0002 class ID: 0900 
  Device-1: 2-1:2 info: Logitech Unifying Receiver type: Keyboard,Mouse,HID 
  driver: logitech-djreceiver,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s 
  chip ID: 046d:c52b class ID: 0300 
  Device-2: 2-5:3 info: Realtek RTS5182 Card Reader type: Mass Storage 
  driver: ums-realtek interfaces: 1 rev: 2.0 speed: 480 Mb/s 
  chip ID: 0bda:0184 class ID: 0806 serial: <filter> 
  Hub-4: 3-0:1 info: Full speed (or root) Hub ports: 3 rev: 2.0 
  speed: 480 Mb/s chip ID: 1d6b:0002 class ID: 0900 
  Hub-5: 3-1:2 info: Intel Integrated Rate Matching Hub ports: 8 rev: 2.0 
  speed: 480 Mb/s chip ID: 8087:8000 class ID: 0900 
  Hub-6: 4-0:1 info: Full speed (or root) Hub ports: 6 rev: 3.0 speed: 5 Gb/s 
  chip ID: 1d6b:0003 class ID: 0900 
Sensors:
  System Temperatures: cpu: 29.8 C mobo: 27.8 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 142 Uptime: 5m wakeups: 0 Init: systemd v: 247 Compilers: 
  gcc: 10.2.0 Packages: pacman: 562 lib: 106 Shell: Zsh v: 5.8 
  running in: tty 0 (SSH) inxi: 3.3.01 
  • How did you set up the RAID10,far2 then? In the HW RAID Controller??? :thinking:
  • I still don’t understand from where to where you’re syncing and how frequent those syncs are.

But with the information I do have:

Options=rw,noatime

I would:

  • change that to include discard
  • Change the timer to:
OnCalendar=hourly
AccuracySec=1m

to start off with.
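As a sketch of how that timer change could be applied as a drop-in override (the file path follows the usual systemd convention - treat it as an illustration, not the exact unit on this system). Note the empty `OnCalendar=` line: OnCalendar is a list setting, so the weekly default must be cleared first or both triggers fire:

```ini
# /etc/systemd/system/fstrim.timer.d/override.conf
# (created with: sudo systemctl edit fstrim.timer)
[Timer]
OnCalendar=
OnCalendar=hourly
AccuracySec=1m
```

`systemctl edit` reloads the daemon automatically; `systemctl list-timers fstrim.timer` then confirms the next trigger.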

P.S. DM me as there is some info I don’t want to post publicly.

My notes on creating the RAID (written while reading the Arch Wiki) can be read by following this link to my notepad: MD/RAID notes | NIX NOTES - I used the two-disk-setup notes.

I don’t use a RAID controller but a PCIe SATA controller with 4 mSATA devices - all identical (the only deviation I made from my usual approach is that I bought 6 identical devices at the same time - so probably from the same batch too).

I have tried the discard option - but from what I know you never use discard and fstrim at the same time.

The discard option does the discard on the fly - at the cost of rapidly aging the memory cells.

From what I read, the fstrim command does the same thing as discard - but only on demand, thus not continuously - supposedly helping to prolong the lifespan of the memory cells.

My goal right now is to find the best maintenance options for the devices - next is to make those options propagate through a mdraid array down to the physical devices - hopefully eliminating the - quite annoying - slowdowns in the mirror service.

My mirror syncs directly with the main repo server and does so using a systemd timer and the accompanying service - and the service is launching a script tasked with setting a lock file to provide a clean exit on long sync ops and providing the necessary sync options.
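The lock-file pattern described above can be sketched roughly like this (the lock path and the rsync target are placeholders of my own, not the actual script):

```shell
#!/bin/sh
# Rough sketch of a timer-driven sync wrapper using a lock file, so a
# long-running sync makes the next timer invocation exit cleanly instead
# of stacking rsync jobs. All paths and the rsync URL are hypothetical.
LOCKFILE="${LOCKFILE:-/tmp/manjaro-sync.lock}"

sync_mirror() {
    # flock -n fails immediately if a previous sync still holds the lock
    if ! flock -n 9; then
        echo "sync already running - exiting cleanly"
        return 0
    fi
    echo "sync started"
    # rsync -rtlvH --delete-after --safe-links \
    #     rsync://mirror.example.org/manjaro/ /data/repos/manjaro/
    echo "sync finished"
} 9>"$LOCKFILE"

sync_mirror
```

The lock is held on file descriptor 9 only while the function runs, so it is released automatically even if the sync aborts.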

As you didn’t DM me: I used to build storage systems in my far away past, but I’m a bit rusty, so bear with me: I like a bottom-up approach. (Statue built on clay feet and all that).

So, I’m really sorry, but I’m still confused. You say that you don’t use mdadm, but your notes say you do and your inxi says you don’t. :man_shrugging:

So let me take 2 steps back and try to understand your HW/SW setup / problem:

Your config:

  • 1 PCIe SATA controller, solely reserved for your RAID
  • 4 (four) dev/sd[h-k] Kingston model: SUV500MS480G
  • All 4 in a RAID-10,far2
  • systemd mount EXT4, noatime
  • systemd trim timed to 1/day
  • 2 × (rsync remote → local)
    • 1* Arch
    • 1* Manjaro
  • Most of your I/O is taken up by writing, in contrast to most other systems that do mostly reading.
  • This is a headless server, no heavy front-end applications, no SQL DBs, …
  • 32 GB RAM

Problem:

  • Slowdown after 24h on RAID
  • Your guess is that the root cause is the trim.

Your question:

How do I prevent slowdown?

Please correct my (mis)understanding of your system and please tell me:

  • How did you create the RAID10,far without mdadm and without a HW RAID controller?
  • Is the rsync running in an infinite loop? (If not: how is it scheduled?)
  • What are the min/max/avg changes daily?
  • What is the battery back-up on this thing? (No UPS?)
    • If none: if the power goes out, does it matter that not everything was written to disk? (Assuming no, but asking anyway)
  • What have you already measured and ruled out as irrelevant to this discussion?
    (SATA CTRLR usage, Individual Disk I/O (i.e. One bad apple rotting the entire RAID ), Disk bandwidth, PCI/SATA bandwidth, …)
    • Just trying to avoid you doing something you’ve already done and disregarded.
  • Did you change any Kernel parameters from the default? (Assuming no because of swappiness parameters, but asking anyway)
  • What’s the output to:
    free --human
    sudo hdparm -W /dev/sd[h-k]
    cat /sys/block/sd[h-k]/queue/scheduler
    
    before the issue is happening and sudo sysctl --all | grep dirty both before and after

Thank you for taking the time to use brain energy on my issue.

Yep - I did start with mdadm - as noted in the OP - and I reverted to a single disk to narrow down why I have issues while using SSDs, because this is the major difference.

I have been running a mirror for a couple of years - but that mirror runs off a server using spinning disks, more precisely 24/7 Seagate disks designed for surveillance equipment.

My question is entirely on topic - how to handle SSD maintenance - as this seems to me to be the root cause of my issues.


Aha! Got it now! :man_facepalming:

Looking forward to some tangible data whenever it happens again.