I am having trouble getting my maintenance right for my server’s SSD storage.
Running a server with ordinary spinning disks poses no real issue, but on a server with SSDs I am having frequent issues with the file systems hanging, and I suspect it is due to the rapid change of content on the filesystem.
The server is running primarily as a mirror for Manjaro - secondly as a private Arch mirror.
I have been experimenting with 2x480G Kingston SSD in RAID10,far2 formatted with ext4 - and it works fast as is expected.
I have enabled fstrim with the defaults, which is weekly, and this is not enough - the filesystem still gets clogged, and when it gets clogged the RAID fails one of the disks and everything slows down to a crawl.
I am now trying to run the Manjaro part on a single 500G SSD - the issue still pops up.
I have tried using discard in the mount options - and the issue persists.
I then changed the fstrim.timer to run daily - but still problems.
I know several hosting providers brag about their SSD based hosting, so it should be possible - so I am searching for the tricks to maintain a rapidly changing file storage (using rsync) without having the mirror break down every two weeks.
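For reference, the weekly-to-daily change of fstrim.timer described above can be made with a systemd drop-in rather than by editing the packaged unit. The following is only a sketch: the function name `write_fstrim_override` is made up for illustration, and the default drop-in directory is an assumption that matches where `systemctl edit fstrim.timer` normally places its override.

```shell
# Sketch: write a drop-in that replaces the schedule of fstrim.timer.
# write_fstrim_override is a hypothetical helper name, not an existing tool.
#   $1: OnCalendar value (default: daily)
#   $2: drop-in directory (the default below needs root to write to)
write_fstrim_override() {
    local schedule=${1:-daily}
    local dropin_dir=${2:-/etc/systemd/system/fstrim.timer.d}
    mkdir -p "$dropin_dir" || return 1
    # The empty OnCalendar= clears the packaged schedule before the new one is set;
    # without it the new value would be added next to the weekly one.
    printf '[Timer]\nOnCalendar=\nOnCalendar=%s\n' "$schedule" \
        > "$dropin_dir/override.conf"
}
```

After calling `write_fstrim_override daily` as root, `systemctl daemon-reload` picks up the change and `systemctl list-timers fstrim.timer` shows the next scheduled run.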
That’s a lot of information, without a lot of detail
cat /etc/mdadm.conf
cat /proc/mdstat
inxi --admin --verbosity=7 --filter --no-host --width #drive section only is fine
smartctl --all /dev/XdY #where X and Y denominate your drive letters
The system is - as noted - not using mdraid - but using a single disk approach. I can supply the specs but it would accomplish very little as the issue is - as far as I can deduce - related to the fact that running rsync on SSDs generates a lot - a huge amount - of changes every 5 minutes when resyncing Manjaro’s 4 branches.
These changes appear to generate a lot of discards which - within a short timespan - clog the system down to a crawl - even when I - to troubleshoot this issue - have limited my setup to the bare necessities.
My notes on creating the RAID (created while reading Arch Wiki) can be read following this link to my notepad MD/RAID notes | NIX NOTES - I used the two-disk-setup notes
I don’t use a RAID controller but a PCIe SATA controller with 4 mSATA devices - all identical (only deviation I made from my usual approach is that I bought 6 identical devices at the same time - so probably from the same batch too.)
I have tried the discard option - from what I know you never use the discard and fstrim at the same time.
The discard option does the discard on the fly - at the cost of rapidly aging the memory cells.
From what I read, the fstrim command does the same thing as discard - but only on demand - thus not on a regular basis - supposedly helping to prolong the lifespan of the memory cells.
My goal right now is to find the best maintenance options for the devices - next is to make those options propagate through an mdraid array down to the physical devices - hopefully eliminating the - quite annoying - slowdowns in the mirror service.
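One way to see whether discards can reach the physical devices at all is to compare what the kernel reports for the array device and for its members. The sketch below reads `queue/discard_max_bytes` under `/sys/block`, where a value of 0 means the block device does not accept discard requests. The function name `discard_report` is made up for illustration, and the optional argument only exists so the function can be pointed at another directory.

```shell
# Sketch: report discard support per block device from sysfs.
# discard_report is a hypothetical helper name.
#   $1: sysfs block directory (default: /sys/block)
discard_report() {
    local sys_root=${1:-/sys/block}
    local f dev max
    for f in "$sys_root"/*/queue/discard_max_bytes; do
        [ -r "$f" ] || continue            # skip if the glob matched nothing
        dev=${f%/queue/discard_max_bytes}  # strip the trailing path ...
        dev=${dev##*/}                     # ... and keep only the device name
        max=$(cat "$f")
        if [ "$max" -gt 0 ]; then
            echo "$dev: discard supported (max $max bytes per request)"
        else
            echo "$dev: discard not supported"
        fi
    done
}
```

If the member disks report support but the md device on top of them does not, trim requests issued on the filesystem would not be passed down. `lsblk --discard` shows the same information in table form.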
My mirror syncs directly with the main repo server and does so using a systemd timer and the accompanying service - and the service is launching a script tasked with setting a lock file to provide a clean exit on long sync ops and providing the necessary sync options.
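The actual script is not shown here, so the following is only a sketch of the lock-file idea under stated assumptions: the function name `run_locked`, the exit code 75 and the lock path in the usage note are all made up for illustration, and the sync command itself is left as a placeholder.

```shell
# Sketch: run a command only if no other run holds the lock file.
# run_locked is a hypothetical helper name.
#   $1: lock file path, remaining arguments: the command to run
run_locked() {
    local lock_file=$1
    shift
    # With noclobber set, the redirect fails if the lock file already exists,
    # so checking for the lock and creating it happen in one step.
    if ! ( set -o noclobber; echo "$$" > "$lock_file" ) 2>/dev/null; then
        echo "lock $lock_file exists, skipping this run" >&2
        return 75
    fi
    "$@"
    local rc=$?
    rm -f "$lock_file"   # release the lock whether or not the command succeeded
    return "$rc"
}
```

Called as `run_locked /tmp/mirror-sync.lock your-sync-command`, a timer firing every 5 minutes would skip a run while a long sync is still in progress. One limitation of this sketch: if the process is killed, the stale lock file stays behind and has to be removed by hand.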
As you didn’t DM me: I used to build storage systems in my far away past, but I’m a bit rusty, so bear with me: I like a bottom-up approach. (Statue built on clay feet and all that).
So, I’m really sorry, but I’m still confused. You say that you don’t use mdadm but your notes say you do and your inxi says you don’t.
So let me take 2 steps back and try to understand your HW/SW setup / problem:
Your config:
1 PCIe SATA controller, solely reserved for your RAID
4 (four) dev/sd[h-k] Kingston model: SUV500MS480G
All 4 in a RAID-10,far2
systemd mount EXT4, noatime
systemd trim timed to 1/day
2*(rsync /remote / local)
1* Arch
1* Manjaro
Most of your I/O is taken up by writing, in contrast to most other systems that do mostly reading.
This is a headless server, no heavy front-end applications, no SQL DBs, …
32 GB RAM
Problem:
Slowdown after 24h on RAID
Your guess is that the root cause is the trim.
Your question:
How do I prevent slowdown?
Please correct my (mis)understanding of your system and please tell me:
How did you create the RAID10,far without mdadm and without a HW RAID controller?
Is the rsync running in an infinite loop? (If not: how?)
What are the min/max/avg changes daily?
What is the battery back-up on this thing? (No UPS?)
If none: if the power goes out, does it matter that not everything was written to disk? (Assuming no, but asking anyway)
What have you measured already that is irrelevant to this discussion?
(SATA controller usage, individual disk I/O (i.e. one bad apple rotting the entire RAID), disk bandwidth, PCI/SATA bandwidth, …)
Just trying to avoid you doing something you’ve already done and disregarded.
Did you change any Kernel parameters from the default? (Assuming no because of swappiness parameters, but asking anyway)
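If individual disk I/O has not been measured yet, a rough per-disk view is available from `/sys/block/<dev>/stat` without extra tools. This is a sketch, assuming the kernel's documented field layout, where field 7 is the number of 512-byte sectors written and field 14 the number of sectors discarded (the discard fields are only present on newer kernels). The function name `disk_write_stats` is made up for illustration.

```shell
# Sketch: print sectors written and discarded per block device.
# disk_write_stats is a hypothetical helper name.
#   $1: sysfs block directory (default: /sys/block)
disk_write_stats() {
    local sys_root=${1:-/sys/block}
    local f dev
    for f in "$sys_root"/*/stat; do
        [ -r "$f" ] || continue   # skip if the glob matched nothing
        dev=${f%/stat}
        dev=${dev##*/}
        # Field 7: sectors written. Field 14: sectors discarded, reported as
        # n/a when the kernel does not provide the discard fields.
        awk -v dev="$dev" \
            '{ print dev, "written=" $7, "discarded=" ($14 == "" ? "n/a" : $14) }' "$f"
    done
}
```

Taking two samples a few minutes apart and comparing the numbers per device would show whether one member of the array receives far more writes or discards than the others.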
Thank you for taking the time to use brain energy on my issue.
Yep - I did start with mdadm - also noted in OP - and I reverted to a single disk to narrow down why I have issues while using SSD, because this is the major difference.
I have been running a mirror for a couple of years - but that mirror runs off a server using spinning disks, more precisely 24/7 Seagate disks designed for surveillance equipment.
My question is entirely about how to handle the SSD maintenance - as this seems to me to be the root cause of my issues.