Yeah, really funny… that was 2013, and we are now in 2022, nine years later. Is there a solution for all those workloads? It is still the same. However, over the years it never bothered me, and the default was NEVER a problem, since after a while you get used to this behavior (and it is logical that thumb drives are just slower).
Funny enough… “Dave Chinner”… without knowing what he had written years ago, it seems we are on a similar wavelength.
In more detail, if we simply implement “we have 8 MB of dirty pages
on a single file, write it” we can maximise write throughput by
allocating sequentially on disk for each subsequent write. The
problem with this comes when you are writing multiple files at a
time, and that leads to this pattern on disk:
And the result is a) fragmented files b) a large number of seeks
during sequential read operations and c) filesystems that age and
degrade rapidly under workloads that concurrently write files with
different life times (i.e. due to free space fragmentation).
Personally, this is not the first time I’ve wrestled with this topic, and to this day it still remains a puzzle. Each time, after reading up and arriving at the usual proposed tweaks, I also get to read about the counter-productive side effects, pull out what little hair I’ve got, get confused, and then leave the stock settings and move on…
However, kudos to all of you: this has been somewhat enlightening, although confusing. Thanks for keeping it civil.
I think what confuses us is that most of us have no triage of the different issues at play here. Most of us visit the topic looking for a one-stop write-cache tweak, when there is a lot more at play, and the way I see it there is still no one solution that fits all.
The issues that need tackling are:
data transfer reliability
device longevity (flash/SSD)
accurate file manager stats
I think anyone for whom custom handling of any of these is an absolute must should evaluate and apply a solution on a per-case basis. Everyone else should just leave it as is.
In my case, I’m using an old laptop with rusty USB ports, and leaving flash drives and external HDDs plugged in after bulk file transfers is frequent. What bothers me most is when I accidentally knock one of these while it is plugged in, supposedly in a post-transaction state. Some knocks lead to corrupt data; worse, with flash drives some lead to corrupt devices. So my concern weighs mostly on “reliability”, and I am willing to sacrifice flash drive longevity and HDD file fragmentation. Hence I’m willing to go the “sync” route.
udev has pretty convincing device scanning going on behind the scenes. You can tell your USB HDD apart from your flash drives if need be. Just use udevadm monitor to find out all the key-value pairs;
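For example, something like this shows the properties udev attaches to a device (a sketch; /dev/sdb is just a placeholder for whatever node your drive actually gets):

```
# Watch udev events and their key-value properties live while
# plugging the drive in:
udevadm monitor --udev --property

# Or query an already-attached block device directly:
udevadm info --query=property --name=/dev/sdb
```

Properties such as ID_BUS, ID_MODEL and ID_VENDOR in that output are what let you distinguish an external HDD from a flash drive in a udev rule.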
If I were in your situation, I would create a script and a service and run the sync command periodically, or just run it when you do backups… It can easily be integrated into a desktop file (launcher) on the desktop.
while true; do sync && notify-send "Write cache has been synced" && sleep 300; done
while true; do sync | zenity --progress --pulsate --auto-close --no-cancel --title="Write Cache" --text="Synchronize write cache" && zenity --notification --text="Write cache has been synchronized." && sleep 300; done
So every 300sec → 5min
In a desktop file under Exec=
/usr/bin/bash -c 'while true; do sync | zenity --progress --pulsate --auto-close --no-cancel --title="Write Cache" --text="Synchronize write cache" && zenity --notification --text="Write cache has been synchronized." && sleep 300; done'
Man it is even a GUI…
It doesn’t hurt to run it more frequently. It happens in the background anyway, but this way you get a notification: “Now the write cache is synchronized.”.
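The same periodic sync could also be done with a systemd user timer instead of a shell loop in a desktop file (a sketch; the unit names sync-cache.service/.timer and the 5-minute interval are made up to match the example above):

```
# ~/.config/systemd/user/sync-cache.service (hypothetical name)
[Unit]
Description=Flush the write cache

[Service]
Type=oneshot
ExecStart=/usr/bin/sync

# ~/.config/systemd/user/sync-cache.timer
[Unit]
Description=Flush the write cache every 5 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

Enable it with systemctl --user enable --now sync-cache.timer; the timer survives logout/login, unlike a loop started from a launcher.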
Well, if you start concurrent copy jobs to the same block device, then you will get some fragmentation, depending on the size of the files being copied. However, if you do a bulk transfer of multiple files, they’ll be copied one at a time, and this doesn’t impact fragmentation. Another thing to consider is that machines have a lot more RAM these days, which allows much larger read caches (for example, I currently have a 10 GB read buffer/cache), so fragmentation isn’t as much of a problem as it was in the past (not to mention the existence of SSDs and NVMe drives, which are virtually unaffected by fragmentation).
I think we’ve reached a point where this conversation starts to be unproductive. There’s not much more to debate. Personally, I’d like to see Manjaro change these defaults, but I don’t mind if they don’t because I’m able to do it myself. At least these discussions can be useful to other users. Have fun
I was unable to copy a 65 GB file to a 128 GB USB 3 pendrive with the default dirty_bytes settings… The transfer speed dropped below 2 MB/s after 5 minutes… Then I discovered the maxperfwiz script.
By default, these settings are active:
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 1500
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 1500
vm.dirtytime_expire_seconds = 43200
With 32 GB RAM, maxperfwiz suggested and set these settings:
vm.vfs_cache_pressure = 75
vm.dirty_ratio = 3
vm.dirty_background_ratio = 3
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 1500
vm.min_free_kbytes = 118812
And this solved the problem! Now I was able to write the pendrive with the same speeds as it was written under Windows.
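For anyone wanting to keep values like these across reboots, they can go into a sysctl.d drop-in (a sketch using the values from the post above; the file name 99-dirty-tuning.conf is arbitrary, and the right values depend on your RAM):

```
# /etc/sysctl.d/99-dirty-tuning.conf (arbitrary name)
vm.vfs_cache_pressure = 75
vm.dirty_ratio = 3
vm.dirty_background_ratio = 3
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 1500
vm.min_free_kbytes = 118812
```

It can be applied without rebooting via sudo sysctl --system.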
This tunable is used to define when dirty data is old enough to be eligible for writeout by the kernel flusher threads. It is expressed in 100’ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a flusher thread wakes up.
The kernel flusher threads will periodically wake up and write `old’ data out to disk. This tunable expresses the interval between those wakeups, in 100’ths of a second.
Setting this to zero disables periodic writeback altogether.
This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
The total available memory is not equal to total system memory.
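To see these tunables in action on a running system, the current amount of dirty data and the thresholds described above can be read without root (standard Linux interfaces; no assumptions beyond running on Linux):

```shell
# Dirty and writeback-in-flight data currently in the page cache (in kB);
# watch these climb during a large copy to a slow USB drive:
grep -E '^(Dirty|Writeback):' /proc/meminfo

# The ratio thresholds currently in effect (same values sysctl reports):
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
```

Running the grep repeatedly during a transfer makes the stall visible: the copy “finishes” in the file manager while Dirty is still huge, and only drops as the kernel flushes to the device.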
AFAIK, the default settings date from old systems with little memory. These days, the dirty ratio in particular is problematic with the usual 8/16/32 GB of memory… But this is a complex situation, where you have to deal with varying amounts of system memory…
This is one of those things that should be done by the installer, I believe. Given that these settings are quite complicated for a novice user, and that there are no “one-size-fits-all” values these days due to the wide variety of RAM capacities in the numerous PCs still running, I think installers like Calamares, Anaconda and so on should deal with this while deploying a new OS installation, similar to how Ubiquity sets up a swap file for all recent Ubuntu installations (btw, that could be a good thing for Manjaro too).