I don't know if I even need these automated, but I'm really confused trying to use btrfsmaintenance.
Firstly, there are 2 packages called btrfsmaintenance:
1 aur/btrfsmaintenance 0.5-2
2 extra/btrfsmaintenance 1:0.5.1-1
It says I have both installed?
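One way to check which of the two is actually on the system is to ask pacman for the installed version and compare it with the two versions listed above. This is only a minimal sketch, assuming a pacman-based system; installed_version is a made-up helper name, not something pacman ships.

```shell
#!/bin/sh
# installed_version is a hypothetical helper, not part of pacman.
# It prints the version of an installed package, or "not installed".
installed_version() {
    pkg="$1"
    # "pacman -Q <pkg>" prints "<name> <version>" for an installed package.
    if out=$(pacman -Q "$pkg" 2>/dev/null); then
        # Strip everything up to the first space, leaving the version.
        echo "${out#* }"
    else
        echo "not installed"
    fi
}

# Example: installed_version btrfsmaintenance
```

pacman can only have one package of a given name installed, so the search listing most likely marks both entries as installed simply because they share the name. The version printed here should match exactly one of the two entries.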
Secondly, Btrfs Assistant only shows 3 of the 4 tasks (scrub, trim, balance, defrag); it does not show trim.
So I just try to use the shell. I edit /etc/default/btrfsmaintenance, change the setting to BTRFS_TRIM_PERIOD="weekly" and run sudo /usr/share/btrfsmaintenance/btrfsmaintenance-refresh-cron.sh, but sudo systemctl status btrfs-trim.timer still shows a monthly timer.
I think it was supposed to create something in /etc/systemd/system, but there are only btrfs-scrub.timer.d and btrfs-balance.timer.d.
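To compare what the config file asks for with what systemd has actually loaded, something like the following might help. It is a sketch only: show_periods and show_timer are hypothetical helper names, and the default path is the one from the post.

```shell
#!/bin/sh
# show_periods is a hypothetical helper: it lists the *_PERIOD settings
# found in a btrfsmaintenance config file.
show_periods() {
    conf="${1:-/etc/default/btrfsmaintenance}"
    grep -E '^[[:space:]]*BTRFS_[A-Z]+_PERIOD=' "$conf"
}

# show_timer is a hypothetical helper: it prints the unit file of a timer
# together with any drop-in overrides systemd has loaded for it, so the
# effective OnCalendar= value is visible.
show_timer() {
    systemctl cat "$1"
}

# Example:
#   show_periods
#   show_timer btrfs-trim.timer
```

If the refresh script had applied the change, I would expect a drop-in for btrfs-trim.timer to appear in that output, similar to the ones that already exist for scrub and balance.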
Then I notice there are errors in journalctl whenever it tries to execute any of the commands: /usr/share/btrfsmaintenance/btrfs-balance.sh: line 47: run_task: command not found
So after loads of digging I find 2 errors in the script /usr/share/btrfsmaintenance/btrfsmaintenance-functions
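A quick way to narrow down a "run_task: command not found" error is to check whether the functions file defines run_task at all. The helper below is a hypothetical sketch (defines_function is my own name) and only does a rough text match, so it can be fooled by unusual formatting.

```shell
#!/bin/sh
# defines_function is a hypothetical helper: it succeeds if the given file
# appears to define a shell function with the given name, either as
# "name() ..." or as "function name ...". A plain call does not match.
defines_function() {
    file="$1"
    name="$2"
    grep -Eq "^[[:space:]]*(function[[:space:]]+${name}([[:space:]]|\(|\$)|${name}[[:space:]]*\(\))" "$file"
}

# Example:
#   defines_function /usr/share/btrfsmaintenance/btrfsmaintenance-functions run_task \
#       && echo "run_task is defined" || echo "run_task is missing"
# A syntax check of the same file can be done with: bash -n <file>
```

If the function is there but the file has a syntax error earlier on, sourcing it can stop before run_task is ever defined, which would produce the same message.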
I'm not sure what I am doing here, and I'm losing the will. Do I need this, or am I doing it all wrong?
Normally, trim will be taken care of by the systemd unit fstrim.timer, which trims all read/write-mounted filesystems once a week (the weekly schedule fires at midnight between Sunday and Monday).
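Whether that timer is actually active is easy to check. A small sketch, with fstrim_timer_state being a hypothetical helper name:

```shell
#!/bin/sh
# fstrim_timer_state is a hypothetical helper: it reports whether
# fstrim.timer is enabled, i.e. whether a weekly trim is already scheduled.
fstrim_timer_state() {
    # is-enabled exits non-zero for "disabled", so ignore the exit status.
    state=$(systemctl is-enabled fstrim.timer 2>/dev/null) || true
    echo "fstrim.timer: ${state:-unknown}"
}

# To see the next scheduled run:   systemctl list-timers fstrim.timer
# To enable it if it is disabled:  sudo systemctl enable --now fstrim.timer
```

If this reports enabled, a separate btrfs-trim.timer would presumably just duplicate the work.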
Regular scrubs don't hurt. They have been known to catch faulty hardware. Even my Synology box does a btrfs scrub every 6 months by default. I increased the frequency.
A scrub is really useful if you are using RAID1. Then the scrub not only detects the error, but can also fix it. The damaged copy is deleted and the undamaged copy is used to rewrite the area elsewhere.
Attention:
A scrub does not test all sectors of a drive, but only reads (and tests) the sectors that btrfs is currently using.
The following applies to all RAID levels:
Hardware that reports errors during a scrub should definitely be replaced promptly.
It was my understanding that you need that for correcting errors. But the btrfs metadata itself contains checksums of chunks of data, so you can still detect errors.
Correcting them needs a redundant multi-device setup. Snapshots on the same device can also be of use (for something such as a bad CPU cache causing corrupt data). I know @Zesko has the Snapper commands to repair this type of situation, even from a previous snapshot. (I can't seem to find the post by searching.)
But even on my single device btrfs filesystem:
[tdell@mbox pacman.d]$ sudo btrfs fi show /
Label: none uuid: 310e0472-f96e-485c-8fa1-585bcccfb624
Total devices 1 FS bytes used 544.54GiB
devid 1 size 896.78GiB used 737.09GiB path /dev/nvme0n1p2
[tdell@mbox pacman.d]$ sudo btrfs inspect-internal dump-super /dev/nvme0n1p2 | grep csum
csum_type 0 (crc32c)
csum_size 4
csum 0xffc3fac5 [match]
A btrfs snapshot points to the same data. So if the original data is corrupted and scrub discovers this, the data in the snapshot is also lost because the pointers point to the same data.
Only with the metadata does btrfs protect itself by default by keeping 2 copies. This can prevent damage to the file system.
Only a RAID1 setup with different devices can benefit from a scrub. But even then, the following applies: promptly replace devices that produce errors! Otherwise, it is like playing with fire.
The only reason for a scrub that I can think of is if the computer was turned off by a power failure or something similar while data was being written.
You're right of course about the snapshot part; I was thinking of another situation.
But I was still emphasising the scrub for checking. Dmesg will output these as warnings on the fly, but I thought scrubbing does a checksum of the b-tree data. (Which is not as thorough as a file system check.)
From the man page:
Scrub is a pass over all filesystem data and metadata and verifying the checksums. If a valid copy is available (replicated block group profiles) then the damaged one is repaired. All copies of the replicated profiles are validated.
So this does do a checksum validity check on a single-device scrub, right? (Which is not the same thing as an fsck.) And the repair is done with redundant data.
But you said scrubbing does nothing here. I thought one of the main features of btrfs was checksum validation, which trades some performance for data integrity.
I said two things. I tried to redact the repair and everything about RAID multiple times.
A single device does this (even though this happens with RAID as well. The checksum would determine which of the data is the non-corrupt data, you would think).
It's been said twice that scrubbing without RAID does nothing. That's what I'm trying to say isn't the case. If I am wrong, I'd like to know, as I'm always learning. Surely the man page can't be wrong?
I'm trying to say that any scrub does a crc32c check (by default) on the data that's been written. Even on a single device.
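For anyone who wants to see this on their own single-device system, here is a rough sketch of running a scrub and reading the results. scrub_and_report is a hypothetical helper name, and it assumes sudo and btrfs-progs are installed.

```shell
#!/bin/sh
# scrub_and_report is a hypothetical helper: it runs a scrub in the
# foreground on one mount point, then prints the scrub summary and the
# per-device error counters.
scrub_and_report() {
    mnt="${1:-/}"
    # -B keeps the scrub in the foreground until it finishes.
    sudo btrfs scrub start -B "$mnt" || echo "scrub reported problems on $mnt" >&2
    sudo btrfs scrub status "$mnt"
    # corruption_errs in this output counts checksum mismatches per device.
    sudo btrfs device stats "$mnt"
}

# Example: scrub_and_report /
```

On a single device with no second copy of the data, I would expect checksum mismatches to be reported as uncorrectable rather than repaired, which is the distinction between detecting and correcting that this thread is about.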