I hope this message finds you well. I have a backup routine in place, but I’d like to hear your thoughts and suggestions.
My current backup process involves external hard drives stored in different locations. These drives are encrypted using LUKS and formatted with ext4. Occasionally, I perform squashfs backups, but these days I mostly use restic/borg. While this approach seems effective, I have a few concerns:
Backup Verification: I’ve realized that I lack a systematic way to regularly check whether my backups are recoverable and intact. How can I verify this? Is there a tool that can help me ensure the backup files stay reliable over time? Somehow, I also wasn’t able to find tools for monitoring data health on the to-be-backed-up machine itself, say, if there was bitrot on the machine’s drive. There must be tools for this purpose - or is everyone just hoping the files sitting on their machines are always fine? If I were able to monitor and detect file corruption, I could recover the affected file from the backup…
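If I understand the options correctly, the built-in checks go in this direction, e.g. something like the following (repository paths are placeholders), but that only verifies the repository itself, not that the data that went into it was still good:

```
# restic: check the repository structure and read back a sample of the stored data
restic -r /mnt/backup/restic-repo check --read-data-subset=10%

# borg: check the repository and verify the stored data
borg check --verify-data /mnt/backup/borg-repo
```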
External Drive Integrity: How can I verify the integrity of backups stored on external drives? Ideally, I’d like to compare a portion of the backup data against the current files on my machine to detect any discrepancies. Should I be running fsck and/or smartctl checks (I’m not sure exactly what they achieve and how they differ)? restic check would help detect bitrot/corruption of the backup on the external drive, but then what about bitrot/corruption of files on the machine itself (which, if unnoticed, would just propagate to all backup media, too)?
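For concreteness, this is what I’m considering running on the backup drive, if that’s the right approach at all (device and mapper names are placeholders):

```
# smartctl: ask the drive itself (SMART) about its health and run a self-test
sudo smartctl -H /dev/sdX
sudo smartctl -t long /dev/sdX

# fsck: check the ext4 filesystem inside the unlocked LUKS container
# (run with the filesystem unmounted; -n keeps it read-only)
sudo fsck.ext4 -fn /dev/mapper/backupdrive
```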
Choosing the Right File System: I’m unsure whether I should use btrfs or zfs for the external backup drives. While ext4 is battle-tested and reliable, I wonder if there are better options.
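As far as I understand it, the main draw of those filesystems is that they checksum the data and can verify the entire drive on demand, e.g. (mount point and pool name are placeholders):

```
# btrfs: read everything back and verify checksums
sudo btrfs scrub start /mnt/backup
sudo btrfs scrub status /mnt/backup

# zfs: the equivalent for a pool
sudo zpool scrub backuppool
sudo zpool status backuppool
```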
User-Friendly Backup Software: I’d like to recommend user-friendly backup software to family members who aren’t comfortable with the command line. Any suggestions that achieve something somewhat comparable?
It’s somewhat easy for me to spot and select the user data folders; I am less sure about the config files. I included some .dotdirs, but some of them change a lot due to often-changing cache/db files that are not actually worth backing up. I feel like I am manually iterating over the backed-up files, but if I am too rough I may miss files I would have eventually liked to back up, and if I don’t exclude anything, I am just storing lots of nonsense cache files. Unfortunately, how /home/ is used for cache-like files by different software is not super standardised, so any recommendations on how to strike a good balance between “just back up the entire home” vs “just back up non-dotdirs” would be appreciated. One recent idea is to back up all data dirs separately from all dotdirs, and then for the dotdirs I could have a tighter pruning schedule, as it won’t make sense to keep 3-year-old config/cache files.
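For illustration, my exclude list currently looks roughly like this (the patterns are just examples of the kind of thing I filter out):

```
# ~/backup-excludes.txt - passed to restic via --exclude-file
# (borg has similar --exclude / --patterns-from options)
/home/*/.cache
/home/*/.local/share/Trash
/home/*/.thumbnails
**/node_modules
**/.venv
```

used with something like restic -r /mnt/backup/restic-repo backup --exclude-file ~/backup-excludes.txt --exclude-caches /home/myuser; restic’s --exclude-caches additionally skips directories tagged with a CACHEDIR.TAG file.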
Improvements: If you have any other ideas or best practices related to backup procedures, I’d love to hear them!
Thanks a lot for the pointers! I’ve looked at some, but there is some more material for me to read.
Bitrot - is there no easy software one can run on one’s machine that scans files and stores their md5sums and flags up if the hash changes without the ctime changing? restic/borg can detect bitrot on the backup repository, but if rotten files from the machine slowly percolate through the backup archives over time, that is still a problem.
I only found ambv/bitrot on GitHub (“Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay”), which looks somewhat maintained, but not like very established, battle-tested software - am I overlooking something, or is bitrot on the actual machine, say of 15-year-old holiday pictures that then get pushed to the backup, a real thing my current backup strategy cannot protect against? Essentially, if data integrity is not tested before running a backup, corrupted data could eventually end up in the backup and, at some point after pruning, overwrite any intact file version that may have been there…
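To make it concrete, this is roughly the kind of check I have in mind - a rough sketch, not a finished tool (directory and database paths are placeholders, and I used sha256 instead of md5):

```
#!/usr/bin/env bash
# Record path, mtime and sha256 for every file; on later runs, flag files whose
# content hash changed although the mtime did not - a likely sign of silent corruption.
set -euo pipefail

DATA_DIR="$HOME/Pictures"          # directory to watch (example)
DB="$HOME/.local/share/hashdb.tsv" # hash database from the previous run (example)
NEW_DB="$(mktemp)"

# record: path <TAB> mtime <TAB> sha256
find "$DATA_DIR" -type f -print0 | while IFS= read -r -d '' f; do
    printf '%s\t%s\t%s\n' "$f" "$(stat -c %Y "$f")" "$(sha256sum "$f" | cut -d' ' -f1)"
done > "$NEW_DB"

# compare with the previous run: same mtime but different hash => suspicious
if [[ -f "$DB" ]]; then
    join -t $'\t' -o 1.1,1.2,1.3,2.2,2.3 <(sort "$DB") <(sort "$NEW_DB") |
    awk -F'\t' '$2 == $4 && $3 != $5 { print "possible bitrot: " $1 }'
fi

mv "$NEW_DB" "$DB"
```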
Different soft- and hardware - while it adds overhead for me, is it a fair point to perhaps have different external media and store backups using different software on them, instead of banking everything on a single restic/borg stack?
Your data - that is, the content of your home data folders - is defined by XDG in the file
~/.config/user-dirs.dirs
The rest is defined by a few local configs and the content of ~/.config and ~/.local - thus a method of restoring your Manjaro Linux to any given point in time is to have those settings in an available location.
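On a default setup that file typically contains entries like these (the exact folders depend on your locale and settings):

```
XDG_DESKTOP_DIR="$HOME/Desktop"
XDG_DOWNLOAD_DIR="$HOME/Downloads"
XDG_DOCUMENTS_DIR="$HOME/Documents"
XDG_MUSIC_DIR="$HOME/Music"
XDG_PICTURES_DIR="$HOME/Pictures"
XDG_VIDEOS_DIR="$HOME/Videos"
```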
The easiest way is to automate your backups. Set up a systemd timer to run them automatically at a preset time (or, if the computer isn’t up 24/7, perhaps a certain amount of time after boot). Personally I do a manual full backup (usually before and after a Manjaro update), with incrementals run automatically every night.
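A minimal sketch of such a timer/service pair (the unit names and the backup command are placeholders - point ExecStart at whatever script you use):

```
# ~/.config/systemd/user/backup.service
[Unit]
Description=Nightly incremental backup

[Service]
Type=oneshot
ExecStart=%h/bin/run-backup.sh

# ~/.config/systemd/user/backup.timer
[Unit]
Description=Run the backup every night

[Timer]
OnCalendar=daily
# catch up after the next boot if the machine was off at the scheduled time
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with systemctl --user enable --now backup.timer.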
You can also use Backintime to take frequent snapshots of user data so you can restore if (as I sometimes do) you muck up a data / text file. Personally I run it every 20 minutes through a timer. Its methodology is such that once you’ve taken your first backup, only changed files are backed up (so it doesn’t take huge amounts of space), but everything else is hard linked so that each snapshot has all the files in it. The GUI to set it up is pretty straightforward.
How do you do automatic timed cold backups? I regularly commute between locations anyway and then plug in a hard drive, run the backup, and disconnect, so it serves as a cold backup (one that would not be affected by ransomware or other issues on the live system that would make me need the backup in the first place).
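Concretely, my routine is roughly the following (device, mapper name and repository path are placeholders):

```
# unlock and mount the external LUKS drive
sudo cryptsetup open /dev/sdX1 coldbackup
sudo mount /dev/mapper/coldbackup /mnt/backup

# push a new snapshot into the restic repository on the drive
restic -r /mnt/backup/restic-repo backup ~/Documents ~/Pictures

# unmount and lock again before unplugging
sudo umount /mnt/backup
sudo cryptsetup close coldbackup
```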
While this works for me (plug in + fire a command in the terminal), it is not what I can expect of my family & friends - though they could manage plug in + initiate a backup in some nice GUI. So I’ll take a look at Backintime. (With rsync in the background, BIT seems to rely on a very established tool; I’ve also somehow grown fond of restic/borg, but I’m not sure how they compare in terms of reliability - I guess rsync is hard to beat in that regard.)
Do you know of any established bitrot-detection software? At this point, if in doubt, I am happy to back up more rather than less, but not having a systematic way to check the integrity of data entering the backup, and of the backup drive itself over time, makes me a bit uneasy - I’m not sure everything would actually play back if needed. I can mount borg and click around and do some sanity checks, but it would be nice to check more systematically from time to time, to rest assured that “the backup is still intact, no corruption or bitrot, and I could recover from the backup drive if needed”. I’m looking into btrfs/zfs for that reason, but that seems to come with other downsides.
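The spot check I do occasionally, and would like to make more systematic, is something along these lines (repository, archive name and paths are placeholders):

```
# mount a borg archive read-only and compare it against the live files,
# using rsync in checksum + dry-run mode so nothing is changed
borg mount /mnt/backup/borg-repo::2024-05-01-home /mnt/borg
rsync -rcn --itemize-changes ~/Documents/ /mnt/borg/home/user/Documents/
borg umount /mnt/borg
```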
It should be relatively easy to write a UDEV rule that detects a particular external drive being plugged in then automatically runs the backup. I’d make it run in a terminal so they have feedback that it’s running and it’s obvious when it’s finished.
I don’t know of anything in particular, though depending on your backup format it should be easy enough to script something up to periodically make checks. Though your biggest protection against backups going bad (or disappearing) is to make multiple copies on separate physical devices. My own backups go on one separate disc within my computer enclosure, one removable disc that’s always attached by USB, another removable disc that I keep outside the house (in case of the absolute worst!) and another copy on cloud storage (where relevant, encrypted by me before it’s transferred out of my LAN).
My root filesystem backup is always tested anyway, because I keep a separate “rescue” bootable partition on another physical SSD and I create that from root’s backup.
I’m just doing exFAT backups (which don’t work for symlinks) on my backup drives, encrypted with VeraCrypt.
I’m not a fan of third-party backup tools for cold backups, and it’s possible I’m doing something wrong here, but I just do my backups manually.
After I’ve mirrored my files from my source drive to another, I simply open the properties when I’m done and compare the bytes; of course I have more than just one mirrored drive before I refresh and mirror the next drive.
Normally I would see a read error while copying my files, or fewer bytes on the mirrored drive. I’ve never seen silently rotten files yet… is missing that maybe an SSD problem? Because I’m only using HDDs for backups.
For root and home I just use Timeshift (rsync) on an external ext4 drive.
I have multiple backups. Maybe I misunderstand, but that does not help if there is a file on my machine that is just sitting there and at some point experiences corruption or rot. If it is not a file I look at regularly, I may not notice, keep running new backups, and only some time later realise that the corrupted version has also been propagated to all backup media. Depending on the retention policy and schedule, there may be a point where the old, non-corrupted version that used to be in the backups is no longer there either. So the problem I am seeing is pushing corrupted files, without knowing it, from the computer to the backups through the regular backup routine. I guess the real question is how to detect rot on the system that is being backed up.
Oh yeah, agreed - there is something nice about simply using rsync to have a backup that is accessible without a separate backup tool.
What made you use exFAT? I am wondering if for backup drives there are other tradeoffs or factors to consider than for the main system. For me, both are currently ext4 for no specific reason other than ext4 seemingly being the de facto standard.
This sounds cool. How does this work? Before running Timeshift you mount the external drive, and it rsyncs home (I assume you mean the config under home and not the data files, which you mentioned for the exFAT backup?) and root to that external drive? I have seen Timeshift recommended for system backups, say before running a Manjaro upgrade, but I always thought it had to be on the same disk as the Manjaro install for the rsync hard-linking to work. Then again, if one bricks the system with an upgrade, it seems tricky to recover if the Timeshift snapshots are not on a separate disk. Curious to look more into this for the system-rollback kind of backup! But I believe this may be a separate topic from the cold, long-term data backups.
Bear in mind that scripts launched via UDEV get killed if they take too long (not sure how long too long is). I’ve got round that by firing off a systemd service from the UDEV rule.
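Something along these lines should do it (the filesystem UUID, unit name and script path are placeholders, untested as written):

```
# /etc/udev/rules.d/99-backup-drive.rules
# when the backup drive's filesystem appears, pull in a systemd service rather than
# running the backup directly from udev (long-running RUN+= commands get killed)
ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_UUID}=="1234-ABCD-5678", TAG+="systemd", ENV{SYSTEMD_WANTS}="usb-backup.service"

# /etc/systemd/system/usb-backup.service
[Unit]
Description=Backup to external drive when it is plugged in

[Service]
Type=oneshot
ExecStart=/usr/local/bin/usb-backup.sh
```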
Does anyone know of software that creates hashes for all files on the local machine and allows incrementally re-checking those hashes every now and then? It could be used as early detection of bitrot on the machine that may otherwise go unnoticed.
I am not sure doing this incrementally is that easy - say, I want to spend 5 minutes after each backup to pick up the last check where it left off and continue scanning further files. I am just wondering whether really no one is doing this type of thing for the long-standing music/photo/document collections that sit on their machines for years? If it requires a script, it’s not something average users can actually keep up over time. Or is this fully taken care of by smartctl disk checks or fsck? Not sure. It seems that unless one is reading the bytes and checksumming every now and then, one would not know if files get corrupted. Sure, we have backups, but if we don’t notice the corruption and keep backing up our now-corrupted data, at some point the corrupted version will be the only one on the machine and in the backups - and if 10 years down the road we wish to see the photos from our 10-year anniversary, it’s only then that we realise our data suffered corruption. I feel good about my backups, but somehow this seems like a loophole in the common strategies.
I’m not really sure - you should research/read about it yourself - but depending on the file system, this feature is pretty much integrated already:
in BTRFS,
and in ext4 via the feature -O metadata_csum, which can be enabled if it isn’t already present by default.
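For example, to see whether that feature is already enabled on a given ext4 partition (the device name is a placeholder):

```
# list the enabled ext4 features and look for "metadata_csum"
# (note: this covers filesystem metadata, not the contents of your files)
sudo dumpe2fs -h /dev/sdXN | grep -i 'features'
```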