Some issues after restoring system with Borg Backup
So far, I’ve been using BorgBackup to back up my system from the command line, much like you would with rsync. I use a bash script with the following command (with $REP and $SNP denoting the repository and snapshot filename, respectively):
Note that this backs up the system only (including some hidden files in /home). Documents etc. are backed up using another script.
After the last stable update, my system was experiencing issues, so I decided to restore it from the latest Borg backup (from a LiveUSB, after having reformatted the system drive). On the next reboot, the system hung with the message «Failed to start Light display manager». I could still log in on a TTY and use startx to get into the graphical interface (admin password required), but several things were not working properly. For example, the language was wrong and I couldn’t log into my VPN. So basically, I ended up reinstalling Manjaro from scratch, which is something I wanted to avoid with a proper backup routine.
It is the first time in 2 years that I have actually had to restore my system from the backup, but back when I made my backup script, I did test it and was able to restore my system without any issue.
I don’t know what went wrong in my backup/restore process and why LightDM would fail to start after the restore process (it was working fine when I did the backup obviously and I also tested the script in the past). I don’t really need to troubleshoot this anymore since I ended up reinstalling my system anyway but I’d like to correct mistakes if I’m doing anything wrong with my backup for the future.
Namely, I am excluding several files/folders from the backup. I thought these would be safe to exclude, but do you know if such exclusions may have caused LightDM to fail? For example, I excluded /run, as is generally recommended, but the lightdm.conf file states that the run directory for LightDM is /run/lightdm. So should that directory still be part of the backup for LightDM to start after a restore? What could be the reason for LightDM failing to start after a restore? Should I remove some exclusions from the backup command?
Thanks for the tutorial. I had already seen it, but at the time I made my backup script I preferred using Borg for the system backup as well, although this may indeed be problematic. Basically, I was fooled by the fact that many people advocate using Timeshift for system backup. However, since Timeshift is essentially based on rsync, I didn’t get why Timeshift would work while Borg wouldn’t. They basically do the same thing. So I went for Borg for its deduplication feature. Now I understand this may not be a “crash proof solution” and that Clonezilla or another cold backup program is needed to restore a system from scratch.
Now if I understand it correctly, my script is okay and I can still use Borg to back up all system files/folders as well as /home, but I simply need a Clonezilla image of a freshly installed Manjaro first. Once this image is made (just once), I can use Borg to track changes to the system afterwards, like System Restore on Windows. In case of an issue, I can format the drive, restore the Clonezilla image and then restore the system to its state using the Borg backup. Is that correct? I don’t even need to update the Clonezilla image (it can be completely outdated) as long as Borg tracks all the changes since that image was made. Right?
I already did prune my Borg backups (well, the one that failed…). But just for the sake of understanding, why would pruning be an issue in my setting (i.e. with an outdated Clonezilla image) while not in yours? I mean, in my setting Borg also backs up system files, just like in yours, so why would I risk losing system files? Do you mean that if I only keep Borg backups made over a period of 1 year, I should update the Clonezilla image e.g. every 6 months, so that the available Borg backups (after pruning) always cover the state the system was in when I made the Clonezilla image?
Anyways, you answered my initial question since I had a poor backup practice with no cold backup so I can mark this as solved. Thanks
I will stick to the tutorial but just to understand your example a bit better:
The backup of sort 1.0 would certainly be gone, but why would I lose the backup of sort 2.0? It should be part of the latest Borg backup, since I ran Borg after having installed sort 2.0! So if I restore the system just from the Clonezilla image, I would still have sort 1.0 installed, but if I then also restore the latest Borg backup, I should get sort 2.0. Or am I missing something?
Now, about losing the sort 1.0 backup: that is obvious if you prune the Borg backup tree. Pruning everything older than 2 months means I’m okay with not being able to restore the system to a state older than 2 months. But this is the same no matter the strategy (tutorial or not). So I don’t actually get what the problem is here…
You installed sort 2.0 the day after making your System Backup 3 (three) months ago
You backed up for 3 months
Today you prune everything older than 2 (two) months, so sort 2.0 is gone from your backup…
Remember: Borg is a deduplicating backup system, so it holds all files as chunks in its database and only backs up chunks that do not exist there yet. There is no concept of a “full” and an “incremental” backup, so if you want to have a full backup of everything since you installed the system, you should never prune…
So in your case, after you prune you need to make the image again and that’s why, in the tutorial, the System Backup and the Data backup are completely separated so people wouldn’t run into these edge cases as the tutorial describes a crash-proof backup…
Even if you lose the entire computer (i.e. it gets stolen), the methodology from the tutorial will survive that (unless they stole your backup drive too).
If you prune everything older than 2 (two) months, then the first available Borg snapshot after pruning will contain the state of the system two months ago (because when pruning, all differentials accumulated over the first month of backup will be summed before being deleted)…and at that time sort 2.0 was installed.
My understanding is that pruning will delete differential backups older than 2 months, but it will first recreate the “full” backup formed from all these differentials, so that you do not lose anything except the ability to restore the system to a state that is more than 2 months old (but who cares, at that time sort 2.0 was already installed).
Basically, if you have 3 monthly backups:
Month 1: backup 1 (full)
Month 2: backup 2 (diff)
Month 3: backup 3 (diff)
and you decide to prune everything older than month 2, you’ll get
Month 2: backup 1 + backup 2 (full)
Month 3: backup 3 (diff)
so you lose the ability to restore backup 1, but (backup 1 + backup 2) is now your full backup. Still not correct?
No, I don’t think this is true. If you run borg create ... it will always do some kind of full backup. But of course Borg can’t create “incremental” backups. By “kind of full backup”, I mean it will back up all files you selected with the borg create ... command. In Borg this is called an “archive”. An “archive” contains all files that have been selected via the borg create ... command. A repository (in Borg terms) can hold many archives, but every archive needs to have a unique name.
This is the view you get from the outside. An archive contains all files as they were at the moment borg create ... was executed.
However, an identical file is not stored multiple times. Borg will store only new files, and only the new parts (chunks) of bigger files. But it records a list of all files that are part of the newly created archive.
If you use borg prune ..., Borg will remove the list of files of the selected archives. In a second step, Borg will remove files that are not referenced by any other archive anymore. But it will not remove any file that is still referenced by another archive, even if the archive that added the file for the first time is removed.
In other words, Borg saves a list of files for every backup, and this list is called an archive. It also saves, in the repository, all the real file data for all archives. It only adds real file data if that data is not already in the repository. If a Borg archive is removed (prune), in the first step only the list of files is removed. If real file data is then no longer referenced by any other archive, it is removed too.
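To make the description above concrete, here is a small toy model in Python. It is only a sketch and not Borg's real storage format: the class Repo and its methods are made up for illustration, and each file is stored as a single chunk, whereas Borg splits larger files into several chunks. It replays the sort 1.0 / sort 2.0 scenario from this thread.

```python
import hashlib


class Repo:
    """Toy deduplicating repository (hypothetical, not Borg's real format)."""

    def __init__(self):
        self.chunks = {}    # chunk id -> file data, stored once
        self.archives = {}  # archive name -> {path: chunk id}

    def create(self, name, files):
        """Record a complete file list; store only data not yet present."""
        if name in self.archives:
            raise ValueError("archive names must be unique")
        manifest = {}
        for path, data in files.items():
            chunk_id = hashlib.sha256(data).hexdigest()
            self.chunks.setdefault(chunk_id, data)  # deduplication
            manifest[path] = chunk_id
        self.archives[name] = manifest

    def prune(self, names):
        """Step 1: drop the file lists. Step 2: drop unreferenced data."""
        for name in names:
            del self.archives[name]
        still_referenced = {
            chunk_id
            for manifest in self.archives.values()
            for chunk_id in manifest.values()
        }
        for chunk_id in list(self.chunks):
            if chunk_id not in still_referenced:
                del self.chunks[chunk_id]

    def extract(self, name):
        """Rebuild every file of one archive from the stored data."""
        return {p: self.chunks[c] for p, c in self.archives[name].items()}


repo = Repo()
repo.create("month1", {"/usr/bin/sort": b"sort 1.0", "/etc/fstab": b"fstab"})
repo.create("month2", {"/usr/bin/sort": b"sort 2.0", "/etc/fstab": b"fstab"})
repo.create("month3", {"/usr/bin/sort": b"sort 2.0", "/etc/fstab": b"fstab"})

repo.prune(["month1"])  # sort 1.0 is no longer referenced and is removed
restored = repo.extract("month3")
print(restored["/usr/bin/sort"])
```

In this model, pruning month1 removes only the sort 1.0 data. The fstab data was first added by month1 but survives because month2 and month3 still reference it, and sort 2.0 stays restorable.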
I just found out that I could restore my system from my Borg backup (the one that failed previously with the message “Failed to start LightDM”). For some unknown reason, the ownership of some files/folders in /var/lib/lightdm was wrong after restoring from the backup (the group was set to geoclue whereas it should be lightdm). I just deleted /var/lib/lightdm, rebooted the system, and voila! I just don’t get why the ownership was incorrect in the backup or was changed during the restore process. Anyway, letting the system rebuild /var/lib/lightdm did the trick and everything works as before.
Do you know if /var/lib should be preferably excluded from the backup?
No, you want /var/lib in your backup; for example, /var/lib/pacman is really important. You might exclude some folders in /var/lib, but which folders depends on the applications you use.
For example, I don’t want to back up podman images, so I exclude /var/lib/containers/storage. On a different system, I don’t want to back up /var/lib/mysql, because I do a mysqldump in the backup script and I don’t want to stop the DB.
But it all depends on the applications that store data in /var/lib
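As a rough sketch of how such per-application exclusions could be passed to Borg, the Python snippet below assembles a borg create command line. It assumes Borg 1.x syntax (REPO::ARCHIVE and the --exclude option). The helper borg_create_argv, the repository path and the archive name are hypothetical, and how a given pattern is matched should be checked with borg help patterns before relying on it.

```python
def borg_create_argv(repo, archive, sources, excludes):
    """Build an argument list for `borg create` (Borg 1.x style).

    Nothing is executed here; the list could be passed to
    subprocess.run() once it has been reviewed.
    """
    argv = ["borg", "create"]
    for path in excludes:
        argv += ["--exclude", path]
    argv.append(f"{repo}::{archive}")
    argv += list(sources)
    return argv


argv = borg_create_argv(
    repo="/mnt/backup/borg",      # hypothetical repository location
    archive="system-snapshot",    # hypothetical archive name
    sources=["/"],
    excludes=[
        "/var/lib/containers/storage",  # podman images
        "/var/lib/mysql",               # dumped separately with mysqldump
    ],
)
print(" ".join(argv))
```

Keeping the exclusions in one list makes it easy to see what is left out of /var/lib on each machine.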
Makes sense. So I’ll keep /var/lib in the backup and manually drop /var/lib/lightdm for now.
It is actually strange, because I rechecked the restore process in a VM and, right after restoring the files from the Borg archive, the ownership of /var/lib/lightdm is correct. So the Borg backup is not the problem. However, as soon as I reboot the system, the group on /var/lib/lightdm changes from “lightdm” to “geoclue” for whatever reason, and this makes LightDM fail to start. Then, simply deleting /var/lib/lightdm and letting the system recreate it on the next reboot permanently solves the issue. Weird!
In fact, I much prefer this solution to using a cold backup such as Clonezilla. I agree that Clonezilla might be more reliable. However, with Borg I can restore the system onto another hard drive (even if the destination partition is smaller than the source; the only restriction is that it has enough space for the Borg archive to be extracted), another computer or even inside a VM (my script reinstalls grub and modifies /etc/fstab to match the UUID). This is really convenient! I just need to figure out why the ownership of /var/lib/lightdm gets changed at some point although it is correct in the Borg archive.
You can also do that with Clonezilla by editing the XdY-pt.parted file and using the actual space in use + 10% (don’t make it too small). Obviously, X and Y denote your disk’s actual drive letters, e.g. sde for my SD-card reader and sdb for my second disk…
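One way to compare the state right after the restore with the state after the reboot is to record both the group name and the numeric group ID of everything under /var/lib/lightdm. The sketch below is a diagnostic aid only, and the function names are made up. It prints numeric IDs because names are resolved against the /etc/group of the system that is currently running, so the same numeric ID can show up under a different name on the LiveUSB than on the installed system.

```python
import grp
import os


def group_name(gid):
    """Return the group name for gid, or the number if it has no name."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def find_group_mismatches(root, expected_group):
    """List (path, group name, gid) for entries not owned by expected_group."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths += [os.path.join(dirpath, n) for n in dirnames + filenames]

    mismatches = []
    for path in paths:
        st = os.lstat(path)  # lstat: do not follow symlinks
        name = group_name(st.st_gid)
        if name != expected_group:
            mismatches.append((path, name, st.st_gid))
    return mismatches


# Example (run as root, once from the LiveUSB and once after the reboot):
#   for entry in find_group_mismatches("/var/lib/lightdm", "lightdm"):
#       print(entry)
```

If the numeric IDs are identical before and after the reboot but the names differ, the files themselves did not change, only the name lookup did.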
P.S. Seeing @xabbu's excellent research, I won’t be trying to create a new archive, overwrite a file, prune the Borg DB and then restore the overwritten file this weekend, as it seems that pruning always keeps a full backup of the latest file.
This isn’t correct. As long as 2.0 is still present on the system it won’t be gone. When you prune a repository you always keep consistency of the current files (the most recently backed up). That is, only the files not present on the non-pruned backups will be deleted.
I’ve been following this discussion and I don’t see any reason why BB wouldn’t work fine for a system backup. There is no such thing as data backup programs versus system backup programs. It all depends on how you set things up. The so-called system backup programs make life easier by having sane default settings for that purpose, but there is no reason why a system backup can’t be made with a different program. Even cp can do it. Believe me, I’ve transferred whole systems between partitions with cp more than once.
Totally agree with that! It is more important to focus on cold vs hot backup, with “hot” being more prone to issues than “cold”, I guess, since files can change during the backup process if one is careless. But many programs (e.g. Timeshift) which are primarily “designed” for system backup perform “hot” backups anyway, so this just confuses newbies such as me.
Regarding my issue though, I don’t think that performing the Borg backup while the system is running caused the error, as the ownership of /var/lib/lightdm in the Borg archive is correct. It actually gets changed after the restore process, when rebooting from the LiveUSB back into the system. I can’t understand why on earth this happens, although the fix is easy.
Sorry, I replied without reading the discussion below that post.
Of course it isn’t a good idea to back up while using the system, but note that “using the system” isn’t exactly the same as “while the system is running”. The only files which would not be properly backed up in the latter case are files being edited and not saved yet. Otherwise, when the backup program reads a file, the OS returns the updated version, even if it hasn’t been written to disk yet. Files that change dynamically at runtime are mostly (if not completely) excluded anyway.