I installed using the live-installer with the default proposed LUKS encryption configuration and swap for hibernate:
unencrypted ESP : vfat
ROOT (easier to read than the UUID) partition : luks encrypted container with ext4 inside
SWAP (easier to read than the UUID) partition : luks encrypted container with swap inside
After a few days I noticed that ROOT ext4 fs was getting corrupted systematically after resuming from hibernation.
kernel: EXT4-fs (dm-0): Delayed block allocation failed for inode <inode number> at logical offset <offset value> with max blocks 3 with error 117
kernel: EXT4-fs (dm-0): This should not happen!! Data will be lost
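To check whether a given boot hit the same errors, the kernel log can be filtered for these two messages. This is only a minimal sketch: it assumes the log text arrives on standard input, and the function name `ext4_errors` is made up for illustration.

```shell
# ext4_errors: print only the EXT4-fs kernel messages quoted above
# (failed delayed allocation / "This should not happen!!").
# Reads log text on stdin. Hypothetical helper name, not part of any package.
ext4_errors() {
    grep -E 'EXT4-fs \([^)]*\): (Delayed block allocation failed|This should not happen!!)'
}
```

On a systemd machine this could be fed with something like `journalctl -k -b | ext4_errors` for the current boot; no output means neither message was logged.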
So I started digging and found, thanks to the Manjaro and Arch forums, the following kernel warning:
* BIG FAT WARNING *********************************************************
*
* If you touch anything on disk between suspend and resume...
* ...kiss your data goodbye.
*
* If you do resume from initrd after your filesystems are mounted...
* ...bye bye root partition.
* [this is actually same case as above]
*
This was very interesting because Manjaro’s live-installer default installation does exactly what the warning says not to do, through the contents of the encrypt, openswap & resume scripts.
So here is what happens on boot:
Grub asks and gets the password, it does its own decryption and loads the kernel and initramfs
initramfs contains /crypto_keyfile.bin as per the default /etc/mkinitcpio.conf FILES="/crypto_keyfile.bin"
This keyfile can be used to decrypt both ROOT and SWAP per the live installer choices.
encrypt hook decrypts ROOT to luks-ROOT using the default ckeyfile=/crypto_keyfile.bin provided in the / mounted initramfs but then erases it in its last lines with rm -f ${ckeyfile} [Why do that ???]
openswap hook mounts luks-ROOT to a tmp directory, gets the keyfile and decrypts the SWAP to luks-SWAP using default variables from /etc/openswap.conf
So root is getting mounted before resume… just to access a keyfile that is available in the initramfs (but that the encrypt hook erases…)
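The hooks run in the order they are listed in the HOOKS line of /etc/mkinitcpio.conf, so it is worth confirming where encrypt, openswap and resume sit relative to each other. A rough sketch, assuming HOOKS is written on a single line in either the quoted or the parenthesised form; `list_hooks` is a hypothetical helper name.

```shell
# list_hooks FILE: print the hooks from a mkinitcpio.conf-style file,
# one per line, in the order they will run.
# Assumption: the HOOKS=... assignment fits on one line.
list_hooks() {
    sed -n 's/^HOOKS=[("]\(.*\)[)"][[:space:]]*$/\1/p' "$1" | tr -s ' ' '\n'
}
```

Running `list_hooks /etc/mkinitcpio.conf` should then show openswap sitting between encrypt and resume on an affected install, which matches the sequence described above.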
Anyway, here are the 3 changes I made to get rid of the root filesystem corruptions:
comment the rm -f ${ckeyfile} line in /usr/lib/initcpio/encrypt
I don’t understand why that line is there… ? (maybe it is useful for erasing keyfiles other than the default “/crypto_keyfile.bin”)
set keyfile_device=/ and keyfile_device_mount_options="--bind" in /etc/openswap.conf
That way I’m using what is in the initramfs and not mounting the luks-ROOT ext4 filesystem before resume.
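The changes above could be scripted roughly as follows. This is a sketch, not a tested fix: `comment_out_key_removal` is a hypothetical helper, it assumes GNU sed (for `-i`), and it is best tried on a copy of the encrypt hook first. The initramfs presumably also has to be regenerated afterwards (for example with `mkinitcpio -P`) before any of this takes effect.

```shell
# 1) Comment out the key removal line in the given file.
#    Leading indentation is preserved; only a line starting with
#    'rm -f ${ckeyfile}' is touched.
comment_out_key_removal() {
    sed -i 's|^\([[:space:]]*\)\(rm -f \${ckeyfile}\)|\1#\2|' "$1"
}

# 2) Lines to set in /etc/openswap.conf (values quoted from the post):
#    keyfile_device=/
#    keyfile_device_mount_options="--bind"
```

Note that the edit to the hook lives in a packaged file, so it will be overwritten whenever that package is updated.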
Yes I understood the warning the same way you do…
But the reality is:
that just the mounting seems to cause corruption, as my 3 simple changes have eradicated it: I haven’t had any since I implemented them (last 2 days with multiple hibernates to test it…).
that others have switched to swap files on ROOT to solve the corruption thereby removing openswap and its mounting…
The answer I’m really interested in is: why the `rm -f ${ckeyfile}` in the encrypt hook?
When the “default” openswap configuration mounts luks-ROOT it does so with no mount options which means that luks-ROOT is actually mounted with “options=defaults” which contains relatime whereas Manjaro uses noatime…
Could [in certain conditions] atime modification of the keyfile happen on access, thereby modifying the fs and causing the corruption ?
If yes another single step fix could be just adding keyfile_device_mount_options="-o noatime" to the default /etc/openswap.conf
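To see which atime behaviour a mounted filesystem actually ended up with, its option string can be inspected. A small sketch; `atime_mode` is a hypothetical helper that only classifies the options string passed as its argument.

```shell
# atime_mode OPTIONS: report which atime-related option appears in a
# comma-separated mount options string (as printed by findmnt or /proc/mounts).
atime_mode() {
    case ",$1," in
        *,noatime,*)     echo noatime ;;
        *,relatime,*)    echo relatime ;;
        *,strictatime,*) echo strictatime ;;
        *)               echo unspecified ;;
    esac
}
```

For the running system, `atime_mode "$(findmnt -no OPTIONS /)"` would report the mode of the root mount. It says nothing about the temporary mount openswap performs inside the initramfs, which would have to be checked there.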
I have created several encrypted installations using Calamares with hibernate / using swapfile (not partition) - I have not experienced any issues on subsequent use.
edit: just realized that it was using swap partition - nonetheless - no issues yet.
There was a kernel bug causing ext4 corruption earlier this month. It has been fixed. That may have been the cause of your disk corruption and not the way LUKS is configured on Manjaro.
Adding -o noatime did not help… Indeed, after the first resume from hibernate the luks-ROOT ext4 corruption was so bad I could not write to my home directory !
Switched back to my fix that avoids mounting luks-ROOT before resume hook, ran fsck and then restored my home directory (as my kdeconfig had been borked…)
All is well now… No more testing I rest my case with all the testing done.
No. It’s not the same bug. This problem occurs only with LUKS and a separate swap partition, where ext4 must be mounted before resume to get the keyfile. I’m affected too. This problem was first mentioned in 2014 on Arch Linux. @linux-aarhus
You may not have any problems, but this has been happening since at least 2015: [solved] Issue with file system after hibernation / Newbie Corner / Arch Linux Forums
I have the exact same situation, resulting in hibernation not being possible on my laptop.
Are you kidding me? Firstly, this use case will also damage file systems such as xfs and any other journaling file system. And in the case of btrfs it will most likely lead to data loss. Secondly, this is an error in the distribution package, and in fact a fix for the error has already been worked out in this topic. Are you suggesting that I change the distribution because I need hibernation? Okay, I heard.
And last. This is DEFAULT setup with DEFAULT partitioning.
Seems we should check this issue more. Since there is a potential solution available, we should check whether it is correct and whether other systems are affected by this and simply don’t know it or were just lucky.
There are many products and pieces of software involved, and we should avoid starting a blame game between any of them.
We need a fixed ISO, a way to reproduce and a way to check if any changes may break other systems not affected by this.
Ok. I used the plasma ISO. On a clean install of Manjaro, in the installer at step 3 (Partitions) choose “Erase disk”, then choose “Swap (with hibernate)”, and you can leave the default fs ext4. Check the checkbox “Encrypt system” and enter a password. After that, proceed with the install as usual.
After the install we can hibernate, and after resume we get corruption of the root filesystem. In the case of ext4 we can just reboot, and it’ll be repaired by fsck during boot.
The problem persists because the encrypt hook (cryptsetup package) removes any keys from the initramfs in its last stage. I believe this is done for security reasons, but I don’t see any. Since there are then no keys left to open the swap partition, as far as I can see openswap mounts the root partition read-only to read the key from it and activate swap for resume. BUT it’s a journaling filesystem, and the journal is modified even on a read-only mount. BTRFS and any filesystem with metadata checksumming also gets corrupted, with dmesg logs. Any journaling filesystem without metadata checksums gets corrupted silently. As a workaround we must change the openswap config and comment out the last line in the encrypt hook.
As I see it, there are no security advantages in removing keys from the initramfs (because if an attacker can decrypt the initramfs, the system is fully compromised). I think the bug must be fixed by applying this workaround by default. Such a change will not damage any other setup, new or existing. As it stands, we lose our modifications with any future update of the cryptsetup package.
I won’t report this to Arch Linux, because registering with their gitlab is too complicated (you need to write a letter, to which I have not yet received a response), and English is not my native language. I believe that a distro that claims to be user friendly can accept a bug and fix it without having to go through such idiotically complicated ways to report the problem and its solution.