"Random" decryption failures of full disk encryption with Luks and grub at boot

Hi there :wave: !

For a week now I have had a weird issue: decryption fails at boot with GRUB + LUKS.

Initial context

  • I installed Manjaro with full disk encryption using LUKS
  • The bootloader is the default one, GRUB
  • It has been working without issues since the beginning :blush:

Change introduced (main)

Everything was working fine for a long time (since May 2022 IIRC).

And now the problem

For a reason I have not found yet, when GRUB prompts me for the encryption passphrase during boot, decryption fails very often and only eventually works.

:memo: I may have made a typo once, but rest assured that I type carefully on the following tries :wink:

The error GRUB displays before dropping into the grub rescue shell is something like: “Decryption failed, device ******* not found”


It is really annoying because:

  • sometimes (very rarely) it works as expected on the first try
  • sometimes I have to try a dozen times before it boots properly.

:memo: I am a complete newbie regarding disk encryption with LUKS, so if someone would be kind enough to point me to documentation or troubleshooting steps I should follow, it would be much appreciated :pray:

Many many thanks for your help!

It would be good to have the complete error message.

As of now, this could mean exactly what it seems to say:
the device that is to be decrypted is not available at the moment the system tries to access it.
That in turn could mean that it is, for some reason, slow to respond,
which in turn could mean that the disk is faulty or beginning to fail.

All that is pretty much speculation.
Check your system logs for signs of read/write errors, check the “health” of your disk (smartctl).

… or it is simply a loose connection … something that can not only be triggered by mechanical forces, but also by temperature changes …
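
A minimal sketch of those two checks, assuming the drive shows up as /dev/nvme0 (an assumption, adjust to your system). The helper name nvme_health_summary is made up for illustration; it only filters smartctl output down to the fields that most directly point at a failing drive. Keep in mind that the journal only starts once the kernel is running, so it cannot show what GRUB itself did at the passphrase prompt.

```shell
# Hypothetical helper: keep only the SMART fields that most directly
# indicate a failing NVMe drive. Reads `smartctl --all` output on stdin.
nvme_health_summary() {
    grep -E '^(SMART overall-health|Critical Warning|Available Spare|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries)'
}

# Usage on a live system (device name is an assumption; needs root):
#   sudo smartctl --all /dev/nvme0 | nvme_health_summary
#
# Kernel messages of priority "err" or worse from the current boot:
#   journalctl -k -b -p err
```

A non-zero "Critical Warning" or "Media and Data Integrity Errors" value would be the first thing to look at.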

Hi @Nachlese ,

Thanks for your message :pray:

Grub

Here is the complete error displayed by Grub

Enter passphrase for hd0, gpt5 (fea4441871dd42fb8a1b76a1b3743bc7): Attempting to decrypt master key... error: access denied.
error: disk 'cryptouuid/fea4441871dd42fb8a1b76a1b3743bc7' not found. Entering rescue mode...
grub rescue>

System info

In the meantime, here is some more information about my system, in case anybody else has faced the same issue with the same hardware :person_shrugging:

OS: Manjaro Linux x86_64 
Host: 82NC Yoga Slim 7 Pro 14IHU5 
Kernel: 6.1.1-1-MANJARO 
CPU: 11th Gen Intel i7-11370H (8) @ 4.800GHz 
GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics] 
Memory: 7679MiB / 15782MiB

Smartctl info

smartctl --all /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.1-1-MANJARO] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDC PC SN730 SDBPNTY-1T00-1101
Serial Number:                      21112Z801574
Firmware Version:                   11170001
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1 024 209 543 168 [1,02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1 024 209 543 168 [1,02 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b48aa83c8
Local Time is:                      Fri Jan 20 11:39:57 2023 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.1000W       -        -    3  3  3  3     4000   10000
 4 -   0.0035W       -        -    4  4  4  4     4000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    2 756 551 [1,41 TB]
Data Units Written:                 3 473 609 [1,77 TB]
Host Read Commands:                 33 209 227
Host Write Commands:                72 415 276
Controller Busy Time:               74
Power Cycles:                       201
Power On Hours:                     65
Unsafe Shutdowns:                   83
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

At first sight, there appears to be no issue :thinking:

Thanks again :pray:

I may be totally off here, but this name, the name of the disk that is to be decrypted and can’t be accessed,
seems … weird to me.

To further analyze this
can you post the output of:
grep -E 'GRUB_CMDLINE' /etc/default/grub
(this will filter the contents of the pretty long file and only show the few lines relevant here)

also, the output of:
lsblk -f
as well as the contents of your /etc/fstab
cat /etc/fstab
could shed more light on the matter
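
When comparing the lsblk -f output with the GRUB message, it may help to know that, as far as I know, GRUB names the encrypted disk by its LUKS UUID with the dashes removed, while lsblk shows the dashed form. A small sketch for converting one into the other (grub_uuid_to_dashed is a hypothetical helper name, not a GRUB or util-linux command):

```shell
# Hypothetical helper: turn GRUB's dash-less 32-character cryptouuid into
# the dashed 8-4-4-4-12 form that lsblk and blkid print.
grub_uuid_to_dashed() {
    printf '%s\n' "$1" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

grub_uuid_to_dashed fea4441871dd42fb8a1b76a1b3743bc7
# prints fea44418-71dd-42fb-8a1b-76a1b3743bc7

# To look for it among the block devices:
#   lsblk -f | grep "$(grub_uuid_to_dashed fea4441871dd42fb8a1b76a1b3743bc7)"
```

If the dashed UUID shows up next to a crypto_LUKS partition, GRUB is at least looking for the right device.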

Fairly complete system info can be obtained with:
inxi -Faz
which will be largely redundant after the above commands, but could still be useful
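
If it is easier, the commands above can be collected into one paste. A rough sketch, assuming a POSIX shell; collect_boot_info is a made-up function name, and a command that is missing, fails, or finds nothing is simply noted instead of aborting the rest:

```shell
# Hypothetical helper: run each diagnostic command in turn and print its
# output under a header line, so everything can be pasted in one go.
collect_boot_info() {
    for cmd in \
        "grep -E 'GRUB_CMDLINE' /etc/default/grub" \
        "lsblk -f" \
        "cat /etc/fstab" \
        "inxi -Faz"
    do
        printf '===== %s =====\n' "$cmd"
        # Run through sh -c so the quoting inside each command string is honoured.
        sh -c "$cmd" 2>&1 || printf '(no result: command failed, matched nothing, or is not installed)\n'
    done
}

# Usage: collect_boot_info > boot-info.txt
```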

Hi @Nachlese ,

After several tests, I have found the cause of the issue … Layer 8 aka PEBCAK (almost) :sweat_smile:

With all that remote work now, I am working from home on my laptop connected to a remote display…

  • :x: When the display is connected, I have this “random” issue (because sometimes it works).
  • :heavy_check_mark: When the display is not connected, it always works.

→ So I have to disconnect my laptop from the display when booting or resuming from hibernation.
Once the system is up, there is no issue to report.

Sorry for the noise :confused: and thank you for giving time to my case :pray:
