Unable to boot after update on 19.5.2023 - black screen freeze

Hi all,

I run kernel 5.15. Since the last kernel update (which I did on 19.5.2023) I can’t boot into Manjaro.

Normally I get a screen with a blinking cursor that says something like:
/dev/mapper/luks-49.......1e1: clean 11....03/19...76 files 73..57/79..44 blocks

And then I get the graphical login page.
But this time, after the update, the cursor freezes and Manjaro never boots.

I tried the guide “[Fix] System doesn’t boot, boots to a black screen, or stops at a message”.
But I can’t switch to another console, and I can’t even start Manjaro directly in runlevel 3 by modifying the GRUB entry.

The only thing that worked was selecting the initramfs fallback option in GRUB.

However, I then regenerated the initramfs images and lost even that option:
sudo mkinitcpio -P

So right now, my only option is to boot the older, unsupported kernel 5.13, which I obviously don’t want to use.

From 5.13 I tried installing other kernels (6.1 and 5.10), but all of them have the same problem. With 6.1 I only get the black screen.

My inxi report:

I am using Manjaro Mate.

Thank you in advance for any suggestions.

Did you see this?

Hello megavolt, thank you for your answer!

I used to get the same “no disk found” error in the past, but I never paid much attention to it, because the system booted automatically a few seconds later without any further problems. Also, that error message stopped appearing a while ago (well before this problem started).

I checked the /boot/grub/grub.cfg as you suggested, but there are no dashes anywhere near cryptsetup.
Full config is here:
manjaro-grubcfg - Pastebin.com.

I removed the “quiet” parameter and found that the system always hangs after:
Finished File System check on /dev/disk/by-uuid/12AE-EE30

Also, it seems to me that the system is able to decrypt and mount the encrypted partition, otherwise it wouldn’t get that far.

Btw., strangely enough, I managed to boot into 5.15 once. I did some editing in GRUB and booted. It failed and I got dropped into the emergency shell. I restarted, and then it booted. When I restarted again, I was back at the old problem.

@ripfruit after a quick look I saw this:

  ID-1: / raw-size: 303.49 GiB size: 297.66 GiB (98.08%)
    used: 268.17 GiB (90.1%) fs: ext4 dev: /dev/dm-0 maj-min: 254:0

Take that into account for ext4:

  • 1% → file system metadata
  • 5% → reserved space for root
  • 5-10% → ext4 journal

303.49 GiB - 16% = 254.93 GiB

303.49 GiB - 11% = 270.11 GiB

Keep in mind that encryption also needs some extra space. So I conclude: your root partition is simply full, and if you run lightdm as the login manager, it won’t start when no space is available.
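
As a rough check of the estimate above, here is a minimal Python sketch. The overhead percentages are the rule-of-thumb figures from this post, not measured values, and the helper names are made up for illustration.

```python
# Rough check of the ext4 space estimate, using the figures from the inxi output.
# The overhead percentages are rule-of-thumb assumptions, not measured values.

RAW_GIB = 303.49    # raw-size reported by inxi
SIZE_GIB = 297.66   # size reported by inxi
USED_GIB = 268.17   # used space reported by inxi


def usable_gib(raw_gib: float, overhead_pct: float) -> float:
    """Space left after subtracting an assumed overhead percentage."""
    return raw_gib * (100 - overhead_pct) / 100


def used_pct(used_gib: float, size_gib: float) -> float:
    """Used space as a percentage of the file system size."""
    return used_gib / size_gib * 100


if __name__ == "__main__":
    for pct in (16, 11):
        print(f"{RAW_GIB} GiB - {pct}% = {usable_gib(RAW_GIB, pct):.2f} GiB")
    print(f"used: {used_pct(USED_GIB, SIZE_GIB):.1f}% of {SIZE_GIB} GiB")
```

The reported 268.17 GiB of used space lies above the pessimistic estimate and just below the optimistic one, which is why the disk looks effectively full under these assumptions.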

Thank you, but I don’t think that is the problem in this case.
I can boot the 5.13 kernel into the same GUI quickly and without any problem. 5.15 could as well, until the last update.
Also, the system freezes even when it should only boot to runlevel 3.

Nevertheless, I freed some space, but the problem persists.

Can you share the full journal from booting a kernel other than 5.13?

Sure, here it is. Unsuccessful 5.15:

Here is the successful 5.13:

I obtained them with
journalctl -b -1 and journalctl -b 0

Are these the logs you need?

When the boot freezes, I usually hard-kill it. Now I’m just thinking:
All previous boot logs end with
kvě 25 22:46:55 automaton systemd-journald[339]: Time spent on flushing to /var/log/journal/9b6b597c07c04b33ac94870b748879b7 is 16.957ms for 959 entries.
(with a different time in ms)
So I wonder whether I should just let it run. However, last week I once left it frozen for 30+ minutes and nothing changed. Also, the journal’s disk usage is 2.6 GB according to
journalctl --disk-usage
And again, this doesn’t explain why it is not a problem on 5.13.

So, I’ve dug a little bit more into the logs.
As I mentioned earlier, 5.15 booted once and then didn’t boot again.
So I compared the logs from both boots, and what I noticed is an fbcon discrepancy:

Successful boot:
kvě 25 22:32:12 automaton kernel: fbcon: Taking over console
kvě 25 22:32:16 automaton kernel: fbcon: i915drmfb (fb0) is primary device

Unsuccessful boot:
kvě 25 22:34:47 automaton kernel: fbcon: Deferring console take-over
kvě 25 22:34:47 automaton kernel: fbcon: Taking over console
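
To repeat this comparison without reading the whole log, a small sketch along these lines could pull the fbcon lines out of two saved journals. It assumes each boot was first dumped to a text file (for example by redirecting the journalctl -b output); the file names and the function name are placeholders of mine.

```python
# Extract the kernel's fbcon messages from saved journal text so two boots
# can be compared side by side. File names below are hypothetical.
import sys


def fbcon_messages(lines):
    """Return the text after 'fbcon: ' for every line that contains it."""
    marker = "fbcon: "
    return [line.split(marker, 1)[1].strip() for line in lines if marker in line]


if __name__ == "__main__":
    # Usage: python fbcon_diff.py good_boot.log bad_boot.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(path)
            for message in fbcon_messages(handle):
                print("   ", message)
```

This only lists the messages in order for each file; it does not explain why the takeover is deferred on the failing boot.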

Here are full logs:
Successful boot:

Unsuccessful one:

I am going to search more, but any ideas are welcome.

Interesting that you trigger a firmware bug on a successful boot:

kvě 25 22:32:12 automaton kernel: tpm_tis NTC0702:00: 2.0 TPM (device-id 0xFC, rev-id 1)
kvě 25 22:32:12 automaton kernel: NMI: IOCK error (debug interrupt?) for reason 70 on CPU 0.
kvě 25 22:32:12 automaton kernel: CPU: 0 PID: 440 Comm: (udev-worker) Tainted: G           OE     5.15.112-1-MANJARO #1 5658b43aa76baadb684cdf777f0d72594f1dd408
kvě 25 22:32:12 automaton kernel: Hardware name: LENOVO 20Q50025MC/20Q50025MC, BIOS R0ZET43W (1.21 ) 06/29/2020
kvě 25 22:32:12 automaton kernel: RIP: 0010:kstrdup+0x19/0x70
kvě 25 22:32:12 automaton kernel: Code: 5d 4b 07 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 48 85 ff 74 51 41 54 41 89 f4 55 53 48 89 fb e8 87 e5 31 00 <48> 8b 54 24 18 44 89 e6 48 8d 68 01 48 89 ef e8 73 91 07 00 48 85
kvě 25 22:32:12 automaton kernel: RSP: 0018:ffffbaf343eafa10 EFLAGS: 00000246
kvě 25 22:32:12 automaton kernel: RAX: 0000000000000019 RBX: ffff9a49d66a0020 RCX: 0000000000000000
kvě 25 22:32:12 automaton kernel: RDX: ffff9a49d66a0020 RSI: 0000000000000cc0 RDI: 0000000000000000
kvě 25 22:32:12 automaton kernel: RBP: 0000000000000013 R08: 0000000000000000 R09: 0000000000000000
kvě 25 22:32:12 automaton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000cc0
kvě 25 22:32:12 automaton kernel: R13: 0000000000000000 R14: ffffffffc07e4128 R15: ffffbaf343eafa50
kvě 25 22:32:12 automaton kernel: FS:  00007ffb0d1c7080(0000) GS:ffff9a510e400000(0000) knlGS:0000000000000000
kvě 25 22:32:12 automaton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kvě 25 22:32:12 automaton kernel: CR2: 0000557d2ea4ca58 CR3: 0000000111226003 CR4: 00000000003706f0
kvě 25 22:32:12 automaton kernel: Call Trace:
kvě 25 22:32:12 automaton kernel:  <TASK>
kvě 25 22:32:12 automaton kernel:  __kernfs_new_node+0x52/0x220
kvě 25 22:32:12 automaton kernel:  kernfs_new_node+0x31/0x70
kvě 25 22:32:12 automaton kernel:  __kernfs_create_file+0x25/0xe0
kvě 25 22:32:12 automaton kernel:  sysfs_add_file_mode_ns+0x89/0x190
kvě 25 22:32:12 automaton kernel:  internal_create_group+0x1ce/0x380
kvě 25 22:32:12 automaton kernel:  mod_sysfs_setup+0x4eb/0x700
kvě 25 22:32:12 automaton kernel:  load_module+0x27a8/0x2900
kvě 25 22:32:12 automaton kernel:  ? __do_sys_init_module+0x138/0x1c0
kvě 25 22:32:12 automaton kernel:  __do_sys_init_module+0x138/0x1c0
kvě 25 22:32:12 automaton kernel:  do_syscall_64+0x58/0x90
kvě 25 22:32:12 automaton kernel:  ? handle_mm_fault+0xd1/0x2c0
kvě 25 22:32:12 automaton kernel:  ? do_user_addr_fault+0x1e5/0x6a0
kvě 25 22:32:12 automaton kernel:  ? exc_page_fault+0x71/0x180
kvě 25 22:32:12 automaton kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
kvě 25 22:32:12 automaton kernel: RIP: 0033:0x7ffb0dcfceee
kvě 25 22:32:12 automaton kernel: Code: 48 8b 0d 85 ee 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 ee 0c 00 f7 d8 64 89 01 48
kvě 25 22:32:12 automaton kernel: RSP: 002b:00007fff97cee1b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kvě 25 22:32:12 automaton kernel: RAX: ffffffffffffffda RBX: 0000557d2e841d80 RCX: 00007ffb0dcfceee
kvě 25 22:32:12 automaton kernel: RDX: 00007ffb0de47343 RSI: 0000000000003587 RDI: 0000557d2ea9e9d0
kvě 25 22:32:12 automaton kernel: RBP: 00007ffb0de47343 R08: 0000557d2e756750 R09: 00007ffb0ddccaa0
kvě 25 22:32:12 automaton kernel: R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
kvě 25 22:32:12 automaton kernel: R13: 0000557d2e836ea0 R14: 0000557d2e841d80 R15: 0000557d2e831ec0
kvě 25 22:32:12 automaton kernel:  </TASK>
kvě 25 22:32:12 automaton kernel: tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead

The kstrdup function dynamically allocates memory in kernel space and copies the contents of the given string into it. So from that perspective, it looks like something may be wrong with your RAM. Faulty? :thinking: However, it seems to be related to the TPM firmware bug. Maybe disable TPM in your UEFI?

Thank you everyone for your effort. In the end I just gave up and reinstalled the system.

Apologies for the late answer.