After migration to new NVMe: System only boots from the new NVMe if the old SSD is still connected

OMG :woman_facepalming:
OF COURSE that's the problem, you can't have duplicate UUIDs in a system…
They are by definition meant to be UNIQUE :stuck_out_tongue:

That's what I thought :sweat_smile:. UUIDs should be unique.

But no worries, there is a way to change the UUID of a partition, I just can't come up with it off the top of my head at the moment…
Anyhow, after you change the PARTUUID of that 2nd partition, you need to modify your configs, the UEFI menu entry, etc. to reflect that…
Then it will boot from it no matter if the other drives are connected or not…

Hint: Never ever duplicate a drive without changing ALL UUIDs on it (disk, partitions, etc.) if you want to use it in the same system as the original…
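
IIRC sgdisk from the gptfdisk package can randomize them all in one go; a minimal sketch (the device name is just an example, and point it at the CLONE, not the original…):

    # randomize the disk GUID and every partition's unique GUID in one shot
    sudo sgdisk --randomize-guids /dev/sdX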

:vulcan_salute:

Unfortunately, I just noticed that the PARTUUIDs are indeed identical. Thank you for helping me get to this point. And now that we speak of it, I think I remember that I had to change another UUID (I think it was 352D-5870 to F8AF-BD3B) in order to get the system to boot at all.

I thought it was possible with GParted, but maybe that only changes the short UUID or something.

But I'm not 100% confident that this will fix the issue, because when I remove sdb, the UUID is unique again, right? And then the system does not boot. So why should it boot after the PARTUUID has been changed?

Let's say that the system actually DOES boot, it just doesn't start up, because your problem is further down the rabbit hole, most likely having to do with GRUB, and not with a UEFI BIOS that can't find your bootloader…

PS:
It's actually also VERY weird that you don't have an \EFI\BOOT\... directory on that ESP, but I don't think that really matters in the current situation…

I understand. Being precise about the word "booting" is crucial in this case. :smile:

But still, I'm a little worried that once I change the PARTUUID, I can no longer start up my system at all, but maybe things will resolve on their own…

After some googling and consulting ChatGPT, I gather that the plan of action should be something like the following:

  1. Chroot.
  2. Generate a new UUID with uuidgen.
  3. Assign the new PARTUUID: sudo tune2fs /dev/nvme1n1p2 -U new_partuuid
  4. Update /etc/fstab. I think I can skip this, because my fstab uses a different UUID: UUID=F8AF-BD3B /boot/efi vfat umask=0077 0 2
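
To double-check which identifier is which before touching anything, I suppose the two can be compared side by side (these are the same lsblk columns that appear further down in this thread):

    # filesystem UUID (what UUID= in fstab refers to) vs. partition GUID
    lsblk -o NAME,FSTYPE,UUID,PARTUUID /dev/nvme1n1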

I'm not certain what to do next, but maybe something along the lines of:

  1. Reboot.
  2. Chroot.
  3. Check if the correct boot partition is mounted.
  4. Update GRUB: sudo update-grub

@TriMoon What do you think?

:point_right: How to change PARTUUID?

See the link above, it's way easier than what that dumb AI gave you as an answer for changing the UUID…

The sequence I personally would follow is:

  1. Boot using a LiveMedia.
  2. Change the GPT table's UUID plus the partition UUIDs on the cloned disk.
    Yes, all of them, including the GPT table's UUID… (see the command sketch below)
  3. Mount the changed partition somewhere unique, let's say /efi, as the spec actually tells you :wink:
    systemd-mount /dev/disk/by-partuuid/<THE_NEW_PARTUUID> /efi
    
  4. Create a new UEFI boot entry using efibootmgr for the new ESP partition with its new UUID (also in the sketch below).
    (This should/will look almost identical to the one you already have, but with a different PARTUUID, so it's best to give it a "label" that includes the drive model/name/whatever…)
  5. Check/change your fstab and grub configs as needed, inside the cloned disk's Linux root partition.
    (You could use chroot, but you can do without… just mount it somewhere for this step too.)
  6. Recreate your initrd and grub configs inside a chroot, maybe even re-install GRUB into the new ESP.
    (I'm not sure about the GRUB part, as I don't use it; I use systemd-boot instead.)
  7. Unmount the new ESP partition we mounted earlier.
    systemd-umount /efi
    
  8. sync
  9. Cross fingers and reboot :wink:

If you get errors, double-check your steps and check the logs…
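
A rough sketch of steps 2 and 4 in command form. My assumptions: the ESP is partition 2 of the clone, and the label and loader path are made-up examples, so verify against your existing entry with efibootmgr -v first…

    # step 2: randomize the clone's disk GUID, then give its ESP a new
    # random unique partition GUID ('R' = random)
    sudo sgdisk --disk-guid=R /dev/nvme1n1
    sudo sgdisk --partition-guid=2:R /dev/nvme1n1
    # step 4: create a matching UEFI boot entry pointing at the new ESP
    sudo efibootmgr --create --disk /dev/nvme1n1 --part 2 --label "Manjaro NVMe" --loader '\EFI\Manjaro\grubx64.efi'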

That's using the UUID of the filesystem, not the PARTUUID :wink:
Which in your case is also dangerous, because you cloned, so the filesystem UUIDs are still the same…

@Nos, in case you were already performing the stuff I posted, you might want to recheck, as I have edited a lot :rofl:

Thanks for the hint, I actually already changed the PARTUUID (fortunately, without breaking things :joy:) but then ran out of time. I'll continue tomorrow and keep you updated. But already a ton of thanks! Your guidance, and even more so your patience, has been tremendously helpful. :+1:

Ok, unfortunately, I'm in a hurry, but I thought a small update wouldn't hurt:

  1. I changed the PARTUUID and PTUUID of the EFI partition on /dev/nvme1n1. I didn't have to change the UUID, since I had already changed that after the cloning.
  2. I added a new boot entry using manjaro-chroot and grub-install. I called the entry manjaro_new.
  3. Still in chroot, I called grub-mkconfig, which did not give me any errors, but also did not create a grub.cfg in /boot/grub/. Is that to be expected? (See the note right below.)
  4. Reboot.
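
Possibly the explanation for point 3, though I'm not sure: grub-mkconfig prints the generated config to stdout unless it is given an output file, which is what the update-grub wrapper does:

    # write the generated config to the real location instead of stdout
    sudo grub-mkconfig -o /boot/grub/grub.cfg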

There is a boot entry manjaro_new in the UEFI boot selection, which brings me to GRUB. Now once again the weird part: with the old SSD attached, the startup works, but without it attached, I just reach the screen that I posted above (something something rfkill).

That is so strange to me. Why does it matter whether this drive is connected, if it is not being booted from? How is it used during startup?

Then it hit me that the UUIDs of the root partition have not been changed:

lsblk -o PARTTYPE,PARTTYPENAME,UUID,PARTUUID,PTUUID /dev/nvme1n1
PARTTYPE                             PARTTYPENAME     UUID                                 PARTUUID                             PTUUID
                                                                                                                                49c4553e-ebe4-40c5-b300-bc47c5924055
0fc63daf-8483-4772-8e79-3d69d8477de4 Linux filesystem 9c57c860-5b10-48bb-b317-03bed2a111b7 dd3b6169-4973-4e13-8816-493ba94e9ca0 49c4553e-ebe4-40c5-b300-bc47c5924055
c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI System       F8AF-BD3B                            0b1df12d-cfaa-4e1c-bbc8-682f70739821 49c4553e-ebe4-40c5-b300-bc47c5924055
lsblk -o PARTTYPE,PARTTYPENAME,UUID,PARTUUID,PTUUID /dev/sdb
PARTTYPE                             PARTTYPENAME     UUID                                 PARTUUID                             PTUUID
                                                                                                                                35d1831e-3ad6-4d67-a757-4f40f3573ebc
0fc63daf-8483-4772-8e79-3d69d8477de4 Linux filesystem 36199365-e014-4347-9d8a-536d7c560684 dd3b6169-4973-4e13-8816-493ba94e9ca0 35d1831e-3ad6-4d67-a757-4f40f3573ebc
c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI System       352D-5870                            9c967a2c-739f-47fb-aeb1-ff618a6e73e6 35d1831e-3ad6-4d67-a757-4f40f3573ebc

Might that be the issue? Or rather: might that be the reason the system starts up with the old drive connected, while it cannot start up with just /dev/nvme1n1?

I also suspected that the whole time I had been operating in my old system, but I checked, and that's not the case:

sudo df -a
[sudo] password for *******:
Filesystem      1K-blocks       Used  Available Use% Mounted on
...
/dev/nvme1n1p1 1919844040  648659740 1173615344  36% /
...

So you basically figured it out. :clap: And your firmware and GRUB install work as expected.
I think I would change the last identical UUID. It is obviously still trying to load something from the old drive at some point; maybe this is the reason. Not sure, but it is definitely worth trying. All of these are really meant to be unique. You saw how confused your EFI was; later in the boot process there is a lot of other stuff that can get confused too.
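
If you go that route, a minimal sketch, assuming the remaining duplicate is the root partition's GUID and that it is partition 1 of the old SSD (check the partition number with lsblk first):

    # give partition 1 a new random unique GUID ('R' = random)
    sudo sgdisk --partition-guid=1:R /dev/sdb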

TBH, I'm kicking the can down the road a little on changing the UUIDs of the root partition, because I'm afraid that this might lead to a system that won't start up no matter what drives are connected, whereas now I can just disconnect the old drive to have unique UUIDs, or re-connect it to have a system that starts up.

So I'd like to understand the potential benefits of changing the remaining UUIDs of the root drive a little better before going through with it. My main question: is there a technical difference between changing the UUIDs and removing the old drive? Because either way, the UUIDs are unique in the system afterward, correct?

I agree. But isn't a different way to say it: the system apparently does load something from the old drive and can therefore start up? So isn't it just a matter of figuring out what is missing and re-configuring/re-installing it (with the old drive disconnected)? And won't I have to do that anyhow if I change the remaining UUIDs? So what is the potential benefit of changing the UUIDs compared to just removing the drive and re-installing/re-configuring the missing piece?

I'm very aware that I might lack a fundamental understanding of what is going on, so please be kind if that is the case. :smile:

The risk to break things even more while fixing them is there, unfortunately.


The first entry is sdc2, and the last is of course sdc3. That last one should be changed to use persistent naming.

So unless it's changed since you posted that, it seems like you need sdc to be connected, or you can comment out those lines (or switch them to persistent naming, like the fstab sketch below).
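
A minimal illustration of what I mean (the mount point, filesystem type, and UUID are placeholders, not your actual values):

    # before: raw device name, breaks when drive letters shift
    #/dev/sdc3  /mnt/data  ext4  defaults  0 2
    # after: filesystem UUID, stable across device reordering
    UUID=<filesystem-uuid>  /mnt/data  ext4  defaults  0 2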

In principle, that's an acceptable risk to me. But I'm wondering: is there a benefit compared to just disconnecting the old drive?

@dmt Thanks for pointing that out. I'll fix that.

@Nos, because this thread has become so long, with so many changes in between:
Can you provide some logs of what is NOT WORKING when you boot without the old drive connected?

Because without error logs, we just keep guessing in thin air…
We need solid proof of what is happening…

  • In text form, not photos; you can redirect the output of the logs to a file if needed, so you can paste it here later…
    • Use tools such as dmesg and journalctl… (example commands below)
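
For example, something along these lines (the file names are just examples)…

    # kernel messages from the current boot
    sudo dmesg > dmesg.txt
    # journal of the previous (failed) boot, if persistent journaling is enabled
    journalctl -b -1 > journal-prev-boot.txt
    # or only the error-level messages from it
    journalctl -b -1 -p err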

Also keep in mind that when you connect/disconnect drives, they can get different device names; that's why the usage of UUIDs is so important.
As an extra FYI: in special cases you might get away with using volume LABELs too…

Examples:

  • NVMe drives 1 and 2 connected can provide:
    1. /dev/nvme0n1p1 = first partition, first namespace, first drive.
    2. /dev/nvme1n1p1 = first partition, first namespace, second drive.

Now when you disconnect (1) above, then (2) will become (1) :wink:
The same logic applies to drives that get /dev/sda, /dev/sdb, etc…

Sorry for the long wait, but since I need my PC for work, I couldn't risk breaking something during the week. :sweat_smile:

So I did that, did a reboot (with the old drive connected) and everything worked.

I read up on journalctl and how to provide dmesg logs that are older than the current boot (see here), powered my system off, disconnected the old drive, and was prepared to get some good logs from the failed start-up. So I booted, and lo and behold, the system just started up without the old drive. Problem solved. Just like that.

And with hindsight, it makes perfect sense: the old drive was assigned sdb and the backup/data HDD sdc, and fstab referenced it by that device name. So when the old drive was disconnected, the backup/data HDD was assigned sdb, and there was no way for the system to mount sdc. :person_facepalming: So easy! The only thing I'm wondering: why didn't the system inform me about this problem at start-up? Something like: Hey! You want me to mount sdc, but there is no sdc! Are you serious?

Anyway, thank you people so much! You have been a huge help, and you put everyone to shame who claims Linux people aren't supportive of noobs like me. Thank. You.


Linux is an obedient servant: it does what it is told (unlike that other OS) and will keep trying to do so even when it can't, UNLESS it is told to give feedback about it :wink:

The only people who claim such things, no matter which group it's directed at, are those who are like that themselves.
Everyone who started with computers was a noob at the start, and some stay one longer than others :wink:

You're welcome :smiley_cat:

Normally you'd get A start job is running for... with a timeout (default 90s). A quiet boot hides the message until the timeout is reached.

Once it's timed out, you're given the option to log in as root and make changes, such as fixing fstab.

Not sure why you wouldn't see it with a normal boot; even if other output pushed it off the screen, it should show up again a few seconds later. (And if you'd rather the boot not block on a missing drive at all, see the fstab sketch below.)
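
For illustration, a placeholder fstab line (UUID, mount point, and filesystem type are made up) that keeps a missing disk from blocking the boot:

    # nofail: don't block booting if this disk is absent; stop waiting after 10s
    UUID=<filesystem-uuid>  /mnt/backup  ext4  defaults,nofail,x-systemd.device-timeout=10s  0 2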

And this is why you use UUID in fstab instead of /dev/xxx

You will never be sure that the drive will stay sdX.
The names are assigned in the order the drives are detected, so if you have e.g. a USB stick connected when you reboot, that might take the spot.

Please use UUID in fstab (or a systemd mount unit; you were already looking at journalctl, and systemd mounts are awesome for that). A sketch of such a unit follows below.
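
For completeness, a minimal sketch of such a mount unit with placeholder values (the file name must match the mount point, e.g. /etc/systemd/system/mnt-data.mount for /mnt/data):

    [Unit]
    Description=Data HDD

    [Mount]
    What=/dev/disk/by-uuid/<filesystem-uuid>
    Where=/mnt/data
    Type=ext4
    Options=defaults,nofail

    [Install]
    WantedBy=multi-user.target

Enable it with systemctl enable --now mnt-data.mount.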