[Pinebook Pro] 5.13 no longer boots from NVME

Without noticing the availability of this update, I installed xfburn, which resulted in the update being installed (arm-stable-update-2021-07-06-kde-plasma-firefox-quartz64-support-and-kernels).

Rebooting (two attempts, the second using the power button) resulted in a dark screen and no sign of the system booting.

As Strit is aware from my recent post, I recently migrated my Manjaro system to NVMe (boot and /) with labels BOOT_MNJRO and ROOT_MNJRO applied to the NVMe SSD partitions. Old Manjaro system is still on eMMC and partitions were relabeled MMCBT_MNJRO and EMMC_ROOT_MNJRO.

I booted Armbian Focal from an SD card for debug and I mounted the new and old boot partitions in Armbian to compare contents.
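For anyone repeating this comparison, here is a minimal sketch of how the two boot trees could be compared once both partitions are mounted. The helper name `compare_boot_dirs` and the mount points in the comments are my own assumptions, not anything Manjaro or Armbian ships; the demo at the end uses throwaway directories so it can be run without touching real partitions.

```shell
# Sketch only: report files that differ between two boot directories.
# On a real system the partitions could first be mounted read-only by label,
# for example (mount points are hypothetical):
#   sudo mount -o ro LABEL=MMCBT_MNJRO /mnt/emmc_boot
#   sudo mount -o ro LABEL=BOOT_MNJRO  /mnt/nvme_boot

compare_boot_dirs() {
    # diff exits non-zero when the trees differ, which is the expected case
    # here, so the exit status is deliberately ignored.
    diff -rq "$1" "$2" || true
}

# Demo with temporary directories standing in for the two boot partitions.
demo=$(mktemp -d)
mkdir -p "$demo/emmc_boot" "$demo/nvme_boot"
echo "old" > "$demo/emmc_boot/initramfs-linux.img"
echo "new" > "$demo/nvme_boot/initramfs-linux.img"
echo "same" > "$demo/emmc_boot/Image"
echo "same" > "$demo/nvme_boot/Image"
compare_boot_dirs "$demo/emmc_boot" "$demo/nvme_boot"
rm -rf "$demo"
```

In the demo only initramfs-linux.img is reported, because Image is identical in both trees.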

initramfs-linux.img was updated (on the NVMe BOOT_MNJRO partition, as expected).

I’ll work on this later today, but I’m not sure of the best and safest next steps. I could of course recover to my old Manjaro system currently archived on my eMMC.

Okay–I’ve confirmed that this update completely breaks my Manjaro system on NVMe SSD. I recovered from my first update by re-imaging my old Manjaro system from my eMMC to my NVMe SSD (full device image, followed by partition resizing and re-labelling). My system then successfully booted again from NVMe (both / and /boot on NVMe active).

Previously, I tried just restoring the files on /boot from the eMMC boot partition, but it still would not boot.

Does this update touch the UBoot section before the boot partition? None of the files in /boot have changed.

Guess I’ll have to re-image with my backup again and then not upgrade until I can get some guidance from devs here.

Nope. There should be no uboot update in this update. Kernel got updated though.

I would think my first attempt at restoring all of /boot from backup would have booted then. Maybe a driver?

I edited /etc/pacman.conf to ignore kernel updates on my PineBook Pro to try to get this update to run and then still boot from NVMe.

(I don’t know if there are other packages that should be added to this IgnorePkg list):

# Pacman won't upgrade packages listed in IgnorePkg and members of IgnoreGroup
IgnorePkg   = linux
IgnorePkg   = linux-headers
IgnorePkg   = linux-firmware
#IgnoreGroup =
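To double-check which packages are actually being held back, something like the following sketch could be used. `list_ignored_pkgs` is a hypothetical helper of mine; it only reads the file and assumes the usual `IgnorePkg = name [name ...]` layout. The demo runs on a temporary copy of the lines above rather than on /etc/pacman.conf.

```shell
# Sketch only: print every package named on active IgnorePkg lines
# of a pacman.conf-style file, one name per line.
list_ignored_pkgs() {
    # Commented-out lines do not match because the pattern is anchored at the
    # start of the line. Everything after "=" is split on whitespace.
    awk -F= '/^[[:space:]]*IgnorePkg[[:space:]]*=/ {
        n = split($2, names, " ")
        for (i = 1; i <= n; i++) print names[i]
    }' "$1"
}

# Demo on a temporary file holding the lines from this post.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Pacman won't upgrade packages listed in IgnorePkg and members of IgnoreGroup
IgnorePkg   = linux
IgnorePkg   = linux-headers
IgnorePkg   = linux-firmware
#IgnoreGroup =
EOF
list_ignored_pkgs "$conf"
rm -f "$conf"
```

On a real system the function would be pointed at /etc/pacman.conf instead of the temporary file.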

Unfortunately, now I get a core.db error, because the mirrors are apparently down. (I’ve tried a couple of times.)

$ sudo pacman -Syu
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
error: failed retrieving file 'core.db' from mirror.telepoint.bg : Resolving timed out after 10000 milliseconds
error: failed retrieving file 'core.db' from mirrors.ocf.berkeley.edu : Resolving timed out after 10000 milliseconds
:: Some packages should be upgraded first...
resolving dependencies...
looking for conflicting packages...

Packages (1) manjaro-keyring-20210622-1

Total Download Size:   0.12 MiB
Total Installed Size:  0.17 MiB
Net Upgrade Size:      0.00 MiB

:: Proceed with installation? [Y/n]

Update: Hmm…on my third try, I only saw one mirror resolution error. Perhaps there are more than one or two mirrors and it can proceed (but then why not post a “warning” rather than an “error”)?

Anyway, I’m proceeding with the installation. I’ll post an update with the result.

It boots! :joy:

It seems that one of the linux packages (linux or linux-firmware) that I blocked probably kills NVMe on the Pinebook Pro.

From updated and working system after IgnorePkg list:

$ uname -a
Linux pinebook 5.12.11-1-MANJARO-ARM #1 SMP Wed Jun 16 10:48:53 UTC 2021 aarch64 GNU/Linux

Well there’s no need for me to change to another branch to help. For now, I’d like to help solve this stable branch problem! :nerd_face:

How should I proceed? Perhaps the next step would be to install and boot manjaro stable on an SD card, update to 2021-07-06 and see if the update breaks NVMe even when it’s not the boot drive. If NVMe breaks, hopefully I’d still boot and have debug resources at my disposal.

Should I file a bug report now? Where?

What device are you on? AFAIK on a Pinebook Pro the boot partition has to stay on the eMMC, as its mainline uboot doesn’t support booting from nvme directly… I have run my Pinebook Pro with nvme for more than a year without any issues (besides sleep mode not working, which is a known limitation right now).

Still have my boot partition (BOOT_MNJRO) on eMMC and the rest of / (ROOT_MNJRO) on nvme though.

Last clean install was 20.06 if I remember correctly and after that I just went with any testing or stable update published (testing when I want to test some new packages).

I would try updating linux-firmware and see if it still works after that. If it does, then it’s a kernel issue, and we would have to see why it’s different in 5.13.

If it does not, then it means a firmware file (probably for your nvme drive) changed.

Great! That’s what I’ll try next.

Yeah, that’s what I thought too when I first installed my NVMe SSD in my PBP, but it turns out that only the Uboot code, which precedes the boot partition on the eMMC disk, must remain intact; the /boot and root partitions can be moved to the NVMe disk. It took me a while to sort all this out. Someday, I might even attempt to flash my SPI and remove the necessity of retaining the Uboot code on my eMMC, but Manjaro supports the functionality of the patches discussed here under Using as OS root drive

Here’s a conversation I had with Strit while I was learning:

Here are a few more links on the subject. The only compelling reason to risk flashing SPI is to be able to fully remove the eMMC, which saves power, but I could not make a reasonable estimate of the savings from the eMMC datasheet. After this update failing on my system, I think I’ll continue to use eMMC to at least maintain a root and /boot backup!

I think this is pcm720’s latest code. I did successfully build it myself, but my native build was a bit smaller than pcm720’s cross-compiled build (as such, I don’t know if I should trust my build to function in SPI) and, given that Manjaro provides and installs a UBoot that works well with NVMe (does everything Uboot can currently do, short of risking flashing it to SPI), I have never installed any of these builds. You’ll find pcm720’s earlier and probably outdated code on GitHub too, if you follow various links:

According to my testing, the Uboot code installed by Manjaro contains the functionality of Nadia’s patch (including warm reset):

I removed the IgnorePkg = linux-firmware line and even rebooted but pacman says I’m up to date.

$ sudo pacman -S linux-firmware
warning: linux-firmware-20210511.7685cf4-1 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) linux-firmware-20210511.7685cf4-1

Total Installed Size:  678.77 MiB
Net Upgrade Size:        0.00 MiB

:: Proceed with installation? [Y/n] 

I don’t find linux-firmware in the list of updated packages either (copied to a file from announcement above in this thread and searched):

$ grep linux- update_pkgs
                           linux-aml             5.12.0-1           5.13.rc5-1
                   linux-aml-headers             5.12.0-1           5.13.rc5-1
                       linux-headers            5.12.11-1             5.13.0-1
                     linux-pinephone            5.12.11-1            5.12.13-1
             linux-pinephone-headers            5.12.11-1            5.12.13-1
                      linux-quartz64           5.12.0-0.2           5.13.0-0.4
              linux-quartz64-headers           5.12.0-0.2           5.13.0-0.4
                            linux-rc           5.13.rc6-1           5.13.rc7-1
                    linux-rc-headers           5.13.rc6-1           5.13.rc7-1
                          linux-rpi4            5.10.43-1            5.10.46-1
                  linux-rpi4-headers            5.10.43-1            5.10.46-1
                           linux-vim             5.12.9-1             5.13.0-1
                   linux-vim-headers             5.12.9-1             5.13.0-1
                           linux-aml                    -           5.13.0-0.5
                   linux-aml-headers                    -           5.13.0-0.5
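One thing worth noting about the search above: the pattern `linux-` requires a hyphen, so it can never match a row for the bare `linux` package, only packages such as linux-headers. A sketch of an exact match on the first column follows. `find_pkg_row` is a hypothetical helper, and the `linux` row in the demo data is illustrative only, with versions taken from pacman’s "ignoring package upgrade" warning rather than from the announcement table.

```shell
# Sketch only: print rows of a package list whose first column is exactly
# the requested package name.
find_pkg_row() {
    awk -v pkg="$1" '$1 == pkg' "$2"
}

# Demo on a temporary file. The "linux" row is illustrative data.
list=$(mktemp)
cat > "$list" <<'EOF'
linux            5.12.11-1   5.13.0-1
linux-headers    5.12.11-1   5.13.0-1
linux-rc         5.13.rc6-1  5.13.rc7-1
EOF
find_pkg_row linux "$list"
rm -f "$list"
```

The same function works for linux-firmware; an empty result would mean the package has no row of its own in the saved list.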

I saved a tar archive of the updated /boot after the failed update but accidentally deleted it. I recall that there were 7 files in /boot/dtbs though. Now and prior to the update it’s only 5 files.

Is there another package or a dependency on the linux package?

Except for the kernel (“linux”) package, pacman says I’m up to date (but I’m a newbie to pacman so maybe I’ve done something wrong).

$ sudo pacman -Syu
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
:: Starting full system upgrade...
warning: libtool: local (2.4.6+44+gb9b44533-14) is newer than core (2.4.6+42+gb88cebd5-15)
warning: linux: ignoring package upgrade (5.12.11-1 => 5.13.0-1)
 there is nothing to do

So, if pacman is correct, it looks like it’s the upgrade (5.12.11-1 => 5.13.0-1) that caused the system to fail to boot.

Very likely. I’ll see if something changed in the kernel config regarding nvme or pcie stuff.

Hm, there was 1 change related to pci in 5.13 config:

- CONFIG_PCI_MSI_ARCH_FALLBACKS=y

This line was removed.
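For reference, this kind of comparison can be sketched as below, assuming both kernel configs have been saved as plain text files. `config_removed` is a hypothetical helper name, and the two small config files in the demo are illustrative stand-ins, not the real Manjaro ARM configs.

```shell
# Sketch only: print options matching a pattern that are present in the old
# config but missing (as an identical line) from the new one.
config_removed() {
    grep -E "$1" "$2" | while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string, --: end of options
        grep -qxF -- "$line" "$3" || printf '%s\n' "$line"
    done
}

# Demo with illustrative configs for an old and a new kernel.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' 'CONFIG_PCI_MSI=y' 'CONFIG_PCI_MSI_ARCH_FALLBACKS=y' > "$old"
printf '%s\n' 'CONFIG_PCI_MSI=y' > "$new"
config_removed 'CONFIG_PCI' "$old" "$new"
rm -f "$old" "$new"
```

A line that merely changed value (for example from =y to =m) would also be printed, since only identical lines count as still present.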

So I dug a little in the pci_msi commits and found this one, which basically removes the msi_controller entirely, stating that no drivers use it.

So it looks to be an upstream code change and not a change in our kernel config. :frowning:

Thanks for figuring it out, Strit.

So the assertion, “As there is no driver using msi_controller, we can now safely remove its use from the PCI probe code.” is incorrect? That really sucks! This NVMe adapter and SSD are current products and not some kind of legacy hardware.

I guess I could get myself set up to build kernels in Manjaro and see if I can put it back in and make it work. I used to build my own kernels all the time and even mostly ran Gentoo on my personal systems for many years.

I assume I’d need to put the CONFIG back in and also the code deleted from the three files here:

Should I file a bug report on gitlab? I assume linux.org has moved bug reporting to gitlab since the days that I was building kernels.

The gitlab repo is just a mirror (which I use for easy searching).

Bugs are still in bugzilla.kernel.org.

But yeah, find out if reverting that actually fixes it, before you report it.

Yes, fixing it would pretty much clinch the root cause. I’ll try it.

In the meantime, I downloaded Manjaro XFCE and flashed it to SD. Then I updated to 5.13. The NVMe SSD still works when booting from SD so the PCI probe code for the msi_controller must only be necessary for boot. After booting from another device, I guess a kernel module gets loaded so NVMe still works.
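The module guess could be checked against a saved kernel config. The sketch below is only an assumption about how one might do that: `config_mode` is a hypothetical helper, CONFIG_BLK_DEV_NVME is the upstream option for the NVMe block driver, and the values in the demo file are illustrative, not read from the Manjaro ARM kernel.

```shell
# Sketch only: report whether an option is built in (=y), a module (=m),
# or not set in a kernel config file.
config_mode() {
    case "$(grep -E "^$1=" "$2" | head -n 1)" in
        *=y) echo "built-in" ;;
        *=m) echo "module" ;;
        *)   echo "not set" ;;
    esac
}

# Demo with illustrative values.
cfg=$(mktemp)
printf '%s\n' 'CONFIG_BLK_DEV_NVME=m' 'CONFIG_PCI=y' > "$cfg"
config_mode CONFIG_BLK_DEV_NVME "$cfg"
config_mode CONFIG_PCI "$cfg"
rm -f "$cfg"
```

On a running system the input could be produced with zcat /proc/config.gz, and lsmod would show whether an nvme module is actually loaded.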

So even a PBP with Uboot flashed to SPI might also fail to boot from NVMe? I’ll have to study the boot flow on the PBP wiki again when I’m more awake.

I’m trying to get set up to build a kernel with the code restored, guided by your posts here:

I used git and updpkgsums to obtain linux513 and the kernel source, but I’m troubled that I can’t find the “CONFIG_PCI_MSI_ARCH_FALLBACKS=y” line in any of the previous config files.

My plan is to restore that line in the linux513 config file and also restore the code in the 3 files documented here:

and then build a new kernel package.

Okay–I found it. It’s in zcat /proc/config.gz

so, per your instructions,

 zcat /proc/config.gz >> .config

brings it into my Linux513 kernel source from gitlab.

But I’m still running 5.12 so where can I find the 5.13 config file? Again, my plan is to take the Manjaro 5.13 source and edit it to restore the use of the msi_controller. If it then boots from NVMe, that would be conclusive and I’ll file the bug report.

I have Manjaro XFCE on an SD card. Maybe I can get the 5.13 config file from it, but looks like I’ll have to boot the SD card to generate the file in /proc/

make menuconfig doesn’t allow me to add the “CONFIG_PCI_MSI_ARCH_FALLBACKS=y” line back into the 5.13 .config file, so I added it back in manually. Now on to the source edits…

When I tried to edit linux-5.3/drivers/pci/msi.c to put the code back into 5.13, I saw that it still contains the old code with the MSI support (beginning with /* Arch hooks */ on line #65), but when I browse the code at msi.c « pci « drivers - kernel/git/stable/linux.git - Linux kernel stable tree, I see that the MSI code has been removed, as expected.

The code differences are not what I expected, so it is pointless to build the kernel at this point.

When I

cd linux53
updpkgsums

(quote is from your “Download kernel source” post)

Does updpkgsums generate a tar.gz source archive for 5.12 instead of 5.13, because I’m still running the 5.12 kernel? I guess I could try all this from my 5.13 Manjaro XFCE SD card and see what happens.

This kernel building process is very different from my frequent experiences building kernels with Gentoo, Ubuntu, Red Hat, etc. many years ago, where I’d just install the linux-source package (or whatever it was named) and I’d know exactly what version of linux source I was getting!

You need this source instead:

The other one you have is pretty old and for x86_64.

Oops! I’ll git it and start over. :slight_smile:
Thanks!

I’m following this guide

and I’m at Step 9:

However, there is no /boot/vmlinuz file to copy and vmlinux-5.13.0-MANJARO-ARM already exists and it has a timestamp that indicates it was built when I performed step 7 (“sudo make install”).

# ls -tla
total 82408
-rwxr-xr-x  1 root root  6216201 Jul 23 19:03 System.map-5.13.0-MANJARO-ARM
-rwxr-xr-x  1 root root 31402496 Jul 23 19:03 vmlinux-5.13.0-MANJARO-ARM
-rwxr-xr-x  1 root root  7208095 Jul  7 09:40 initramfs-linux.img
-rwxr-xr-x  1 root root 30974464 Jun 16 03:41 Image
drwxr-xr-x  2 root root     4096 Jun  3 19:11 extlinux
drwxr-xr-x  7 root root     4096 Jun  3 19:10 dtbs
-rwxr-xr-x  1 root root   157980 May 23 10:04 idbloader.img
-rwxr-xr-x  1 root root  4194304 May 23 10:04 trust.img
-rwxr-xr-x  1 root root  4194304 May 23 10:04 uboot.img
drwxr-xr-x 17 root root     4096 May  5 15:01 ..
drwxr-xr-x  4 root root    16384 Dec 31  1969 .

Should I proceed with step 10 to generate initramfs now without doing step 9??

I’ll wait for confirmation from you, Strit, because of the risk that my system might not boot if I proceed incorrectly.