Aragorn, thanks very much for your replies and advice around partial updates. Have also appreciated reading your support responses elsewhere in the forum. 
I don't think the issue is related to faulty hardware. After further testing, I suspect an incompatibility with something introduced in recent kernels (I'll try to update the title to reflect this), and a HW fault seems unlikely, given:
- The system ran well under sustained use and only exhibited faults after upgrading to 23.1 and its bundled kernel upgrades. Meanwhile, the Windows 10 dual boot on the system continues to run without issue.
- The system reliably exhibits the fault under multiple new installs of 23.1 running the newer kernels it comes bundled with, and also on 23.0, but only after upgrading to most of the latest kernels.
- The system continues to function reliably under reinstalls of 23.0 on the 6.5.5 kernel. 23.0.4 also works when installing the current 5.10 kernel, as do earlier releases, e.g. 22.0.5 with the older 6.1 kernel it's bundled with.
- re SSD: the results are replicated when booting from a fresh Manjaro install on the SSD (including after replacing the Manjaro partition), and the faults also occur identically when booting from multiple different liveCD USB sticks without using the system SSD at all. Windows continues to run fine from the same SSD. SMART tests report good across the board for the Samsung PCI-e3 NVMe drive: no media errors, no critical warnings.
- re RAM: 2x Samsung 8GB DDR4-2666 SODIMMs from 2019, in dual channel, running at up to 2666MHz on standard timing CAS 19-19-19-43, complete a full pass of Memtest86+ without error. They also continue to run the Windows 10 dual boot and any re-installs of 23.0 Manjaro without any error, under considerable daily usage.
FYI slight correction to PC stats above.
- Core i5 8365U Whiskey Lake CPU on Intel Coffee Lake chipset mainboard.
- Latest Lenovo firmware 1.81 from Oct 2023
I did further testing to try and isolate the problem. I re-installed 23.0.4. This worked fine on its bundled 6.5.5-1 kernel yet again. I did not upgrade any general Manjaro packages but just installed additional kernels via MSM.
Booting into 6.6.8-2 on 23.0.4 was very slow, and the network adapters were broken again once the system did load. Rebooting on the 6.1.69-1 kernel had the same problem: slow boot, errors present in logs, no network access, and sudo commands would freeze, although they would work under su. It also hangs when shutting down. Other current kernels I tried on this 23.0.4 install also failed: 6.7.0rc7-2, 5.15.145-1. Rebooting back on 6.5.5-1 completely restored proper system function each time.
Booting on kernel 5.10.205-1 did work properly! For the record, many previous 6.x kernels had also worked over the last 12 months or so of regular updates.
On most of these boots I opened a terminal and, under su, ran journalctl --priority=3 --catalog --no-pager, dmesg, and cat /var/log/boot.log
On the problematic boots, several issues were reported, which appeared to be the same each time.
On the boots into a functional environment (23.0.4 on kernels 6.5.5 and 5.10.205) most of the following errors were not present - except the iwlwifi and Bluetooth hci0 and gkr-pam messages were still there.
journalctl:
- kernel: iwlwifi BIOS contains WGDS but no WRDS
- Bluetooth: hci0 Malformed MSFT vendor event 0x02
- lightdm[1490]: gkr-pam: unable to locate daemon control file
- systemd[1504]: Failed to start Sound Service
- kernel: task kworker/u16:1:12, NetworkManager, wpa_supplicant, s-daemon blocked for more than 122 seconds, not tainted
- systemd[1]: Failed to start Accounts Service, RealtimeKit Scheduling Policy Service, Portal service
- pulseaudio[2238]: GetManagedObjects() failed org.freedesktop.systemd1.ShuttingDown: Refusing activation, D-Bus is shutting down
dmesg: shows similar messages to journalctl; I didn't spot any additional issues in red.
boot.log: Failed to start Virtual Console Setup. However, I see this on my other Manjaro systems without issue. No other FAILED notices in boot.log, even for those mentioned in journalctl like Network Manager and WPA supplicant.
inxi --full --admin --filter --width runs and reports all the expected system config info, in line with the specs I shared earlier. I can dump this to a USB stick and bring it to a functional workstation to post if necessary.
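If it would help, I could collect the same logs on every boot and tag them by kernel, so a good boot and a bad boot can be compared side by side. A rough sketch of what I have in mind follows; the function names, file naming and output directory are just my own placeholders, and it assumes bash and root access:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names and file layout are my own choice, not a
# Manjaro tool): save error-level logs for the current boot, tagged by kernel.

# Build a file name from a kernel release string and a log kind.
log_name() {
    printf 'boot-%s-%s.txt\n' "$1" "$2"
}

# Write the journal (priority 3 and above, current boot only) and dmesg
# into the given directory. Needs root to read the full logs.
capture_logs() {
    local out_dir="$1" kernel
    kernel="$(uname -r)"
    mkdir -p "$out_dir"
    journalctl --boot --priority=3 --catalog --no-pager \
        > "$out_dir/$(log_name "$kernel" journal)"
    dmesg > "$out_dir/$(log_name "$kernel" dmesg)"
}

# Usage, once per boot, e.g. with a mounted USB stick:
#   capture_logs /path/to/usb/logs
```

The files from a working kernel and a failing kernel could then be compared with diff, bearing in mind that timestamps and PIDs will differ between boots.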
Using Pamac to upgrade a fresh 23.0.4 install to 23.1.2 2024-01-02 also broke system functionality in the same way. This upgraded the 6.5.5-1 kernel to 6.5.13-7, which I suspect is the cause, rather than the general package upgrades.
So I reinstalled 23.0.4 and its bundled 6.5.5-1 (known working). I set the linux65 package to ignored in Pamac's preferences so it would hold 6.5.5-1 and not upgrade to the current 6.5.13-7, a kernel that has previously triggered the issue. I upgraded all other Manjaro packages from 23.0.4 to 23.1.2 2024-01-02, including linux-firmware. This works fine! I was able to reboot into 6.5.5-1 and 5.10.205 on 23.1.2 and the system works perfectly. If I reboot into 6.6.8 it fails in the same way as before.
In summary, the system works fine seemingly regardless of Manjaro build, so long as it runs on certain kernels. These include the current kernel 5.10.205-1, as well as numerous historic kernels in the 5.x and 6.x branches over the past 12 months or so (the problem only arose recently), at least including 6.5.5-1 from 23.0.4. I also know the kernel in 22.0.5 (6.1.1?) works fine when I boot from an old 22.0.5 image.
Meanwhile, the system consistently faults when booting most of the currently available kernels, including 6.7.0rc7-2, 6.6.8-2, 6.5.13-7, 6.1.69-1, 5.15.145-1 as well as more recent historic kernels including 6.6.7. This happens regardless of whether the base install for general Manjaro packages is at 23.0.4 or 23.1.2.
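To make the pattern easier to see, here are the kernels I have tested so far, sorted by version. This is only a sketch for tabulating my own results: the good/bad labels and the function name are mine, it assumes GNU sort for the -V option, and I left out the 22.0.5 kernel because I'm not sure of its exact version.

```shell
#!/usr/bin/env bash
# Kernel results as reported in this post, one "<version> <result>" per line.
results='6.6.8-2 bad
6.1.69-1 bad
6.5.5-1 good
6.7.0rc7-2 bad
5.15.145-1 bad
6.5.13-7 bad
5.10.205-1 good
6.6.7 bad'

# Sort by version number (GNU sort -V) so the good/bad boundary inside
# each kernel branch is easy to spot.
sorted_results() {
    printf '%s\n' "$results" | sort -V
}

sorted_results
```

Within the 6.5 branch this puts 6.5.5-1 (good) directly before 6.5.13-7 (bad), so whatever changed appears to have landed somewhere between those two point releases on that branch.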
For what it's worth, the installs are entirely vanilla, with no customization going on, and I run Manjaro on several other PCs without issue. What do you think: have I potentially run into a compatibility issue with modern kernels? Are there any diagnostics I can run under any particular configuration and report back on that would shed more light?
I would like to discover why the newer kernels do not work, and work towards a solution for running a currently supported 6.x kernel successfully, given 6.5.x is EOL now. Various older 6.x kernels have worked, and I'd be uncomfortable relying on 5.10 to continue working, given that more recent releases of 5.15 now do not, and whatever change has been introduced to the other kernels may yet be brought to the 5.10.x tree.