Black Screen after last update

winnie · 31 May 2022 16:34

To run a “short” test:

nvme device-self-test --self-test-code 1h /dev/nvme0

Replace 1h with 2h if you want to run a long test, or eh to run a “vendor specific” test.
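If you'd rather not remember the codes, the three test types mentioned above can be wrapped in a small shell function. This is a hedged sketch, not part of nvme-cli: the function name nvme_selftest and the DRY_RUN switch are made up for illustration, and it assumes nvme-cli is installed and that you run it as root. The self-test runs on the drive in the background, so check the outcome afterwards with nvme self-test-log /dev/nvme0.

```shell
# Hypothetical wrapper (not part of nvme-cli) around "nvme device-self-test".
# Usage: nvme_selftest short|long|vendor [device]
# Set DRY_RUN=1 to print the command instead of running it.
nvme_selftest() {
  local kind="$1" dev="${2:-/dev/nvme0}" code
  case "$kind" in
    short)  code=1h ;;  # short self-test
    long)   code=2h ;;  # extended ("long") self-test
    vendor) code=eh ;;  # vendor-specific self-test
    *)
      echo "usage: nvme_selftest short|long|vendor [device]" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo nvme device-self-test --self-test-code "$code" "$dev"
  else
    nvme device-self-test --self-test-code "$code" "$dev"
  fi
}
```

For example, as root: source the function, run nvme_selftest long, and once the test has had time to finish, read the result with nvme self-test-log /dev/nvme0.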

You can also install badblocks and run a simple read-only test, which will report any I/O errors for read operations.

For example,

badblocks -b 512 -c 65535 -s -v -o badblocks_log.txt /dev/nvme0n1

Due to the nature of storage devices, a passing read-only test doesn’t rule out that data on the drive is already corrupted, and writes can still store corrupted data.
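For scale: with -b 512 -c 65535, badblocks reads 512 × 65,535 = 33,553,920 bytes (just under 32 MiB) per batch. The -o file only lists the numbers of the blocks that failed, one per line, so an empty file means no read errors were found. A small sketch for summarizing it, where badblocks_summary is a hypothetical name and not part of e2fsprogs:

```shell
# Hypothetical helper: summarize a badblocks -o output file.
# badblocks writes one bad block number per line; an empty file means none were found.
badblocks_summary() {
  local log="$1" count
  if [ ! -e "$log" ]; then
    echo "log file not found: $log" >&2
    return 1
  fi
  # Count non-empty lines, i.e. reported block numbers
  # ("|| true" because grep exits 1 when the count is 0)
  count=$(grep -c . "$log" || true)
  if [ "$count" -eq 0 ]; then
    echo "no bad blocks reported"
  else
    echo "$count bad block(s) reported"
  fi
}
```

Usage: badblocks_summary badblocks_log.txt after the scan finishes.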

winnie · 31 May 2022 16:40

If it means anything: same exact file size, same modification timestamp, but different MD5 hashes.

$ ls -l /usr/lib/libLLVM-13.so

root root 109489408 May 19 13:51 /usr/lib/libLLVM-13.so


$ md5sum /usr/lib/libLLVM-13.so

89d0da2617cce83c0bcd7b1e5cc83ae1  /usr/lib/libLLVM-13.so

I also downloaded the package llvm-libs-13.0.1-4-x86_64.pkg.tar.zst (from an up-to-date Manjaro repository), and it has the same MD5 hash for libLLVM-13.so as the one installed on my system.

If you’re on the stable repository and fully updated, your MD5 hash should match.

Since it doesn’t, this can mean:

Failing storage device (improper writes)
Failing RAM (since an archive needs to be extracted before it is written)
Failing CPU (not as likely)

Or…

During the update, there was some sort of power outage that left your system in a partially upgraded state. (Yet the file should not have been written in a corrupt form if everything listed above is in working order.)

Are you on the Stable Updates train?
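The check described in this post, hashing libLLVM-13.so from the downloaded package and comparing it with the installed copy, can be scripted. A minimal sketch, assuming your tar can read .pkg.tar.zst archives (GNU tar and bsdtar both detect the compression when reading a file, as long as zstd support is available); compare_with_package is a hypothetical name. The optional third argument is the root of the installed system, which helps when the broken install is mounted somewhere else, e.g. from a live USB.

```shell
# Hypothetical helper: compare one file from a package archive with its installed copy.
#   $1  package archive, e.g. llvm-libs-13.0.1-4-x86_64.pkg.tar.zst
#   $2  path inside the archive, without a leading slash, e.g. usr/lib/libLLVM-13.so
#   $3  root of the installed system (default /)
compare_with_package() {
  local pkg="$1" member="$2" root="${3:-/}" want have
  if ! tar -tf "$pkg" "$member" >/dev/null 2>&1; then
    echo "not found in archive: $member" >&2
    return 1
  fi
  if [ ! -f "$root/$member" ]; then
    echo "not installed: $root/$member" >&2
    return 1
  fi
  # Stream the packaged file to stdout (-O) and hash it without extracting to disk
  want=$(tar -xOf "$pkg" "$member" | md5sum | cut -d' ' -f1)
  have=$(md5sum < "$root/$member" | cut -d' ' -f1)
  if [ "$want" = "$have" ]; then
    echo "OK        $member  $have"
  else
    echo "MISMATCH  $member  installed=$have  package=$want"
    return 1
  fi
}
```

A match only means something if the package version is the same as the installed one (check with pacman -Q llvm-libs). For a single known hash, md5sum -c does the same job: echo "89d0da2617cce83c0bcd7b1e5cc83ae1  /usr/lib/libLLVM-13.so" | md5sum -c - (note the two spaces between hash and path).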

linux-aarhus · 31 May 2022 16:46

No - it means

sudo grub-mkconfig -o /boot/grub/grub.cfg

Or the Manjaro convenience script

sudo update-grub

But as discovered, filesystem errors can produce the most unexpected, hard-to-troubleshoot issues.

It seems a bit strange that the update would have caused filesystem errors, unless of course the sync was interrupted by a power failure, which can cause some very weird issues as well.

nabzve · 31 May 2022 19:27

Hooray, it finally works!

I focused on the differing MD5 checksums of /usr/lib/libLLVM-13.so. So I just ran pacman --overwrite=libLLVM-13.so -S llvm-libs (from a chroot). Then I checked the checksum again with md5sum /usr/lib/libLLVM-13.so, and it had changed to 89d0da2617cce83c0bcd7b1e5cc83ae1, matching the one shared by @winnie.
Then exit, systemctl reboot, and it works like a charm!

I can’t thank you enough for your help, time, ideas, hints, tries, etc. I’ve been using Linux-based distros for more than a decade, but I’ve learnt a LOT from your expertise since yesterday. I’m really glad we found a solution thanks to this amazing teamwork. I’m so grateful!

I’m marking this message as a solution, but the whole thread should be read.

Here is my full post-mortem; I hope it can help others. In my case, libLLVM-13.so was faulty, most probably only because of a disk write issue or because something happened during the update. It could actually have been any other file.
libLLVM-13.so was identified because it was mentioned in the output of journalctl --boot=-1 --priority=3 --no-pager.
Another way to identify it was to switch to tty2 (Ctrl+Alt+F2) and try to launch startx, which failed. Its output mentioned the log file /home/username/.local/share/xorg/Xorg.0.log, where we can find:

[  5584.703] (EE) Backtrace:
[  5584.703] (EE) 0: /usr/lib/Xorg (xorg_backtrace+0x89) [0x5621290c70d9]
[  5584.703] (EE) 1: /usr/lib/Xorg (0x562128f77000+0x15aef9) [0x5621290d1ef9]
[  5584.704] (EE) 2: /usr/lib/libc.so.6 (0x7f1d84311000+0x3e8e0) [0x7f1d8434f8e0]
[  5584.704] (EE) 3: /usr/lib/libLLVM-13.so (0x7f1d7b22f000+0x411a56c) [0x7f1d7f34956c]
[  5584.704] (EE) 
[  5584.704] (EE) Segmentation fault at address 0x7f1d7f34956c
[  5584.704] (EE) 
Fatal server error:
[  5584.704] (EE) Caught signal 11 (Segmentation fault). Server aborting

So it appeared that libLLVM-13.so was corrupted, and the MD5 checksum confirmed it. Replacing the file fixed the issue.
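The Xorg backtrace above names each binary or library involved in the crash, so a quick way to get a list of candidates to verify is to pull those paths out of the log. A sketch in awk; backtrace_files is a hypothetical helper, and it assumes frame lines have the "(EE) N: /path" form shown above.

```shell
# Hypothetical helper: list the unique files named in the frames of an Xorg backtrace.
# Frame lines look like: "[  5584.704] (EE) 3: /usr/lib/libLLVM-13.so (0x...+0x...) [0x...]"
backtrace_files() {
  # On each frame line, print the first field that starts with "/"
  awk '/\(EE\) [0-9]+: \// { for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' "$1" \
    | LC_ALL=C sort -u
}
```

For example, backtrace_files ~/.local/share/xorg/Xorg.0.log | xargs pacman -Qo shows which packages own those files, and piping the list to xargs md5sum gives hashes to compare against a healthy system running the same package versions.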

I have to say that other things were fixed during the investigation, such as removing an EOL kernel, cleaning up some leftover packages, fixing disk issues that looked bad, upgrading to the 5.17 kernel, cleaning up odd files, etc. None of them fixed the black screen issue on its own, but a combined effect with the identified solution cannot be excluded (especially the disk error fixes).

Also, acpi_backlight=vendor is still removed from /etc/default/grub. Since I didn’t apply the change with sudo update-grub, I’m not sure whether it’s live or not. Since it works, I won’t touch anything for now, but I might later if needed.
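Whether a parameter is live can be checked directly: edits to /etc/default/grub only take effect after grub.cfg is regenerated (sudo update-grub) and the system is rebooted, while /proc/cmdline shows the parameters the running kernel was actually booted with. A minimal sketch; cmdline_has is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a kernel parameter appears in a command line.
#   $1  parameter name, e.g. acpi_backlight (matches "acpi_backlight" or "acpi_backlight=...")
#   $2  file to read (default /proc/cmdline, i.e. the running kernel)
cmdline_has() {
  local param="$1" file="${2:-/proc/cmdline}"
  # Compare whole whitespace-separated words, so "acpi" does not match "acpi_backlight=vendor"
  awk -v p="$param" '
    { for (i = 1; i <= NF; i++) if ($i == p || index($i, p "=") == 1) found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$file"
}
```

Usage: cmdline_has acpi_backlight && echo "still active". Passing /boot/grub/grub.cfg as the second argument instead shows whether the parameter is still present anywhere in the generated config.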

I didn’t go ahead with checking the NVMe disk. It might be a good idea, but since everything works I’m reluctant to tempt fate…

For the record, here are the current boot logs:

journalctl --boot=0 --priority=3 --no-pager

mai 31 20:12:38 matebook kernel: sd 0:0:0:0: [sda] No Caching mode page found
mai 31 20:12:38 matebook kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
mai 31 20:12:56 matebook kded5[1222]: org.kde.plasma.dataengine.geolocation: error:  "Unknown host location.services.mozilla.com: Host not found"
mai 31 20:12:56 matebook kded5[1222]: org.kde.plasma.dataengine.geolocation: error:  "Unknown host location.services.mozilla.com: Host not found"
mai 31 20:12:56 matebook akonadiserver[1508]: org.kde.pim.akonadiserver: "\nSql error: Table 'akonadi.schemaversiontable' doesn't exist in engine QMYSQL: Unable to execute query\nQuery: ALTER TABLE SchemaVersionTable ADD COLUMN version INTEGER NOT NULL DEFAULT 0"
mai 31 20:12:56 matebook akonadiserver[1508]: org.kde.pim.akonadiserver: Unable to initialize database.

Much shorter! There are still a few errors I should probably work on, but I’ll do that later, in another thread.

One final question: Is it normal to have a lib32 llvm-libs installed?

pacman -Qqn | grep llvm-libs

lib32-llvm-libs
llvm-libs

Alright, that’s it! Once again, thank you so so much everyone!

winnie · 31 May 2022 19:45

I wouldn’t just check the NVMe drive. I would also run a memtest and mprime test.

If that was my system, I would not feel comfortable using it, knowing it already wrote data in a corrupted state.

Imagine if that happens to a photo, or video, or email, or document?

You really want to rule out a faulty drive, RAM, or CPU when you already know of data corruption.

Remember, bad RAM can cause corrupted images, videos, and documents to be written to disk, and you wouldn’t get any sort of “warning”, since your system will run fine and not crash. It’s actually a good thing when a crucial system file is corrupted, because you’ll know there’s an issue with your system when it crashes and/or cannot boot up properly.

For all you know, the specific file libLLVM-13.so might only be a canary in a coalmine. We just happened to discover that it was corrupted.

Here is a plausible scenario. (There are many possibilities, but here’s just one of them):

Your RAM and/or CPU has recently become faulty. You don’t notice this, because it hasn’t written a corrupt system file yet.

Then you update Manjaro, which essentially downloads and unpacks the packages.

Because the RAM and/or CPU is faulty, some packages’ files are extracted in a corrupted state and then written to the drive.

Now you try to reboot and are met with errors, because a system file is corrupt.

Now you know something is wrong…

And it wasn’t until you ran a system update that you were alerted by this “canary in a coalmine” that your RAM or CPU could be faulty. (Even infrequent faults are unacceptable.)

(Another canary in the coalmine were the checksum errors in your Ext4 filesystem…)

nabzve · 31 May 2022 20:07

@winnie Got it, I will. You’re perfectly right.

I’m pretty confident it happened because I closed the lid while the update was running. The release was huge (more than 2 GB). The last time I checked, the download was still under 1 GB. I had to leave 10 minutes later, so I closed the lid (not thinking the download might already be done; the connection is very poor here). When I got back, I wasn’t able to wake the laptop up, which unfortunately happens regularly with the MateBook (the issue has been identified but not solved, as far as I know). I forced a shutdown, and the black screen issue appeared from that very moment.
I checked the pacman logs from tty2 (see my first post) and it looked to me like the update had completed, but I think I was wrong. Closing the lid most probably corrupted the update.

That being said, you’re totally right and I’ll check the NVMe drive. It would be silly not to. I will also run memtest and mprime tests. Thanks for your suggestions!

brahma · 31 May 2022 20:09

Nice job fixing it with that command of yours…
I also have lib32-llvm-libs installed.
And just to be sure, you can check the checksum of the other library that was mentioned in the logs: md5sum /usr/lib/libc.so.6
Mine is: e0e02bde032a3488ba098df28eec312a

nabzve · 31 May 2022 20:11

Thanks, @brahma!

md5sum /usr/lib/libc.so.6

e0e02bde032a3488ba098df28eec312a  /usr/lib/libc.so.6

It matches!

winnie · 31 May 2022 20:15

Hah! BOTH of your computers must be corrupted, because my libc.so.6 MD5 is d8f3f334e72c0e30032f2a1a1229aef1

Oh, wait…

…

…Oh! OH NOOOOOOOOO!!! * winnie has been disconnected *

system · 3 June 2022 10:15

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.