Complete crash at boot

gustavo · 6 September 2020 11:31

I experienced sudden freezes at the desktop, with low CPU usage. I had that problem before; it was related to corrupted or incomplete timeshift snapshots, and running timeshift --check fixed it.

But this time the freezes were so bad that they happened while timeshift --check was running. I had to force a shutdown with the power button, and at the next boot I ended up at a blank terminal screen saying:

mount: /new-root: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error.
You are now being dropped into an emergency shell
sh: can't access tty; job control turned off
[rootfs ]#

/dev/sda2 is the timeshift mount
/dev/sda1 is the root mount

I was using kernel 5.4.60 when the freeze happened. I managed to switch to 5.8 (latest version from last week) and rebooted once, but it did not help with the freezes.

I can’t navigate my way out of a failed boot. I would like help loading an old (hopefully uncorrupted) timeshift snapshot.

thanks a lot!

visone · 6 September 2020 11:35

Hi!
run a fsck /dev/sda2 from a live iso
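Since the partition in question is btrfs, note that the generic fsck wrapper does little for it: fsck.btrfs is a stub that defers to btrfs check. A minimal sketch of picking a read-only check by filesystem type follows; check_command is a hypothetical helper name, and it only prints a command rather than running anything.

```shell
#!/bin/sh
# Print a read-only check command suited to a filesystem type.
# check_command is a hypothetical helper; it runs nothing itself.
check_command() {
    fstype=$1   # e.g. the output of: lsblk -no FSTYPE /dev/sda2
    dev=$2      # the unmounted device to check
    case $fstype in
        btrfs) echo "btrfs check --readonly $dev" ;;
        ext2|ext3|ext4) echo "e2fsck -n $dev" ;;
        *) echo "unsupported filesystem type: $fstype" >&2; return 1 ;;
    esac
}

# Hypothetical usage from a live ISO, with the partition not mounted:
# check_command "$(lsblk -no FSTYPE /dev/sda2)" /dev/sda2
```

Starting with a read-only pass keeps the filesystem untouched until the reported errors have been looked at.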

gustavo · 6 September 2020 11:37

I’ll try that. Thanks. The shell I was dropped into is so limited I can’t even run sudo in it.

gustavo · 6 September 2020 12:14

Running btrfsck here from the live distro…

I ran

btrfsck --force /dev/sda2

got the following:

[manjaro@manjaro etc]$ sudo btrfsck /dev/sda2 --force
Opening filesystem to check…
Checking filesystem on /dev/sda2
UUID: b78f7e34-c6bc-4629-9656-428263adba29
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 731163 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 1433600
root 257 inode 731165 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
root 257 inode 731166 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
ERROR: errors found in fs roots
found 70079352832 bytes used, error(s) found
total csum bytes: 39334640
total tree bytes: 1410449408
total fs tree bytes: 1192951808
total extent tree bytes: 164790272
btree space waste bytes: 283567410
file data blocks allocated: 2019458793472
referenced 129898369024

Then I tried:

sudo btrfsck --force --repair /dev/sda2

And got:

[manjaro@manjaro etc]$ sudo btrfsck /dev/sda2 --repair --force
enabling repair mode
Opening filesystem to check…
Checking filesystem on /dev/sda2
UUID: b78f7e34-c6bc-4629-9656-428263adba29
repair mode will force to clear out log tree, are you sure? [y/N]: y
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
No device size related problem found
[3/7] checking free space cache
cache and super generation don’t match, space cache will be invalidated
[4/7] checking fs roots
root 257 inode 731163 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 1433600
Fixed discount file extents for inode: 731165 in root: 257
root 257 inode 731165 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
Fixed discount file extents for inode: 731166 in root: 257
root 257 inode 731166 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
ERROR: errors found in fs roots
found 70079352832 bytes used, error(s) found
total csum bytes: 39334640
total tree bytes: 1410449408
total fs tree bytes: 1192951808
total extent tree bytes: 164790272
btree space waste bytes: 283567410
file data blocks allocated: 2019458793472
referenced 129898369024

Then, just to be sure the problems were repaired, I ran the first command again, to see if this time it would come out clean. But got the same errors again:

[manjaro@manjaro etc]$ sudo btrfsck /dev/sda2 --force
Opening filesystem to check…
Checking filesystem on /dev/sda2
UUID: b78f7e34-c6bc-4629-9656-428263adba29
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
cache and super generation don’t match, space cache will be invalidated
[4/7] checking fs roots
root 257 inode 731163 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 1433600
ERROR: errors found in fs roots
found 70079352832 bytes used, error(s) found
total csum bytes: 39334640
total tree bytes: 1410449408
total fs tree bytes: 1192951808
total extent tree bytes: 164790272
btree space waste bytes: 283567254
file data blocks allocated: 2019458793472
referenced 129898369024

Looks like the --repair option is not really fixing anything…
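One way to judge whether --repair changed anything is to compare which inodes each run flags. The following is a minimal sketch, assuming the output of each run was saved to a file; flagged_inodes and the log file names are hypothetical.

```shell
#!/bin/sh
# Print the unique "root inode" pairs flagged in a saved check log.
# Matches lines of the form shown in the output above:
#   root 257 inode 731163 errors 100, file extent discount
flagged_inodes() {
    awk '$1 == "root" && $3 == "inode" && $5 == "errors" { print $2, $4 }' "$1" | sort -u
}

# Hypothetical usage: save two runs, then list inodes flagged in both.
# sudo btrfsck --force /dev/sda2 > before.log 2>&1
# sudo btrfsck --force /dev/sda2 > after.log 2>&1
# flagged_inodes before.log > before.txt
# flagged_inodes after.log > after.txt
# comm -12 before.txt after.txt
```

Applied to the runs pasted above, this would list inodes 731163, 731165 and 731166 for the first check but only 731163 for the last one, which suggests --repair did fix two of the three reported inodes.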

eugen-b · 6 September 2020 13:13

I was able to fix a similar issue two out of five times. It takes the option --init-extent-tree and can take several days.

gustavo · 6 September 2020 13:21

I ran btrfsck with the --repair option twice, and then without it to see if the filesystem was OK. Both times it came out with errors. I then tried to boot up and, to my pleasant surprise, it worked. I don’t know if I am left with a dirty file system, but at least it booted to the desktop.

I now ran “timeshift --check” and it found my snapshots so corrupted that it wiped out all of them.

My desktop customization options such as themes and wallpapers were also reverted to stock KDE Plasma. I don’t know why that happened, but I am happy at least the system booted.

Thanks a lot for the tip!

eugen-b · 6 September 2020 13:24

You are lucky! You should now back up the personal data (real rsync/copy, not snapshot) and be prepared to reinstall next time it happens.

gustavo · 6 September 2020 13:25

On a related question… Do you personally find btrfs reliable, or prefer to stay out of it? I’m a newbie and find opinions so polarized about it. But the snapshot feature is too tempting and useful to miss.

eugen-b · 6 September 2020 13:40

I was quite a happy btrfs user. I used to work with it directly with btrfs commands. I used to create read-only snapshots and send them to a different disk. Timeshift still lacks this feature. And Snapper is a bit too complex.
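For readers unfamiliar with that workflow, here is a rough sketch of a read-only snapshot followed by send/receive. It assumes the other disk is also btrfs and already mounted; snapshot_and_send, the DRY_RUN switch and all paths are hypothetical, not something from this thread.

```shell
#!/bin/sh
# Create a read-only snapshot of a subvolume and send it to a btrfs
# filesystem on another disk. snapshot_and_send is a hypothetical helper.
# With DRY_RUN=1 the commands are only printed, not executed.
snapshot_and_send() {
    subvol=$1    # subvolume to snapshot, e.g. /home
    snapdir=$2   # directory on the same filesystem that holds snapshots
    target=$3    # mounted btrfs filesystem on the other disk
    name=$4      # snapshot name, e.g. a date stamp
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "btrfs subvolume snapshot -r $subvol $snapdir/$name"
        echo "btrfs send $snapdir/$name | btrfs receive $target"
        return 0
    fi
    # btrfs send only accepts read-only snapshots, hence -r
    btrfs subvolume snapshot -r "$subvol" "$snapdir/$name" &&
        btrfs send "$snapdir/$name" | btrfs receive "$target"
}

# Hypothetical usage (as root):
# DRY_RUN=1 snapshot_and_send /home /.snapshots /mnt/backup home-2020-09-06
```

Running it with DRY_RUN=1 first shows the exact commands before anything is written.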

freggel.doe · 7 September 2020 10:48

Just FYI: that was an emergency shell:

no need for sudo as the user is root
cannot load sudo due to sudo being a binary which resides on your root fs - which isn’t mounted

gustavo · 12 September 2020 12:59

Thanks for clarifying that. Luckily the system booted back following a btrfsck --repair from live distro, as per Eugen’s suggestion. If I had to crawl my way up from that emergency shell I would be lost.

gustavo · 12 September 2020 13:05

I found it interesting that you said you were… Meaning not anymore… Maybe you lost faith in it? I wonder if you could share your perspective on the reliability of that filesystem. Some people avoid btrfs like the plague, but at the same time some major distros are defaulting to it (SUSE and the next Fedora come to mind). I find the snapshot feature too useful to turn my back on, but my personal experience with the random freezes on a very lightly used laptop is raising some yellow flags here.

eugen-b · 15 September 2020 00:32

It just doesn’t fit my current life situation to deal with filesystem crashes; otherwise I like btrfs very much. Maybe in half a year, when I’m fed up with f2fs, I’ll come back to btrfs.

gustavo · 15 September 2020 01:18

I’m losing my faith in BTRFS. It seems so wonderful on paper, but my personal experience is proving abysmal. I’m running it on a very lightly used notebook with only 30GB of data on disk. Hourly snapshots brought the notebook to a crawl after about 30 of them were in place. That’s a pretty decent 10th gen Core i7 with an SSD, so hardware isn’t a bottleneck. And then file corruption. In years I never had a problem with dozens of terabytes of NTFS and ext4 storage… BTRFS managed to corrupt a 30GB root installation all by itself.

The snapshot function is so powerful and useful I could not resist trying, but I will give up. I’ve been running btrfs rescue --init-extent-tree for over 24h non-stop now. It seems to be finding millions of errors, and it is completely thrashing my SSD and making my laptop CPU heat up like a frying pan. I think I’ll just stop it, salvage my home folder, format, revert to ext4, and be done with it.

What puzzles me most is that BTRFS has been adopted as the default file system by such stalwarts as Synology. It runs mission-critical systems at Facebook. I don’t understand how it could have failed so miserably on a mere laptop (brand new!) that is sitting idle most of the time.

As a final remark, I installed BTRFS on an elementary OS virtual machine and guess what? Two days after install, that VM started to freeze in exactly the same way…