Broken file-system after forced shutdown

Should be done by a Professional only - last chance ever… :sob:

Ok peeps, the file system got corrupted again :sob:

I fixed it again with GNOME Disks and lost some files again. I’m concerned about this, though. Will it happen again? How can I prevent it? What could be causing it?

I’m thinking about reinstalling everything from scratch, this time without the “Encrypted Disk” option. Is there any reason to believe this could work (as in “the file system will stop getting randomly corrupted”)?

I’ll share some thoughts on this in the hopes that further details could help you understand what’s going on:

  1. I don’t think this is a hardware issue, since the file system was working fine before I tried booting into Manjaro for the first time.
  2. Could this be caused by me typing the wrong password to decrypt the disk? I imagine this can’t be the case, since hopefully a wrong password wouldn’t be able to decrypt anything at all, but you never know.
  3. When I decrypted the drive to fix the file system in GNOME Disks I noticed there seemed to be two partitions in there (or at least there were two boxes I could click on). One of them was named “File System (LUKS something something)”. I can’t remember the name of the other one, but it was somewhat generic. Is this something I should expect?
  4. When I tried checking the FS with GNOME Disks I got an error that said something like “Couldn’t check the File System, error with e2fsck”.

Any help is appreciated.

I installed GSmartControl and it looks like there are some errors regarding the drive:

  1. Reallocated Sector Count isn’t zero
  2. Reallocation Event Count isn’t zero
  3. No SMART warning yet
  4. The drive is reporting surface errors
  5. This:
Complete error log:

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 1
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 5821 hours (242 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 71 00 04 00 00 00 80 87 80 e0 00

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  ea 00 00 00 00 00 00 00 00 00 00 a0 00     03:50:03.701  FLUSH CACHE EXT
  61 00 00 00 10 00 00 3a 2f b8 50 40 00     03:50:03.700  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 3a 2f b8 48 40 00     03:50:03.700  WRITE FPDMA QUEUED
  61 00 00 00 18 00 00 3a 2f b8 30 40 00     03:50:03.700  WRITE FPDMA QUEUED
  61 00 00 00 28 00 00 3a 2f b8 08 40 00     03:50:03.700  WRITE FPDMA QUEUED

Is this indicative of a hardware problem, or could it be caused by the drivers I use (the proprietary drivers from the Manjaro ISO)?

Use GSmartControl to run self-tests on the disk. Start with a short one, then run an extended one; the extended test takes much longer, so be patient. What do they report?
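For anyone who prefers a terminal: GSmartControl is a front end to `smartctl`, so the same tests can be started from the command line. Here is a minimal sketch that only builds and prints the commands; it does not run them. The device path `/dev/sda` and the helper names are assumptions for illustration, so substitute the disk you actually want to test.

```python
import shlex

# Hypothetical device path; replace it with the disk under test.
DEVICE = "/dev/sda"


def selftest_command(device: str, kind: str) -> list[str]:
    """Build the smartctl command that starts a self-test.

    kind is "short" or "long" ("long" is smartctl's name for the extended test).
    """
    if kind not in ("short", "long"):
        raise ValueError(f"unsupported self-test kind: {kind!r}")
    return ["smartctl", "-t", kind, device]


def selftest_log_command(device: str) -> list[str]:
    """Build the smartctl command that prints the self-test log."""
    return ["smartctl", "-l", "selftest", device]


# Print the commands in the order suggested above: short test, extended test,
# then the log with the results. smartctl normally needs root, hence "sudo".
for cmd in (
    selftest_command(DEVICE, "short"),
    selftest_command(DEVICE, "long"),
    selftest_log_command(DEVICE),
):
    print("sudo " + shlex.join(cmd))
```

The tests run inside the drive itself, so the command returns immediately; read the self-test log only after the estimated duration has passed.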


The short test completed with no errors, and I’m running the extended one now. It should take about 2 hours, apparently. Also, here’s the general information on the drive: smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.49-1-MANJARO] (local build)Co - Pastebin.com

The errors GSmartControl reported said something about “newer drivers not reallocating that much”, so I imagine the drive model is somewhat relevant.

I’m not a SMART specialist, but to me this doesn’t look good.

SMART:

ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-- 058   050   034    -    50333542
  5 Reallocated_Sector_Ct   PO--CK 098   098   036    -    168
  7 Seek_Error_Rate         POSR-- 080   060   045    -    105881775
196 Reallocated_Event_Count -O--CK 098   098   000    -    168
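To show how these rows can be read, here is a rough sketch that parses the four lines and applies two simple checks: whether the normalized value has dropped to the threshold (that is when SMART itself raises a warning), and whether a reallocation counter has a non-zero raw value. The column order ID, name, flags, value, worst, threshold, fail, raw is the `smartctl -x` layout. The labels `ok`, `watch` and `FAILING`, and the choice to flag only attributes 5 and 196, are my own assumptions, not a standard. The raw values of the two error-rate attributes are vendor-specific, so they are not interpreted here.

```python
from dataclasses import dataclass

# The four attribute rows quoted above (flags written with plain hyphens).
LINES = [
    "1 Raw_Read_Error_Rate POSR-- 058 050 034 - 50333542",
    "5 Reallocated_Sector_Ct PO--CK 098 098 036 - 168",
    "7 Seek_Error_Rate POSR-- 080 060 045 - 105881775",
    "196 Reallocated_Event_Count -O--CK 098 098 000 - 168",
]

# Attributes whose raw value is treated as a plain count of reallocations.
REALLOCATION_IDS = {5, 196}


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # normalized value, higher is better
    worst: int
    thresh: int
    raw: int


def parse_attribute(line: str) -> SmartAttribute:
    """Parse one row: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE."""
    p = line.split()
    return SmartAttribute(int(p[0]), p[1], int(p[3]), int(p[4]), int(p[5]), int(p[7]))


def assess(attr: SmartAttribute) -> str:
    """Return a coarse, hypothetical label for one attribute."""
    if attr.thresh > 0 and attr.value <= attr.thresh:
        return "FAILING"          # SMART itself would report a failure here
    if attr.attr_id in REALLOCATION_IDS and attr.raw > 0:
        return "watch"            # no SMART warning yet, but sectors were remapped
    return "ok"


for line in LINES:
    attr = parse_attribute(line)
    print(f"{attr.attr_id:>3} {attr.name:<24} {assess(attr)}")
```

Both reallocation attributes come out as `watch`: the normalized value (098) is still far above the threshold, which is why there is no SMART warning yet, while the raw count of 168 shows that sectors have already been remapped.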

Error log:

Error 1 [0] occurred at disk power-on lifetime: 5821 hours (242 days + 13 hours)

which is roughly 110 hours of use ago (current hours of use seem to be at 5936). So the disk logged an error not that long ago: that is about 5 days of continuous uptime, so work out how many hours a day the machine is powered on to estimate when it actually happened.
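The arithmetic can be sketched like this. The two hour counts come from the reports above; the daily usage figures in the loop are hypothetical, since only the owner knows how long the machine runs each day.

```python
ERROR_AT_HOURS = 5821   # power-on hours when the error was logged
CURRENT_HOURS = 5936    # power-on hours in the latest report


def elapsed_power_on_hours(current: int, at_error: int) -> int:
    """Power-on hours accumulated since the error was logged."""
    return current - at_error


def calendar_days(elapsed_hours: float, hours_per_day: float) -> float:
    """Convert power-on hours to calendar days for a given daily usage."""
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be in (0, 24]")
    return elapsed_hours / hours_per_day


elapsed = elapsed_power_on_hours(CURRENT_HOURS, ERROR_AT_HOURS)
print(f"elapsed power-on hours: {elapsed}")

# Hypothetical usage patterns: always on, half the day, a working day.
for hours_per_day in (24, 12, 8):
    days = calendar_days(elapsed, hours_per_day)
    print(f"{hours_per_day:>2} h/day -> about {days:.1f} days ago")
```

The exact difference is 115 hours, so the error is a bit under 5 days old if the machine never sleeps, and correspondingly older the fewer hours a day it is switched on.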

Device Statistics:

168 — Number of Reallocated Logical Sectors
72 — Read Recovery Attempts
0 — Number of Realloc. Candidate Logical Sectors
691 — Number of Reported Uncorrectable Errors

Maybe someone with SMART experience can analyze the data better than I can, but my impression is that your disk is starting to show some issues.

//EDIT: for comparison, my HDD (Western Digital Caviar Black), with 75000 hours of use (yes, not a typo), shows no issue at all in these data: 0 read errors, 0 seek errors, 0 reallocated sectors, and nothing else I can see besides a few hundred UDMA CRC errors (it is an old HDD that I have used across multiple systems with various old cables, so there may have been a few transfer errors at some point).

By my calculations, 110 hours of use is about 8 days ago, which roughly coincides with the date I installed Manjaro (7 days ago). Actually, “the date I tried to install Manjaro, failed, the file system issue started, installed Ubuntu, then installed Manjaro” (yes, it’s very messy).

So I guess this has something to do with my installation (?), though it doesn’t look like it’s driver-related, since I started experiencing the corruption in Elementary OS (I didn’t have any issues with the drivers or the FS beforehand).

To me it is a hardware issue. I can’t comment further on that; you’ll need input from others at this point.

Ok, the extended test is done, and apparently there were no errors found (?), or at least that’s what the GUI reports (“Tests result: completed without errors”)

I’m quite surprised about this… Here’s the output of the test: smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.49-1-MANJARO] (local build)Co - Pastebin.com.

Could this be indicative of a software issue instead of a hardware one?

@Pablo

That means your drive is junk (it will be bricked soon)…
omano is right, I fully agree.

(You had a head crash) :innocent:
(Alternative: some rows of cells in an SSD are damaged, with no chance of repair)


Drives/Storage are inexpensive. Buy a new drive.

Thanks for the insight. I’m going to replace the drive then. I still find it extremely weird that the drive started failing after I installed Manjaro, but there’s enough proof for me that this is a hardware issue (not a software issue).

PS: As an update, the system is now extremely unstable. The corruption is becoming increasingly frequent, and booting from the USB stick just to fix the FS is now pretty much required every time I turn the machine on. In fact, the only reason I still bother to boot from the hard drive is that I can’t get KeePassXC installed on the USB stick and I have my passwords stored in there.

@GaVenga @omano and others: thank you very much for your help. I guess my final question is: is there any way the Manjaro ISO could have damaged the drive? I’m sorry for insisting on the question, but I’m still confused about the fact that the issue started right after I tried to install Manjaro for the first time. I guess it could very well be a coincidence, but I’m still curious about this.

To be frank, I’m not even sure the issue started precisely after the install attempt. My estimate is that it started 8 days ago (the day I installed Manjaro), but it’s a rough estimate and I wouldn’t be surprised if the issue actually started a couple of days before that and went unnoticed. Also, as mentioned in previous comments, the installation process I went through was extremely messy, so I wouldn’t be surprised if I was at fault here.

I just don’t get how trying to install a distro could possibly damage the drive. Again, thank you very much to everyone in here. I’ll provide an update after I get the disk replaced.

I don’t think software could physically damage your drive.

PS: the last report shows that the reallocated sector count is increasing, as are the uncorrectable errors. It seems like your drive is getting worse.


From my perspective, the drive is clearly getting worse by the minute. For example, after I ran the “fix FS” routine from GNOME Disks for the first time, it took about a week before the FS got corrupted again (yesterday); I had to run it again today, and the machine barely booted.

I’ll provide further updates when I get the drive replaced. For now, I’ll put this machine out of its misery (shut it down to stop it from further damaging the drive) so that I can at least show the technical support peeps what’s going on.

Wenn du merkst, das du ein totes Pferd reitest, dann steige ab…
If you find yourself riding a dead horse, dismount …


Mount the hard drive in an external enclosure and try a Windows data-recovery program,
but one which does not write to the defective drive (important!).


I don’t think I need this. I do have the passwords backed up; I just can’t install KeePassXC on the pen drive to open them. But I do have an old laptop (from around 2008, what a relic) lying around, and I’ve just retrofitted it with Manjaro for temporary use. I can access the passwords via this older machine.

UPDATE: The new drive should arrive in about a week.
