How damaged is my drive, and is my data recoverable?

Hi,

I arrived at the office this morning and found my computer unable to mount the filesystem.

It displayed:

BusyBox v1.17.1 built-in shell (ash)
Enter 'help' for a list of built-in commands

(initramfs)

I tried exit and reboot, but nothing happened.
I ran fsck, and it got stuck after “Force rewrite? yes”.

I booted a live USB and ran smartctl -a /dev/sda.
Here is the report:

sudo smartctl -a /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-27-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DM008-2FR102
Serial Number:    ZFL25SFX
LU WWN Device Id: 5 000c50 0c5896809
Firmware Version: 1002
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Feb  3 09:18:00 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 201) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x30a5)	SCT Status supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   045   045   006    Pre-fail  Always       -       96599561
  3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       160
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       3624
  7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always       -       263364121
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3840 (156 47 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       113
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   095   095   000    Old_age   Always       -       5
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       65537
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   049   040    Old_age   Always       -       43 (Min/Max 43/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       184
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1073
194 Temperature_Celsius     0x0022   043   051   000    Old_age   Always       -       43 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   064   000    Old_age   Always       -       96599561
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       168
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       168
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       3771 (215 61 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2488475327
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       852148206

SMART Error Log Version: 1
ATA Error Count: 5
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 3839 hours (159 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      00:01:47.755  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:01:47.741  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:01:47.715  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:01:47.713  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:01:47.700  SET FEATURES [Set transfer mode]

Error 4 occurred at disk power-on lifetime: 3839 hours (159 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      00:01:47.603  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:01:47.603  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:01:47.603  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:01:47.601  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:01:47.601  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 3839 hours (159 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      00:01:47.151  READ FPDMA QUEUED
  60 00 10 ff ff ff 4f 00      00:01:47.151  READ FPDMA QUEUED
  60 00 18 ff ff ff 4f 00      00:01:47.150  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:01:47.149  READ FPDMA QUEUED
  60 00 58 ff ff ff 4f 00      00:01:47.148  READ FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  13d+22:41:32.993  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  13d+22:41:32.993  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  13d+22:41:32.992  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  13d+22:41:32.992  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  13d+22:41:32.972  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 2148 hours (89 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 a0 ff ff ff 4f 00  13d+22:41:32.756  READ FPDMA QUEUED
  60 00 70 ff ff ff 4f 00  13d+22:41:32.756  READ FPDMA QUEUED
  60 00 38 ff ff ff 4f 00  13d+22:41:32.748  READ FPDMA QUEUED
  60 00 50 ff ff ff 4f 00  13d+22:41:32.748  READ FPDMA QUEUED
  60 00 78 ff ff ff 4f 00  13d+22:41:32.731  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I mounted /dev/sda4 on a mount point, and I can see all my root folders except my home (which is what I want to recover).

total 52428912
lrwxrwxrwx   1 root root           7 juil.  6  2021 bin -> usr/bin
drwxr-xr-x   4 root root        4096 févr.  3 05:20 boot
drwxrwxr-x   2 root root        4096 juil.  6  2021 cdrom
drwxr-xr-x   4 root root        4096 févr.  9  2021 dev
drwxr-xr-x 149 root root       12288 févr.  2 05:43 etc
lrwxrwxrwx   1 root root           7 juil.  6  2021 lib -> usr/lib
lrwxrwxrwx   1 root root           9 juil.  6  2021 lib32 -> usr/lib32
lrwxrwxrwx   1 root root           9 juil.  6  2021 lib64 -> usr/lib64
lrwxrwxrwx   1 root root          10 juil.  6  2021 libx32 -> usr/libx32
drwx------   2 root root       16384 juil.  6  2021 lost+found
drwxr-xr-x   3 root root        4096 juil.  9  2021 media
drwxr-xr-x   2 root root        4096 févr.  9  2021 mnt
drwxr-xr-x   3 root root        4096 janv. 17 08:13 opt
drwxr-xr-x   2 root root        4096 avril 15  2020 proc
drwx------   9 root root        4096 janv. 21 09:15 root
drwxr-xr-x  13 root root        4096 juil.  6  2021 run
lrwxrwxrwx   1 root root           8 juil.  6  2021 sbin -> usr/sbin
drwxr-xr-x  17 root root        4096 oct.  15 09:08 snap
drwxr-xr-x   2 root root        4096 févr.  9  2021 srv
-rw-------   1 root root 53687091200 août  23 07:22 swapfile
drwxr-xr-x   2 root root        4096 avril 15  2020 sys
drwxrwxrwt  25 root root       20480 févr.  3 05:20 tmp
drwxr-xr-x  15 root root        4096 nov.  19 07:50 usr
drwxr-xr-x  14 root root        4096 févr.  9  2021 var

How can I recover my home?
Thank you!

Impossible question.

If you have data of real value: disconnect the device, don’t mess with it, and use a data recovery expert.

Recoverable? From a statistical perspective, probably almost everything (since it is hard to really delete data), but there is probably no way to answer for sure.

But the question is also how much the data recovery will cost (time and/or money).

And data recovery is a field for experts (as already mentioned here).

So you are saying that, judging from the smartctl output, my drive is definitely faulty (I have had it for only 6 months) and my only option is to throw it away? My last backup is 2 weeks old and most things are saved online, so it wouldn’t be worth hiring an expert.

I’d say yes. I won’t trust the drive anymore.

Example: I’ve got a perfectly good 4TB drive here that has no problems of its own, but I refuse to use it because its data cable caused problems. No, I don’t swim in money. My data is just very precious to me.
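
The SMART table earlier in the thread backs this up. The overall result says PASSED only because the normalized values are still above their thresholds (e.g. 099 against a threshold of 010 for attribute 5), but the raw counts for attributes 5, 187, 197 and 198 are all non-zero. Here is a minimal Python sketch that pulls those rows out of a smartctl -a report. The choice of watched attributes and the name flag_attributes are assumptions made for this illustration, not an official rule, and the sample rows are copied from the report above.

```python
import re
from typing import Dict

# Attributes often treated as signs of failing media when the raw value is
# non-zero. This selection is an assumption of the sketch, not a vendor rule.
WATCHED = {5, 187, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(report: str) -> Dict[str, int]:
    """Return {attribute name: raw value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines, blank lines, other sections
        attr_id, name, raw = int(match.group(1)), match.group(2), int(match.group(3))
        if attr_id in WATCHED and raw > 0:
            flagged[name] = raw
    return flagged


# Rows copied from the report posted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       3624
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   095   095   000    Old_age   Always       -       5
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       168
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       168
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for name, raw in flag_attributes(SAMPLE).items():
        print(f"{name}: raw value {raw}")
```

On a live system you could feed it the text captured from sudo smartctl -a /dev/sda instead of SAMPLE. It only reports the counts and does not decide anything for you.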

Can you run:

sudo fdisk -l

You may have your home on a different partition than the root.

Sometimes fsck isn’t really stuck; it just doesn’t show further progress while it is testing other parts. So give it more time.
4000 hours is really not an age at which an HDD usually fails. The usual rule of thumb is: either within the first 2000 hours or never (i.e. much later).

The /home directory is always part of the / filesystem.
Another partition may then be mounted on it to hold its contents.
But the directory itself is always there.
It isn’t in this case …

Perhaps the contents of /etc/fstab could help

and/or trying to chroot and look at the logs for clues
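
To make the fstab idea concrete, here is a small Python sketch that looks for a /home entry in fstab text. The function name home_source and the example fstab contents are hypothetical, made up for illustration; the real file from this machine was never posted in the thread.

```python
from typing import Optional


def home_source(fstab_text: str) -> Optional[str]:
    """Return the first field (device, UUID=..., LABEL=...) of the entry
    mounted at /home, or None if fstab has no /home entry."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: source, mount point, type, options, dump, pass
        if len(fields) >= 2 and fields[1] == "/home":
            return fields[0]
    return None


# Hypothetical fstab contents, for illustration only.
EXAMPLE = """\
# <file system>  <mount point>  <type>  <options>          <dump>  <pass>
UUID=aaaa-1111   /              ext4    errors=remount-ro  0       1
UUID=bbbb-2222   /home          ext4    defaults           0       2
/swapfile        none           swap    sw                 0       0
"""

if __name__ == "__main__":
    print(home_source(EXAMPLE))
```

From the live USB you would read etc/fstab under whatever mount point /dev/sda4 was mounted on. If a source comes back, it can be compared with the partitions listed by sudo fdisk -l; if None comes back, /home was presumably a plain directory on the root filesystem.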

I have never tried it, but it certainly seems possible to (recursively) delete the /home directory if you do it as root …

But he has a backup from two weeks ago, so unless very important files have been added since then …

Don’t use the device except to image it, and then investigate using the image, as the device itself could be losing more data if it is defective. But the important data in /home is already … gone or inaccessible.
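
One common way to take such an image is GNU ddrescue, which nobody in this thread has named, so treat that choice as an assumption. The Python sketch below only builds the two command lines for a usual two-pass approach: a fast first pass that skips the slow scraping of bad areas (-n), then a second pass with direct disc access (-d) and three retry passes (-r3) over what is still missing. The function name and the image and mapfile paths are hypothetical.

```python
from typing import List


def ddrescue_passes(source: str, image: str, mapfile: str) -> List[List[str]]:
    """Build argument lists for a two-pass GNU ddrescue run.

    Pass 1 copies everything that reads easily and skips scraping (-n).
    Pass 2 reuses the same mapfile, so only the missing areas are retried,
    with direct access to the input (-d) and three retry passes (-r3).
    """
    first = ["ddrescue", "-n", source, image, mapfile]
    second = ["ddrescue", "-d", "-r3", source, image, mapfile]
    return [first, second]


if __name__ == "__main__":
    # Hypothetical paths: the image must live on a different, healthy disk
    # with enough free space for the whole 2 TB source.
    for argv in ddrescue_passes("/dev/sda", "/mnt/rescue/sda.img", "/mnt/rescue/sda.map"):
        print("sudo " + " ".join(argv))
```

The sketch deliberately prints the commands instead of running them, so you can check the source and destination before anything touches the disk.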

@Nachlese

And? Why are you telling me that? Trying to be smart?

I asked for the fdisk output to see whether there is a home partition.
Of course, since he interrupted his fsck, there may still be errors.

Please don’t reply to my comments or posts again.
Thank you.

You may consider yourself to be safe from that kind of annoyance.

Yes, the filesystem on the disk is obviously not “clean” yet, as the fsck (with “force rewrite=yes” …) stalled and was then cancelled.

I was trying to present options to the OP - to perhaps help him,
not to offend you.

… sorry - but: not sorry
fare well