NVMe SSD freezes and I/O errors after dd clone and boot attempt

I’ve wasted more than 18 hours on this at this point, so please excuse me if I don’t provide the most complete information here; ask if you have any ideas. I am fed up beyond belief, but my spirit stops me from giving up.

Given:

  • Machine: MSI Alpha 15 (2021 model with the AMD Ryzen 5800H/RX6600M)
    • latest firmware: E158LAMS.108
  • Old NVMe SSD: Intel SSD 760p 256GB
  • New SSD: ADATA XPG Gammix S11 Pro 2TB
    • Controller: SM2262 or SM2262EN
    • latest firmware
  • Manjaro Live USB (older linux515 release):
    • Linux manjaro 6.1.12-1-MANJARO

Due to the complexity of the installation, I decided to dd from the old drive to the new one, then fix up the partitioning, extend the LVM, etc. The old drive has:

  • GPT scheme with 4 partitions
    • p1: ESP aka /boot/, contains both bootloaders
    • p2: Windows’ reserved partition (msftres)
    • p3: Encrypted Windows partition (Veracrypt)
    • p4: LUKS with Manjaro
      • LVM containing separate partitions for swap, root, home

As of writing this post, I have tracked it down to the following sequence:

  1. Boot from Live USB

  2. The exact dd command I used: dd if=/dev/nvme1n1 of=/dev/nvme0n1 bs=8M status=progress oflag=direct. It takes 7 minutes, and afterwards the contents can be read back without errors. Two notes:

    a. The partition UUIDs are going to be duplicated; this is intended, because I will erase the old drive. As a precaution, I remove the old drive after shutting down.
    b. The backup GPT data at the tail of the new disk is null at this point; I had not yet used (g)parted to fix it up.

  3. Launch Gparted and allow it to fix the GPT data (it asks to extend the partition table to the whole disk)

  4. Mount the migrated /boot/ to look up the bootloader path and change GRUB settings

  5. Mount the migrated root / to comment out fstab entries for HDDs that are not connected to the laptop

  6. Manually add Manjaro’s bootloader on the new SSD to the UEFI boot entries with efibootmgr, because autodetection seems broken on Manjaro’s end (see the sketch after this list).

  7. Unmount the new /boot/ and /

  8. shutdown now and physically remove the old SSD

  9. Boot into the firmware setup and confirm my new Manjaro entry is there. This boot takes a few seconds longer, even though the power was not cut.

  10. Reboot into Live USB again (not touching the cloned disk)

  11. Read to verify: dd if=/dev/nvme0n1 of=/dev/null bs=64K iflag=direct status=progress Returns:

    dd: error reading ‘/dev/nvme0n1’: No data available
    6188+0 records in
    6188+0 records out
    405536768 bytes (406 MB, 387 MiB) copied, 0.28443 s, 1.4 GB/s

    This location is part of the boot partition.

  12. Check dmesg:

    nvme0n1: Read(0x2) @ LBA 792064, 128 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) MORE
    critical medium error, dev nvme0n1, sector 792064 op 0x0:(READ) flags 0x800 phys_seg 8 prio class 2
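
For completeness, step 6 was done with something along these lines (the label and the loader path here are illustrative, not a literal record; look up the actual .efi path on the mounted ESP first):

efibootmgr --create --disk /dev/nvme0n1 --part 1 \
  --label "Manjaro (cloned)" --loader '\EFI\Manjaro\grubx64.efi'
efibootmgr -v    # verify the new entry and the boot order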

Here is another dmesg error, from an earlier attempt where the filesystem superblock was “dead”:

blk_update_request: critical medium error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

Essentially soft-bricked. I can write to the SSD all I want (I wasted an hour and 2 TB of TBW on badblocks earlier), but as soon as anything tries to read: error. In fact, badblocks tried so hard that it froze the controller and caused the kernel to drop the NVMe device. When I reproduced the issue and tried writing manually with dd, that did nothing to alleviate the errors. A dd read then errored at a different location.

The SSD is unusable in this state. However, if I format/secure-erase it with nvme format -s1 /dev/nvme0n1, then it begins to function normally again.

I’ve tried fresh installs of Fedora, Ubuntu, Debian, Manjaro, and Manjaro with encryption (via Calamares). They all work. However, cloning my previous setup does not. WTF?


(1) How on earth does the controller soft-brick itself?
(2) What on earth is causing it? I need your help:

  • dd is not at fault: immediately after cloning I can read the entire drive without errors
  • The backup GPT at the end of the disk not being copied properly: Gparted immediately offers to fix that. Tried it.
  • The cloned GRUB shouldn’t matter; on my last attempt I didn’t even boot into it
  • According to nvme-cli and smartctl, both SSDs are set to a 512-byte LBA size (checked roughly as sketched below this list), so everything must be in the same locations as on the old SSD
  • Broken SSD firmware power-state handling (APST), where deep power states are misreported. I tried nvme_core.default_ps_max_latency_us=0 as a Linux kernel parameter; no change. And if this were the cause, why would clean installs and subsequent dd reads work fine?
  • My last attempt hints at the UEFI firmware being the culprit: I did not boot into the cloned system, only into the UEFI settings and then my Live USB again.
    • Maybe the SSD’s firmware after all? Though I think in this case that’d have become a widespread problem
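
For reference, the LBA-size check mentioned in the list was roughly the following (nvme id-ns -H from nvme-cli lists the supported LBA formats and marks the one in use; device names as elsewhere in this post):

nvme id-ns -H /dev/nvme0n1 | grep 'LBA Format'    # new SSD
nvme id-ns -H /dev/nvme1n1 | grep 'LBA Format'    # old SSD
smartctl --all /dev/nvme0n1 | grep 'Formatted LBA Size'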

I don’t have another machine with two M.2 slots to test.

I think the SSD is toast and I would not trust my data to it! Just because it works after a fresh format doesn’t mean you won’t still get read errors eventually. It’s entirely possible that a certain number of blocks needs to be filled before you see the error, depending on where they happen to be logically mapped. What is the output of sudo smartctl --all /dev/nvme0n1? Those logs are generated by the SSD controller.

Simple answer to your problem:
DON’T use dd to duplicate an OS.

NVMe SSDs are not HDs, they are RAM chips that emulate an HD…
So you can’t make a sector-by-sector copy, because it doesn’t actually have any…

So if you want to transfer your OS, you should manually re-create the needed partitions, then use rsync to copy over the contents of the partitions.
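
As a minimal sketch of that (the mount points and mapper names are just examples; adapt them to your own layout):

# after creating and formatting the new partitions, mount the old and new root somewhere
mount /dev/mapper/old-root /mnt/old
mount /dev/mapper/new-root /mnt/new
# -aAXH preserves permissions, ownership, timestamps, ACLs, xattrs and hard links;
# the trailing slashes copy the directory contents rather than the directories themselves
rsync -aAXH --info=progress2 /mnt/old/ /mnt/new/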


I’m fairly certain all of these logged errors come from after the breaking boot attempt, when I tried to read back from the SSD; I confirmed that my later reads increased the counter. Either way, I can initiate an RMA with the shop or with ADATA (what a company, by the way: their contact forms don’t work at all for me, “Please try again later”™).

smartctl --all
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.12-1-MANJARO] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       XPG GAMMIX S11 Pro
Serial Number:                      @REDACTED@
Firmware Version:                   32B3T8EA
PCI Vendor/Subsystem ID:            0x1cc1
IEEE OUI Identifier:                0x000000
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Utilization:            292,644,745,216 [292 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Feb 27 09:58:20 2023 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        22 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    5,058,335 [2.58 TB]
Data Units Written:                 6,414,344 [3.28 TB]
Host Read Commands:                 157,200,598
Host Write Commands:                27,019,331
Controller Busy Time:               152
Power Cycles:                       42
Power On Hours:                     12
Unsafe Shutdowns:                   26
Media and Data Integrity Errors:    13,183
Error Information Log Entries:      13,183
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   255
Thermal Temp. 1 Total Time:         1098

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged


@TriMoon /dev/nvmeXnY is a regular block device interface, the same as a hard drive. If you don’t believe me, open Gparted :stuck_out_tongue:

I’m using an M.2 NVMe SSD also :wink:

Drives:
  Local Storage: total: 5.46 TiB used: 968.33 GiB (17.3%)
  ID-1: /dev/nvme0n1 vendor: Addlink model: M.2 PCIE G4x4 NVMe
    size: 3.64 TiB
  ID-2: /dev/sda vendor: Western Digital model: WD20EARX-00PASB0
    size: 1.82 TiB

The only thing that catches my eye is the bs= value.
You used 8M to make the image, but 64K when you verify?

You may be better off using Clonezilla or Rescuezilla, though.

Some motherboards do not allow two NVMe drives at the same time (check the board manual);
the second slot may be restricted to a WLAN module or have other limitations…

dd can be tricky, especially for what you’re doing here (personally, I would never rely on it for cloning; I’d rather manually partition and do a custom install no matter what),
reads could get truncated (I don’t remember the full story, you can find/read about it),

short story, how I’d do it:

  • I’m always using iflag=fullblock (without oflag), especially for bs=8M (see the sketch below this list),
  • avoid letting GParted fix the drive for you (needing that tells me your dd didn’t go quite right).
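
For your two devices that would be roughly (a sketch, without oflag=direct; iflag=fullblock makes dd refill short reads so every 8M block is written in full):

dd if=/dev/nvme1n1 of=/dev/nvme0n1 bs=8M iflag=fullblock status=progress
sync    # flush everything to the target before touching it further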

As for the drive health,
if you can format and install things on it, I’d just load-test it;
it might still be fully OK (and there’s a good chance it is - but honestly I only trust Samsung),
I’ve had USB sticks (a different thing, I know) that behaved like yours but were fully recoverable and still work fine.
You can use proper tools to load-test it, or personally I’d just download a Bitcoin full node : ) (if anything is wrong with the drive, that will fail).
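
If you want a proper tool instead of a blockchain sync, something like fio with verification would do it (a sketch; this overwrites the drive, and the size and device name are only examples):

# write 100 GiB of data and verify it on read-back -- destroys existing contents
fio --name=verify-test --filename=/dev/nvme0n1 --direct=1 \
    --rw=write --bs=1M --size=100G --verify=crc32c --do_verify=1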

Hi,
as a small piece of advice, I concur with @dragan: the Gparted warning is the red flag that something with dd did not work as expected…

Nonetheless, a possible approach would be to use Clonezilla to duplicate the system…

Dumb question: did you use dd with the system running, or from an outside “live” system?

This part right here:

Those are hardware problems. You should RMA this drive. You could boot from a fresh live USB and run smartctl again, and you will still see media and integrity errors.


They aren’t RAM, or they would lose their data in a power failure. Some have RAM to buffer data before it’s written. I wouldn’t want to dd a drive myself, but that is exactly what we do with USB drives when we burn ISO images. The drive’s mapping behind the scenes is irrelevant to reading back that data. There is an underlying problem in the hardware; look at all those media and data integrity errors in the smartctl output.

I used RAM as a metaphor, to indicate the difference between a magnetic platter and electronic circuits storing the actual data, because I didn’t want to go too deep into technical terms…

But if you guys insist on them behaving the same, go ahead and see threads like this for the results :wink:
The biggest difference is that a normal HD doesn’t relocate sectors behind the scenes while writing data, which these SSDs will do and can thus cause errors in partitions written using static data from dd


Better to call it “Static RAM” :innocent:
Inside my last Amiga 3000 I had a predecessor to an NVMe,
based on “Static RAM” and an EEPROM as a boot-device sitting in one of the expansion-slots…

More to the point:

  • RAM = Random Access Memory
    This includes DIMMs, NVMe, video memory, etc. :slight_smile:
    (i.e. it is a general term used to describe electronic memory, not just the memory used to run programs, as most people think…)

Like I said, all of these counted errors were most likely only triggered after cloning. Weirdly, there are no error-log entries in SMART to offer an explanation. At this point I strongly suspect a host firmware or controller firmware bug/issue. Nonetheless, you have reinforced my decision to RMA the drive; I don’t want any of these controller bugs down the line, even if I somehow got it working today (i.e. with a fresh install).


Tried it with Clonezilla. It uses a (custom) dd under the hood, as evidenced by glancing at htop. The result is the same after rebooting. Furthermore, it doesn’t fix up the GPT automatically; Gparted again complained about the improper GPT size. If you ask me, it’s not properly maintained, given that it spews “egrep is deprecated” messages to stderr.

I used 8M when cloning for speed; bs= in dd is basically the buffer size. The 64K comes from later tests: the few error messages from the controller hinted at a 64K internal block size (the LBA locations were 64K-aligned), so I decided to “read-test” with bs=64K.

The laptop has two dedicated NVMe slots for SSDs.

IIRC it would only truncate in case of read errors, i.e. not write any blocks in place of failed reads. The source drive is in pristine condition; there were no error messages in dmesg during cloning.

Hm, that seems to be irrelevant here: dd invocation (GNU Coreutils 9.1)

Onto the Gparted warning: I think it only concerned the partition table not encompassing the whole of the new 2 TB of space. I don’t know whether the “fix” button merely extends the GPT or also correctly writes the backup GPT to the end of the drive. I will try cloning one last time and fix up the backup GPT header manually: https://askubuntu.com/questions/386752/fixing-corrupt-backup-gpt-table Again, Clonezilla does not handle this either.
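
If I end up doing it by hand, I expect it to boil down to something like this (a sketch; sgdisk’s -e option relocates the backup GPT data structures to the end of the disk):

sgdisk -e /dev/nvme0n1    # move the backup GPT header/table to the actual end of the new disk
sgdisk -v /dev/nvme0n1    # verify the partition table afterwards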

A fair sanity question! Cloning is done from the live system; both SSDs are “offline” (nothing mounted).


Badblocks from Live without rebooting:

badblocks -t 3979117713 -b 1048576 -c 4 -e 5 -s -v -w /dev/nvme0n1
Checking for bad blocks in read-write mode
From block 0 to 1953513
Testing with pattern 0xed2c8491: done
Reading and comparing: done
Pass completed, 0 bad blocks found. (0/0/0 errors)

No errors… That’s why I think the firmware is playing the Joker role very well.


I have tried: clone → shut down the laptop → put the cloned SSD into my computer, where it belongs. It boots up into I/O errors there as well. Both computers run AMI UEFI, so if there’s an implementation bug, it’s probably the same one.

Despite having already spent too much time on this anomaly, I will still try:

  1. badblocks with a random pattern in case the controller compressed the static pattern I specified and only wrote 1 block.
  2. Clone and manually extend the partition table & write the backup GPT header to where it belongs, double-checking manually.
  3. (maybe not) Load testing as suggested. While I find the suggestion of a full Bitcoin node entertaining :slight_smile: I think it would take ages to sync from the first block; I will instead make use of another decentralized protocol that I already use to receive Linux ISOs on my local machine :stuck_out_tongue:
  4. Submit an RMA case to the shop.
    The contact forms on ADATA’s website did not work for me (“Please try again later”), and their forms are a mix-and-match: one states a 1 MB upload limit for the receipt, another 2 MB. Their Twitter account has had 2 days to reply. Personally, I will avoid that brand (and their XPG line) for anything in the future. This was the confirmation I needed, and the first and last chance for a manufacturer who, like Kingston, “likes” to silently swap out hardware in their SSDs, to gain my trust.

Thanks for the replies so far.

Don’t do that; just create a fresh GPT table with the partitions you need, and copy over the files inside them using rsync.
Don’t try to be lazy, because you’ll end up doing more instead of less…


@TriMoon to avoid repeating myself, the TL;DR of my stance: a) a block device should just work; b) I don’t fancy using a controller that goes haywire after some unspecified sequence of events. It’s going to be my only main drive, not a spare Steam drive I got for free.


badblocks with a random pattern caused the NVMe controller to drop out. It most likely has transparent compression, because a static pattern ran fine just before.

dmesg:

[16185.730972] nvme nvme0: I/O 576 (Read) QID 12 timeout, aborting
[16185.730997] nvme nvme0: I/O 587 (Read) QID 12 timeout, aborting
[16185.731005] nvme nvme0: I/O 588 (Read) QID 12 timeout, aborting
[16185.731013] nvme nvme0: I/O 589 (Read) QID 12 timeout, aborting
[16216.451211] nvme nvme0: I/O 576 QID 12 timeout, reset controller
[16247.170957] nvme nvme0: I/O 8 QID 0 timeout, reset controller
[16398.377598] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[16398.394885] nvme nvme0: Abort status: 0x371
[16398.394900] nvme nvme0: Abort status: 0x371
[16398.394910] nvme nvme0: Abort status: 0x371
[16398.394917] nvme nvme0: Abort status: 0x371
-
[16518.925782] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[16518.925792] nvme nvme0: Removing after probe failure status: -19
[16639.444471] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[16639.444752] nvme0n1: detected capacity change from 4000797360 to 0

badblocks, error limit = 5

root# badblocks -t random -b 1048576 -c 4 -e 5 -s -v -w /dev/nvme0n1

Checking for bad blocks in read-write mode
From block 0 to 1953513
Testing with random pattern: done                                                 
Reading and comparing: badblocks: Invalid argument during seek
171408
badblocks: Invalid argument during seek
171409
badblocks: Invalid argument during seek
171410
badblocks: Invalid argument during seek
171411
badblocks: Invalid argument during seek
171412
Too many bad blocks, aborting test
done                                                 
Pass completed, 5 bad blocks found. (5/0/0 errors)

TLDR: I think some flash on it is really dead. Submitting for RMA without second thoughts.
(imagine Bart Simpson writing on the blackboard) I will always test new storage devices before use
I will always test new storage devices before use
I will always test new storage devices before use

Of course you need to RMA it…
But what I said still stands :wink:


Thanks everybody for the replies. The new (ADATA) SSD was DOA (dead on arrival); I returned it, got a 100% refund, and bought a different SSD, which works flawlessly. I was too stubborn to believe that the SSD itself was bad… it was only my second NVMe SSD ever, too :confused:


As far as data migration is concerned, my general plan worked out perfectly:

  1. Clone data from old to new SSD with dd
  2. Disconnect the old SSD, reboot into the Live USB and do the partition work (steps 4 to 9 are sketched as commands after this list):
  3. Expand/fix the GPT table with GParted
  4. Expand the LUKS partition (this step could also be done within GParted with the LUKS container unlocked)
    • cryptsetup resize ... (it will fill the partition to 100% if no size specified)
  5. Expand LVM’s Physical Volume that’s inside (it will fill the available space to 100% if no size specified)
  6. Expand the Logical Volumes for root, /home with --resizefs (-L +123G)
  7. Expand the swap LV
  8. Re-run mkswap on the swap LV so it uses the newly expanded size
  9. Replace the swap UUID in /etc/fstab with the new one (mkswap generates a new UUID)
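
Steps 4 to 9 as a rough command sketch (the mapper, VG and LV names and the sizes here are illustrative and will differ per system):

cryptsetup open /dev/nvme0n1p4 cryptroot       # unlock the LUKS container
cryptsetup resize cryptroot                    # grow LUKS to fill the enlarged partition
pvresize /dev/mapper/cryptroot                 # grow the LVM physical volume inside it
lvextend -L +100G --resizefs /dev/vg0/root     # grow the root LV and its filesystem
lvextend -L +500G --resizefs /dev/vg0/home     # grow the home LV and its filesystem
lvextend -L +8G /dev/vg0/swap                  # grow the swap LV
mkswap /dev/vg0/swap                           # recreate swap (this generates a new UUID)
# finally, put the new swap UUID into /etc/fstab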

Notes:

  • Reminder: part1 = EFI boot, p2=MS Data, p3=Windows, p4=LUKS(LVM(swap,root,home))
  • The previous drive was formatted with a 4 KiB LBA size, the new one uses 512 bytes. This was not an issue.
  • A more flexible approach to cloning is to recreate the partition table and partitions on the new disk and to only clone the partition data with dd, not the entire block device (rough sketch below):
    • instead of dd if=/dev/nvme0n1 of=/dev/nvme1n1
    • go per partition: dd if=/dev/nvme0n1p1 of=/dev/nvme1n1p1 - this lets you make any partition on the new disk as large as you want; you just have to grow the filesystem once it has been cloned over to the bigger partition
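
Rough sketch of the per-partition variant (device names as in the bullet above; which grow step applies depends on what is inside each partition):

# create the (larger) partitions on the new disk first, then clone each one's contents
dd if=/dev/nvme0n1p1 of=/dev/nvme1n1p1 bs=8M status=progress    # ESP
dd if=/dev/nvme0n1p4 of=/dev/nvme1n1p4 bs=8M status=progress    # LUKS container
# afterwards grow whatever lives inside an enlarged target partition:
# a plain filesystem needs a single resize2fs/xfs_growfs call, while the
# LUKS+LVM partition needs the cryptsetup/pvresize/lvextend sequence sketched earlier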

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.