[Help] Has my 2nd SSD died?

After a recent reboot I couldn’t reach the graphical login for GNOME; instead I landed in the emergency systemd-sulogin-shell. I noticed that an fsck job for my second NVMe SSD was timing out after 1 min 30 sec on every reboot, so I commented out the relevant line in /etc/fstab and was then able to reach the GNOME login prompt again.

However, now Manjaro can’t see my second internal drive at all. It’s like it doesn’t exist. The output of fdisk -l is:

Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: Samsung SSD 970 PRO 1TB                 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 688C0084-D758-420A-87C2-F14073E8C5A1

Device          Start        End    Sectors   Size Type
/dev/nvme0n1p1   4096     618495     614400   300M EFI System
/dev/nvme0n1p2 618496 2000409209 1999790714 953.6G Linux filesystem

Is my 2nd SSD dead?

Output of inxi -Fazy below:

System:
  Kernel: 5.9.10-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.9-x86_64 
  root=UUID=f5245ffc-4c91-4b47-b905-1d35123f8237 rw quiet 
  rd.udev.log_priority=3 mitigations=off 
  Console: tty 0 wm: gnome-shell DM: GDM 3.38.1 Distro: Manjaro Linux 
Machine:
  Type: Desktop System: Intel Client Systems product: NUC8i7HVK v: J71485-504 
  serial: <filter> Chassis: Intel Corporation type: 3 v: 2.0 serial: N/A 
  Mobo: Intel model: NUC8i7HVB v: J68196-504 serial: <filter> UEFI: Intel 
  v: HNKBLi70.86A.0063.2020.0827.1309 date: 08/27/2020 
Battery:
  Device-1: hidpp_battery_0 model: Logitech MX Keys Wireless Keyboard 
  serial: <filter> charge: 55% (should be ignored) rechargeable: yes 
  status: Discharging 
CPU:
  Info: Quad Core model: Intel Core i7-8809G socket: BGA1440 (U3E1) 
  note: check bits: 64 type: MT MCP arch: Kaby Lake family: 6 
  model-id: 9E (158) stepping: 9 microcode: DE L2 cache: 8192 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx 
  bogomips: 49618 
  Speed: 800 MHz min/max: 800/8300 MHz base/boost: 3100/4100 volts: 1.0 V 
  ext-clock: 100 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 
  6: 800 7: 800 8: 801 
  Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
  Type: l1tf mitigation: PTE Inversion; VMX: vulnerable 
  Type: mds status: Vulnerable; SMT vulnerable 
  Type: meltdown status: Vulnerable 
  Type: spec_store_bypass status: Vulnerable 
  Type: spectre_v1 status: Vulnerable: __user pointer sanitization and 
  usercopy barriers only; no swapgs barriers 
  Type: spectre_v2 status: Vulnerable, IBPB: disabled, STIBP: disabled 
  Type: srbds status: Vulnerable 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: Intel HD Graphics 630 driver: i915 v: kernel bus ID: 00:02.0 
  chip ID: 8086:591b 
  Device-2: AMD Polaris 22 XT [Radeon RX Vega M GH] vendor: Intel 
  driver: amdgpu v: kernel bus ID: 01:00.0 chip ID: 1002:694c 
  Display: server: X.Org 1.20.9 compositor: gnome-shell 
  driver: amdgpu,ati,intel unloaded: modesetting,radeon alternate: fbdev,vesa 
  display ID: :0 screens: 1 
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x286mm (40.0x11.3") 
  s-diag: 1055mm (41.6") 
  Monitor-1: XWAYLAND0 res: 1920x1080 hz: 50 dpi: 92 
  size: 530x300mm (20.9x11.8") diag: 609mm (24") 
  Monitor-2: XWAYLAND1 res: 1920x1080 hz: 60 dpi: 92 
  size: 530x300mm (20.9x11.8") diag: 609mm (24") 
  OpenGL: renderer: AMD VEGAM (DRM 3.39.0 5.9.10-1-MANJARO LLVM 11.0.0) 
  v: 4.6 Mesa 20.2.2 direct render: Yes 
Audio:
  Device-1: Intel CM238 HD Audio driver: snd_hda_intel v: kernel 
  bus ID: 00:1f.3 chip ID: 8086:a171 
  Device-2: AMD Polaris 22 HDMI Audio vendor: Intel driver: snd_hda_intel 
  v: kernel bus ID: 01:00.1 chip ID: 1002:ab08 
  Device-3: Logitech Logitech StreamCam type: USB 
  driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus ID: 6-1:2 
  chip ID: 046d:0893 serial: <filter> 
  Sound Server: ALSA v: k5.9.10-1-MANJARO 
Network:
  Device-1: Intel Ethernet I219-LM driver: e1000e v: kernel port: f040 
  bus ID: 00:1f.6 chip ID: 8086:15b7 
  IF: eno1 state: up speed: 100 Mbps duplex: full mac: <filter> 
  Device-2: Intel I210 Gigabit Network driver: igb v: kernel port: b000 
  bus ID: 05:00.0 chip ID: 8086:157b 
  IF: enp5s0 state: down mac: <filter> 
  Device-3: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel port: b000 
  bus ID: 06:00.0 chip ID: 8086:24fd 
  IF: wlp6s0 state: down mac: <filter> 
Drives:
  Local Storage: total: 953.87 GiB used: 297.84 GiB (31.2%) 
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 PRO 1TB size: 953.87 GiB 
  block size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 
  serial: <filter> rev: 1B2QEXP7 temp: 51 Celsius C scheme: GPT 
  SMART: yes health: PASSED on: 176d 12h cycles: 531 
  read-units: 8,616,109 [4.41 TB] written-units: 8,994,873 [4.60 TB] 
Partition:
  ID-1: / raw size: 953.57 GiB size: 937.61 GiB (98.33%) 
  used: 297.84 GiB (31.8%) fs: ext4 block size: 4096 B dev: /dev/nvme0n1p2 
Swap:
  Kernel: swappiness: 60 (default) cache pressure: 100 (default) 
  ID-1: swap-1 type: file size: 32.00 GiB used: 0 KiB (0.0%) priority: -2 
  file: /swapfile 
Sensors:
  System Temperatures: cpu: 62.5 C mobo: 29.8 C gpu: amdgpu temp: 46.0 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 293 Uptime: 31m Memory: 31.28 GiB used: 4.36 GiB (13.9%) 
  Init: systemd v: 246 Compilers: gcc: 10.2.0 alt: 8/9 Packages: pacman: 1814 
  lib: 432 Shell: Zsh (sudo) v: 5.8 running in: gnome-terminal inxi: 3.1.08
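
On the /etc/fstab workaround described in the opening post: instead of commenting the line out, a non-essential drive can be marked so that boot continues when it is missing. The sketch below only prints an example entry for review. The UUID, mount point, filesystem type and the 10s timeout are placeholders and not values from this thread; `nofail` and `x-systemd.device-timeout=` are the mount options described in the systemd.mount manual.

```shell
# Hypothetical /etc/fstab entry for a secondary data drive.
# UUID, mount point and filesystem type are placeholders, not values from this thread.
# nofail: boot continues even if this mount cannot be set up.
# x-systemd.device-timeout=10s: wait 10 s for the device instead of the default 90 s.
entry='UUID=PLACEHOLDER-UUID  /mnt/data  ext4  defaults,nofail,x-systemd.device-timeout=10s  0  2'

# Print the entry for review rather than editing /etc/fstab automatically.
printf '%s\n' "$entry"
```

This would not bring a missing drive back; it would only keep the system from dropping to the emergency shell while the hardware is being investigated.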

Hi @Feakster :wink:

It should at least be listed, even if it is dead.

What does this show:

journalctl -b0 -g nvme[0-9]n[0-9] --no-pager

or

journalctl -b0 -g nvme[0-9]n[0-9]p[0-9] --no-pager

Maybe also check whether it is listed in the UEFI.

Maybe a loose contact, if it is M.2?

Have you checked that the cables are properly seated in the drive and motherboard? You should also be able to see it in the boot drive selection screen of your UEFI/BIOS.

It is M.2

Will take it apart in an hour or so to take a look.

The regex doesn’t work in those commands. Are they missing an option?

I get the following output from journalctl -b0 -g nvme --no-pager

-- Logs begin at Wed 2020-04-29 10:28:20 BST, end at Wed 2020-11-25 13:01:46 GMT. --
Nov 25 12:04:01 Haku kernel: nvme nvme0: pci function 0000:72:00.0
Nov 25 12:04:01 Haku kernel: nvme nvme0: missing or invalid SUBNQN field.
Nov 25 12:04:01 Haku kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Nov 25 12:04:01 Haku kernel: nvme nvme0: 8/0/0 default/read/poll queues
Nov 25 12:04:01 Haku kernel:  nvme0n1: p1 p2
Nov 25 12:04:01 Haku kernel: EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null)
Nov 25 12:04:01 Haku kernel: EXT4-fs (nvme0n1p2): re-mounted. Opts: (null)
Nov 25 12:04:02 Haku systemd-fsck[423]: /dev/nvme0n1p1: 6 files, 70/76646 clusters
Nov 25 12:20:53 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.224' (uid=0 pid=3837 comm="sudo smartctl --info /dev/nvme0n1 ")
Nov 25 12:20:53 Haku sudo[3837]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --info /dev/nvme0n1
Nov 25 12:21:12 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.226' (uid=0 pid=3901 comm="sudo smartctl --info /dev/nvme0n1 ")
Nov 25 12:21:12 Haku sudo[3901]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --info /dev/nvme0n1
Nov 25 12:21:13 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.227' (uid=0 pid=3905 comm="sudo smartctl --info /dev/nvme0n2 ")
Nov 25 12:21:13 Haku sudo[3905]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --info /dev/nvme0n2
Nov 25 12:21:15 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.228' (uid=0 pid=3909 comm="sudo smartctl --info /dev/nvme0n1 ")
Nov 25 12:21:15 Haku sudo[3909]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --info /dev/nvme0n1
Nov 25 12:21:53 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.229' (uid=0 pid=3920 comm="sudo smartctl --info /dev/nvme0n1 ")
Nov 25 12:21:53 Haku sudo[3920]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --info /dev/nvme0n1
Nov 25 12:22:20 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.230' (uid=0 pid=3940 comm="sudo smartctl --smart=on /dev/nvme0n1 ")
Nov 25 12:22:20 Haku sudo[3940]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl --smart=on /dev/nvme0n1
Nov 25 12:22:48 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.232' (uid=0 pid=3964 comm="sudo smartctl -a /dev/nvme0n1 ")
Nov 25 12:22:48 Haku sudo[3964]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -a /dev/nvme0n1
Nov 25 12:25:44 Haku smartd[4300]: Device: /dev/nvme0, opened
Nov 25 12:25:44 Haku smartd[4300]: Device: /dev/nvme0, Samsung SSD 970 PRO 1TB, S/N:S462NF0M317898J, FW:1B2QEXP7, 1.02 TB
Nov 25 12:25:44 Haku smartd[4300]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Nov 25 12:25:44 Haku smartd[4300]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Nov 25 12:39:35 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.332' (uid=0 pid=6071 comm="sudo smartctl -t /dev/nvme0n1 ")
Nov 25 12:39:35 Haku sudo[6071]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -t /dev/nvme0n1
Nov 25 12:39:58 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.333' (uid=0 pid=6075 comm="sudo smartctl -t short /dev/nvme0n1 ")
Nov 25 12:39:58 Haku sudo[6075]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -t short /dev/nvme0n1
Nov 25 12:40:14 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.334' (uid=0 pid=6079 comm="sudo smartctl -H /dev/nvme0n1 ")
Nov 25 12:40:14 Haku sudo[6079]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -H /dev/nvme0n1
Nov 25 12:40:26 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.335' (uid=0 pid=6092 comm="sudo smartctl -t long /dev/nvme0n1 ")
Nov 25 12:40:26 Haku sudo[6092]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -t long /dev/nvme0n1
Nov 25 12:40:34 Haku dbus-daemon[502]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.336' (uid=0 pid=6096 comm="sudo smartctl -H /dev/nvme0n1 ")
Nov 25 12:40:34 Haku sudo[6096]: benjamin : TTY=pts/0 ; PWD=/home/benjamin ; USER=root ; COMMAND=/usr/bin/smartctl -H /dev/nvme0n1

Some weird output here as I just started faffing with smartctl.
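
On the regex question above, one possible explanation, assuming the commands were run in Zsh (the shell reported in the inxi output): an unquoted pattern such as nvme[0-9]n[0-9] is first treated by the shell as a filename glob, and Zsh aborts the command with a "no matches found" error when no file matches. Quoting the pattern hands it to journalctl unchanged. A minimal sketch; the sample line is copied from the journal output in this thread, and grep -E is used only as a quick stand-in to check the pattern:

```shell
# Quote the pattern so the shell does not try to expand the brackets as a glob.
pattern='nvme[0-9]n[0-9]'

# Quick check of the pattern against a sample kernel line, using grep -E.
sample='Nov 25 12:04:01 Haku kernel:  nvme0n1: p1 p2'
matched=$(printf '%s\n' "$sample" | grep -E -c "$pattern")
echo "sample lines matched: $matched"

# The same quoted pattern with journalctl (skipped if journalctl is unavailable).
# journalctl exits non-zero when nothing matches, hence the "|| true".
if command -v journalctl >/dev/null 2>&1; then
    journalctl -b0 -g "$pattern" --no-pager || true
fi
```

Bash passes an unmatched glob through literally by default, which may be why the unquoted form works on some systems and not on others.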

Sorry, I have no NVMe drive… but this works for me:

journalctl -b0 -g sd[a-z] --no-pager

I don’t think that would work, as both internal drives are NVMe. I’ll check later to see whether the drive has come unseated, then report back.

Then it must be related to the hardware or firmware. There is nothing weird about smartctl. It just monitors the NVMe drive.

sudo smartctl -A /dev/nvme(x) ?

You could always check whether it is recognised in the UEFI. If it is not, and it is well seated, the drive may have failed outright. First, though, cross-check whether it is simply the M.2 slot that has failed by swapping the drive with your main drive, then check again whether the UEFI recognises it. If it still isn’t recognised, an RMA might be in order if the warranty still applies.

How do you check whether it’s recognised in the UEFI?

Generally all drives get listed in UEFI. Where exactly depends on the manufacturer of your mainboard.
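
This is no substitute for looking in the UEFI setup screen, but a rough cross-check is possible from the running system: list the NVMe controllers the kernel has registered, and the controllers visible on the PCIe bus. A minimal sketch, assuming a Linux system with sysfs mounted at /sys; the function name and its optional directory argument are made up for illustration:

```shell
# Print each NVMe controller the kernel has registered, with its model string.
# The optional argument (a sysfs-like directory) exists only to make testing easy.
list_nvme_controllers() {
    sysdir=${1:-/sys/class/nvme}
    [ -d "$sysdir" ] || return 0          # no NVMe class directory: nothing registered
    ls "$sysdir" | while read -r name; do
        model=$(cat "$sysdir/$name/model" 2>/dev/null)
        printf '%s: %s\n' "$name" "$model"
    done
}

list_nvme_controllers

# Controllers visible on the PCIe bus, whether or not a driver is bound to them.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i 'non-volatile' || echo "no NVMe controller found on the PCIe bus"
fi
```

A second drive would normally show up as its own controller (for example nvme1, with the disk at /dev/nvme1n1) rather than as /dev/nvme0n2. The journal in this thread only ever mentions nvme0.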

Thank you. I opened the case, removed the SSDs, blew the dust off the connectors with compressed air, then swapped the drives over. It now boots and mounts as before.
