SSD turns off and BTRFS goes into a read-only mode

Two weeks ago, my laptop’s SSD suddenly powered off and BTRFS went into read-only mode. It hasn’t happened since then, but I’d like to find out what caused it.

My initial hunch was the laptop’s temperature, as I had been watching 4K video on YouTube for 4 hours straight that day. I quickly ruled that out, though: the entire room is air-conditioned at 18 °C, the laptop was cool to the touch afterwards, and it always sits on a cooling pad.

Here are some images I took while on the TTY.

I immediately switched to a TTY, and it was evident that BTRFS had gone into read-only mode.

So I ran lsblk to check the drive, and this is what I saw.

My NVMe SSD had definitely gone offline, and this is the SSD that holds my home partition. This is how it should look normally.


Looking into dmesg, I figured out what the actual issue was. There was a problem with the SSD: there were some write errors, followed by a line saying that a trim operation had failed. My SSD is a Crucial P2 1TB 3D NAND NVMe PCIe M.2 SSD (up to 2400 MB/s), model CT1000P2SSD8.

I searched Crucial’s page for more information about TRIM, and here is what I found.

What is SSD Trim?

TRIM is a command for the ATA interface. As you use your drive, changing and deleting information, the SSD needs to make sure that invalid information is deleted and that space is available for new information to be written. Trim tells your SSD which pieces of data can be erased.

The command is different for other interfaces, and goes by different names in different operating systems, but the action is usually referred to as “Trim”. No matter what name it goes by, Trim works with Active Garbage Collection to clean up and organize your solid state drive. Trim is beneficial, but not mandatory. Because some operating systems do not support Trim, SSD manufacturers design, create, and test their drives assuming that Trim will not be used.

Isn’t this for SATA SSDs then? My SSD is NVMe and connects to the motherboard over PCIe. Does that mean TRIM isn’t possible on my SSD? As far as my limited understanding goes, the equivalent operation for NVMe SSDs should be DEALLOCATE rather than TRIM.
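For reference, Linux uses "trim" and "discard" as generic terms. For NVMe drives the kernel translates discard requests into the NVMe Dataset Management command with the Deallocate attribute, so a trim failure in dmesg can still refer to an NVMe device. Below is a minimal sketch for checking whether the kernel sees discard support on a drive. The device name nvme0n1 is an assumption (lsblk shows the real one), and the function name and its optional second argument are purely illustrative.

```shell
# Succeeds if the named block device advertises discard support.
# The kernel exposes the limit in sysfs; a value of 0 means no discard.
# The optional second argument (sysfs root) only exists so the sketch
# can be tried against a fake directory tree.
supports_discard() {
    dev=$1
    root=${2:-/sys/block}
    limit_file="$root/$dev/queue/discard_max_bytes"
    [ -r "$limit_file" ] && [ "$(cat "$limit_file")" -gt 0 ]
}

# Usage (nvme0n1 is an assumed device name):
# if supports_discard nvme0n1; then echo "discard supported"; else echo "no discard"; fi
```

The same information should also appear in the DISC-GRAN and DISC-MAX columns of lsblk --discard.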

After searching for a while, I learned about fstrim and its timer, so I checked their status too. To my surprise, the fstrim service is inactive, although its timer is active.

I’m also attaching the relevant part of the dmesg output here. Since I couldn’t save the output from the TTY, I captured it after a reboot using the command journalctl -k -b -1 -o short-precise >> dmesg_output.txt.
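To narrow a log like this down to the storage-related lines, a small filter can help. This is only a sketch: the function name storage_lines and the output file name are made up for illustration, and the pattern may need widening for other setups.

```shell
# Keep only log lines that mention NVMe or Btrfs (case-insensitive).
storage_lines() {
    grep -iE 'nvme|btrfs'
}

# Usage, reading the previous boot's kernel log:
# journalctl -k -b -1 -o short-precise | storage_lines > dmesg_storage.txt
```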

What should be my next step? Should I just disable this timer service and see what happens?

Btw, my kernel is 5.17.1-3-MANJARO.

Additionally, there is a slightly unrelated issue with the SSD. Its SMART status is always Good, but no attributes are listed. What also confuses me is that “powered on for” and “power cycles” are always 0. The same is true of the built-in SSD that came with the laptop.

Welcome to Manjaro! :smiling_face_with_three_hearts:

  1. Please read the information behind this link. It will help you to post necessary information. [HowTo] Provide System Information
  2. Please press the three dots below your post and then press the :pencil2:
  • If you give us information about your system, we can see what we’re talking about and make better suggestions.
  • You can do this by using inxi in a terminal or in console.
inxi --admin --verbosity=7 --filter --no-host --width
  • Personally identifiable information such as serial numbers and MAC addresses are filtered out by this command
  • Presenting the information in this way allows everyone to be familiar with the format and quickly find the items they need without missing anything.
  3. Copy the output from inxi (including the command) and paste it into your post.
  • To make it more readable, add 3 backticks ``` on an extra line before and after the pasted text.
nvme nvme0 device not ready; aborting reset, csts=0x1

This looks related to this issue: 215081 – nvme: Device not ready; aborting reset, CSTS=0x1 - Removing after probe failure status: -19

So the NVMe drive is not ready and stays in power-saving mode.

You need to disable power management. Add nvme_core.default_ps_max_latency_us=0 to the kernel parameters, or use a value other than 0 that suits the NVMe drive.
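As a rough sketch of how that could be done, assuming the system boots with GRUB and keeps its settings in /etc/default/grub (an assumption; other bootloaders differ), the helper below prints a copy of the file with the parameter appended to GRUB_CMDLINE_LINUX_DEFAULT. The function name add_kernel_param is made up for illustration, and the original file is not modified.

```shell
# Print the contents of a GRUB defaults file with one kernel parameter
# appended to GRUB_CMDLINE_LINUX_DEFAULT. If the parameter is already
# on that line, the file is printed unchanged. Nothing is written back.
add_kernel_param() {
    file=$1
    param=$2
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        cat "$file"
    else
        sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
    fi
}

# Usage (review the output before replacing the real file):
# add_kernel_param /etc/default/grub nvme_core.default_ps_max_latency_us=0
```

After editing the real file, the GRUB configuration has to be regenerated (sudo update-grub on Manjaro) and the machine rebooted. The active value can then be checked with cat /proc/cmdline or cat /sys/module/nvme_core/parameters/default_ps_max_latency_us.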

You may also need to check the file system for damage:

sudo btrfs check --readonly --progress /
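A lighter complementary check is to look at the per-device error counters that Btrfs keeps, which btrfs device stats prints for a mounted file system. The sketch below filters that output down to counters that are not zero. The function name nonzero_btrfs_counters is made up for illustration, and it assumes the usual two-column layout of counter name followed by value.

```shell
# Read `btrfs device stats` output on stdin and print only the
# counters whose value is greater than zero. Prints nothing if
# every counter is zero.
nonzero_btrfs_counters() {
    awk '$2 + 0 > 0 { print }'
}

# Usage:
# sudo btrfs device stats / | nonzero_btrfs_counters
```

Empty output suggests that Btrfs has not recorded read, write, flush, corruption, or generation errors on the devices of that file system.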