Device-mapper bug with TRIM in kernels 5.1, 5.2

Linux Kernels 5.1, 5.2 - Critical bug found, may lead to data loss

May 22nd, 2019: Fixed in version 5.1.4

May 21st, 2019: A critical bug has been found in Linux 5.1. (edit: and possibly 5.2)
It is related to TRIM commands issued on SSDs.

It is unclear exactly who is affected, as the sources do not agree with each other. It may only affect installs using encryption. It may only affect certain SSD models. Testing is underway. However, if you use an SSD, are unsure whether you are affected and do not want to take any chances, we strongly recommend that you avoid the 5.1 and 5.2 kernels for now.
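
If you are not sure which kernel you are currently running, here is a minimal Python sketch that classifies the running release. It only encodes what this thread states (5.1 releases before 5.1.4 affected, 5.2 possibly affected); the function name `classify_kernel` and the status strings are made up for illustration, and the version ranges are an assumption that may change as testing continues.

```python
import platform
import re


def classify_kernel(release: str) -> str:
    """Classify a kernel release string using only the claims made in this thread."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return "unknown"
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) == (5, 1):
        # Assumption taken from the announcement: 5.1.4 carries the fix.
        return "affected" if patch < 4 else "fixed"
    if (major, minor) == (5, 2):
        # The announcement only says "possibly 5.2".
        return "possibly affected"
    return "not listed"


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", classify_kernel(running))
```

A result of "not listed" only means the thread does not mention that kernel series; it is not a guarantee of anything.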


https://bugs.archlinux.org/task/62693

https://www.reddit.com/r/archlinux/comments/brbvo7/psa_fstrim_discarding_too_many_or_wrong_blocks_on/

I moved this into #announcements as it's probably quite important.

Edits will happen as more information becomes available.

Yes. It now looks as if it may be dm-crypt specific.

On Arch Linux bug tracker: https://bugs.archlinux.org/task/62693

So far, everyone who has hit this issue was using LVM, dm-crypt and a Samsung SSD.

That is pretty specific... for now. The problem is that it has not been confirmed whether it only affects systems with that very specific combination of software and hardware.

Also, the sources don't exactly agree. The bug tracker specifies LVM + dm-crypt + Samsung SSD, while the reddit post says dm-crypt/LUKS or device-mapper/LVM, without saying whether it only affects specific SSDs or pretty much any SSD.

Well, it is a mess. Now that I think of it, I don't understand what's going on at all exactly.

It looks like this is in 5.2-rc1 as well.

So I understand that the bug only affects SSDs using encryption?

Looks that way, but testing isn't concluded yet.

Thanks for pointing this out. I will take the cautious route and drop back to 5.0.

The issue seems to be fixed, based on the status of that report. @jonathon

Maybe on Arch Linux side (according to their bug tracker), but not on our side.

Unfortunately, it is unclear how it has been fixed, or rather it is not explained in any detail. The bug tracker only says this:

Additional comments about closing: linux 5.1.3.arch2-1

There is no hint of how it was actually fixed, i.e. which part of the kernel got patched, either in the build files or in the Git repository that hosts their version of the Linux kernel.

https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/linux
https://git.archlinux.org/linux.git/log/

If someone finds out how it can be fixed, it would be good to know.

That feeling when you switched from LUKS/LVM to LUKS only about a week ago :sweat_smile:

@jonathon @Frog

Thank you. For now, I will watch this. I use LUKS/dm-crypt on an old Samsung SSD (MZ7TD1280) of the PM840/840 generation. Not sure if it supports TRIM, though. I know it was a bare drive supplied to Lenovo, so there is no support from Samsung and none from Lenovo as it's long past warranty. Not sure, but I think it was in the T410-T430, T510-T530 range. Works fine in my T61p though.
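
On the "not sure if it supports TRIM" question, one way to check is to look at what the kernel reports in sysfs. The sketch below reads `queue/discard_granularity` for each block device; the kernel's block-layer documentation describes a value of 0 as meaning no discard support. The function name `discard_support` is made up for illustration, and this only shows what the kernel sees for each device, not whether any filesystem or dm-crypt layer on top actually passes discards through.

```python
from pathlib import Path


def discard_support(sys_block: Path = Path("/sys/block")) -> dict:
    """Map each block device name to True if the kernel reports discard support."""
    support = {}
    if not sys_block.is_dir():
        return support
    for dev in sorted(sys_block.iterdir()):
        attr = dev / "queue" / "discard_granularity"
        try:
            # A granularity of 0 means the device does not support discard.
            support[dev.name] = int(attr.read_text().strip()) > 0
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this entry.
            continue
    return support


if __name__ == "__main__":
    for name, supported in discard_support().items():
        print(f"{name}: {'discard supported' if supported else 'no discard'}")
```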

Why does that matter? Or is Manjaro not close enough to its parent OS (Arch) to be able to just pull in the patch? Did it mess up Manjaro Unstable when it was pulled?

We don't get our kernels directly from Arch. We have our own, with our own patches.
Right now it's not easily evident what exactly they did to call it 'closed'.
The two obvious options, from the information we have, would be:
A - reverting the specific commit that we think is the source of the issue
B - the extremely minimally tested patch that's already been posted

Oh, that's right, I forgot. Well, hopefully the upstream kernel team is aware of the issue, although the fact that the Arch team fixed it makes me wonder if the bug is specific to Arch and Arch-based distros.

The initial spotting came from Red Hat people :wink:

OK, good, so hopefully the upstream kernel team will fix it soon.

For now they did this:

Also this from Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=1708315

Thanks for the heads-up.

Just to be sure, I am now booting with 5.0 and have also disabled the fstrim.timer service until the proposed Red Hat patch linked in the Arch bug report is fully tested and possibly rolled out.
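
For anyone wanting to confirm the periodic trim is really off, here is a small Python sketch that asks systemd for the state of `fstrim.timer` via `systemctl is-enabled`. The helper name `unit_state`, its `command` parameter and the fallback strings are made up for illustration; it assumes a systemd-based install.

```python
import subprocess


def unit_state(unit: str, command=("systemctl", "is-enabled")) -> str:
    """Return what `systemctl is-enabled <unit>` prints, e.g. 'enabled' or 'disabled'."""
    try:
        result = subprocess.run(
            [*command, unit],
            capture_output=True,
            text=True,
            check=False,  # systemctl exits non-zero for disabled units
        )
    except FileNotFoundError:
        # Hypothetical fallback value for systems without systemctl.
        return "systemctl-not-found"
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    print("fstrim.timer:", unit_state("fstrim.timer"))
```

Note that this only covers the timer. A manual `fstrim` run, or filesystems mounted with the `discard` option, would still issue TRIM commands.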

This leads to massive data loss

Ouch.
