Is BTRFS unstable? (Update: It's not unstable, the problem is probably something else)

veprovina · 19 January 2023 10:19

Well, memtest ran for 6 hours, 4 passes, 0 errors… I know i didn’t test each stick individually, but if any were showing errors, then i’d decouple them and find out which one it is. Both seem fine so i don’t think i have to do that.

Now what? How do i test mainboard bus system?

Zesko · 19 January 2023 10:35

Try to install mprime-bin from AUR

https://wiki.archlinux.org/title/Stress_testing

If there are no errors, just go back to normal use; btrfs-desktop-notification-git will notify you when a new Btrfs warning or error message pops up.

journalctl-desktop-notification-git does the same and is general-purpose,
but btrfs-desktop-notification-git is specific to Btrfs.

veprovina · 19 January 2023 10:39

Ok, thanks, i’ll try it.

veprovina · 19 January 2023 14:06

Well, i ran the stress test for a while, didn’t seem to be any issues whatsoever…
Weird.

omano · 19 January 2023 16:14

Which stress test did you run, and how?

Also, an obvious thing was pointed out to you early in the thread.

At some point stop ignoring the first thing you should have tried:

INSTALL A MANJARO KERNEL NOW

Once you have done extensive tests with what ships with Manjaro and find that you still see the exact same problematic behavior, you can then say that the issues, which apparently only you have here on the forum, are not caused by your externally modified kernel.

Use logic; it helps avoid wasting time. It may not be that, but why you are forcefully ignoring that obvious thing, I can’t tell…

You need to trim the big pieces first before detailing.

veprovina · 19 January 2023 22:47

mprime-bin, left it on default options for half an hour, didn’t report any issues.
I’m not comfortable leaving stress tests on for longer so i stopped it.

I asked in this thread if it’s kernel related, people have told me no, so i didn’t.
But yeah, i guess i can try it.

No need to shout.

I’m definitely not the only one having issues with this, and this was on an official Manjaro kernel, though, granted, an older one. It was perfectly safe to assume the custom kernel wasn’t at fault here.
This exact thread actually solved one of the issues when this happened before. When some folder became not writeable due to permissions changing for no reason.

Because it’s not obvious it’s the kernel. I’m perfectly happy testing the normal one, but all evidence pointed to it not being the case, as others have stated here as well.

omano · 19 January 2023 23:04

Since it is not from Manjaro and is a heavily modified ‘tuned’ kernel, to me it is obvious to go back to the default kernel to rule that out.

Run the Large FFT test for the RAM with mprime; if you get an error, you have a RAM issue.

veprovina · 19 January 2023 23:14

I switched to the default kernel, we’ll see. I won’t switch back to xanmod if everything works fine from now on.

I tested memory with memtest for 6 hours; if any of the modules were faulty, the error would probably have shown by then. But i guess i can do one more, just later, i need my computer now.

omano · 27 January 2023 12:42

Checking back a week later: have you had time to do more tests and see whether you have the same issue with the Manjaro kernel? Any news?

eugen-b · 27 January 2023 14:21

A different test idea is to copy your current storage to a different one with a different filesystem type, make it bootable and try running on that one for a month.
If there are no issues, then that’s a win for you and one more brick in the wall for btrfs.
If issues continue, then continue debugging.

veprovina · 28 January 2023 02:58

No issues so far. I once had that annoying KDE Unlock bug, but that’s not related. And that notification system for BTRFS that i installed didn’t show up once, and i copied about 22 GB of pictures from my mobile to the drive, no issues.

I did remove xanmod, so maybe it was related to that kernel after all.

I do have another SSD where Windows used to be, but i’m too lazy to sift through it to see what’s there, and if i just copy everything to a folder and name it “backup” (lol) it shall forever remain untouched.
But i don’t want to install the OS and configure it again, and chances are that, if i just run stock and browse the internet on it, nothing will ever happen. A true test would be to do the exact same things i did on this install, then see if there’s a difference. But that’s too much time to redo, so i’d like to avoid that if i can.

omano · 28 January 2023 13:17

Let’s hope it was that and your issue is gone for good

veprovina · 28 January 2023 15:11

Yeah! But also, there was recently some huge update where almost every package got updated, so the issue might have been fixed in one of those without me knowing…
In any case, it seems BTRFS, ssd health or memory are not the issue, and that’s what’s important.
I doubt it’s the motherboard as well, i would have had way more problems if it’s that.

So far so good in any case.

veprovina · 9 February 2023 15:49

Well, that btrfs notification popped up that i installed:
It told me to check dmesg for grep, so i wrote sudo dmesg | grep btrfs in the terminal.

[ 6536.342454] BTRFS: error (device nvme0n1p2) in btrfs_commit_transaction:2447: errno=-5 IO failure (Error while writing out transaction)
[ 6536.343495] BTRFS: error (device nvme0n1p2: state EA) in btrfs_sync_log:3187: errno=-5 IO failure

Disk is healthy, memory is not corrupted memtest is ok, everything works except btrfs…
I’m even using the default kernel now. I removed xanmod.
I’m going to finish the work i have left for today, back up everything, then i’m reinstalling with ext4…
I’m not looking forward to another qemu GPU passthrough setup but whatever lol…
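
A side note on the log check above: `grep btrfs` is case-sensitive, and many kernel messages begin with upper-case `BTRFS`, so lines that never mention a lower-case function name can be missed. The sketch below is one hedged way to filter kernel log text for Btrfs warnings and errors. The function name `btrfs_problems` is made up for this illustration and is not part of any package mentioned in the thread.

```shell
# Keep only Btrfs warning/error lines from kernel log text read on stdin.
# -i matters: many kernel messages start with upper-case "BTRFS".
# (btrfs_problems is a hypothetical helper name, not an existing tool.)
btrfs_problems() {
    grep -i 'btrfs' | grep -iE 'error|warning'
}

# Typical use (reading the kernel log needs root on most systems):
#   sudo dmesg | btrfs_problems
#   sudo journalctl -k -b | btrfs_problems
```

Both example invocations read the kernel log of the current boot. Messages from a previous boot would need a different `journalctl` boot selection.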

Zesko · 9 February 2023 17:31

Be careful, smartctl and memtest have some limits.

I would suggest testing by copying a large file. You need to repeat the test more than 10 times:

1. Create a file larger than 10 GB on another disk/partition and compute its checksum, e.g. with sha1sum.
2. Copy this file to your bad Btrfs partition nvme0n1p2.
3. Check whether the checksum of the copy matches.
4. Delete the copy, then copy the file again.
5. Repeat the copy and verification more than 10 times and watch whether a btrfs notification pops up.
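
The steps above can be sketched as a small shell function. This is only an illustration under some assumptions: `copy_verify_loop` and the temporary file name `copy_verify_test.<pid>` are invented for this example, the source file is assumed to already exist on another disk, and the destination directory is assumed to be on the partition under suspicion.

```shell
# Copy SRC into DEST_DIR, compare SHA-1 checksums, delete the copy, repeat.
# Usage: copy_verify_loop SRC DEST_DIR [ROUNDS]   (ROUNDS defaults to 10)
# Returns non-zero on the first failed copy or checksum mismatch.
copy_verify_loop() {
    src=$1
    dest_dir=$2
    rounds=${3:-10}
    copy="$dest_dir/copy_verify_test.$$"   # hypothetical temp file name

    # Checksum of the original, computed once.
    want=$(sha1sum "$src" | cut -d' ' -f1)
    [ -n "$want" ] || { echo "could not checksum $src" >&2; return 1; }

    i=1
    while [ "$i" -le "$rounds" ]; do
        if ! cp "$src" "$copy"; then
            echo "round $i: copy failed" >&2
            rm -f "$copy"
            return 1
        fi
        sync                               # flush pending writes to disk
        got=$(sha1sum "$copy" | cut -d' ' -f1)
        rm -f "$copy"
        if [ "$got" != "$want" ]; then
            echo "round $i: MISMATCH ($got != $want)" >&2
            return 1
        fi
        echo "round $i: ok"
        i=$((i + 1))
    done
}
```

For example, `copy_verify_loop /mnt/other/big.img /home/user 12` would run twelve rounds (both paths are placeholders). One caveat: right after a copy, the read-back may be served from the page cache in RAM rather than from the disk, so a clean result says more about RAM and the copy path than about what actually landed on the drive.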

veprovina · 10 February 2023 00:56

I know but, memtest ran for like, 6+ hours over night, did a bunch of passes, it would have detected an error at least once during that time, no?

Yeah, i’ll do that before i try and reinstall everything.

EDIT: Sorry, how do i check that?

andreas85 · 10 February 2023 06:58

md5sum --help

md5sum
checksum

Zesko · 10 February 2023 08:08

No. In my experience, I ran memtest 3 times for more than 14 hours, and everything passed. But manually copying and checking a large file found the error more easily, in less than 1 hour.

I sent the faulty RAM to the official PassMark company (Memtest was made by PassMark) in Australia. PassMark tested it with memtest PRO for more than 30 repetitions and found a binary error in the RAM after 2 months.

That is why I suggest copying and verifying a large file: it exercises RAM, disk and CPU cache more efficiently than memtest. If you have 32 GB of RAM, try copying and verifying a file larger than 20 GB.

$ sha1sum {Your_file}

This reads the large file from disk and does a lot of calculation using CPU cache and RAM, so it will probably catch an error.

Copying a large file means reading it from another disk/partition into RAM and then writing it to your disk/partition, which will probably catch an I/O error.
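
As an alternative to comparing two `sha1sum` outputs by eye, the checksum can be stored in a list file and checked with `sha1sum -c`. The following is a minimal sketch; the helper names `record_checksum` and `check_copy` are assumptions made for this example, and the list is written next to the original file with a `.sha1` suffix.

```shell
# Write FILE.sha1 next to FILE, storing only the bare file name so the
# list can be checked from any directory that holds a copy of the file.
record_checksum() {
    ( cd "$(dirname "$1")" && sha1sum "$(basename "$1")" > "$(basename "$1").sha1" )
}

# Verify the same-named file inside DIR against the checksum list LIST.
# Usage: check_copy LIST DIR   (exit status 0 means the copy matches)
check_copy() {
    ( cd "$2" && sha1sum -c --quiet ) < "$1"
}
```

Usage would look like `record_checksum /mnt/other/big.img`, then copying the file, then `check_copy /mnt/other/big.img.sha1 /home/user` (placeholder paths). On a mismatch, `sha1sum -c` prints a `FAILED` line and exits with a non-zero status.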

veprovina · 10 February 2023 12:31

Well, just in case, i ran memtest again, now for about 9 hours, 6 passes, still no errors.

About the file transfer. I formatted the second SSD to btrfs, and put a 65GB file on it.
So i first check sha1sum on that file, copy it, then check the copy if it matches. Got it.
Then repeat 9 more times.

Then if there’s an error, that means it’s RAM?

andreas85 · 10 February 2023 12:51

veprovina:

Well, that btrfs notification popped up that i installed:
It told me to check dmesg for grep, so i wrote sudo dmesg | grep btrfs in the terminal.
[ 6536.342454] BTRFS: error (device nvme0n1p2) in btrfs_commit_transaction:2447: errno=-5 IO failure (Error while writing out transaction)
[ 6536.343495] BTRFS: error (device nvme0n1p2: state EA) in btrfs_sync_log:3187: errno=-5 IO failure
Disk is healthy, memory is not corrupted memtest is ok, everything works except btrfs…

Have you ever heard of: