Random file system corruption (Not hardware issue)

Likkez · 11 February 2023 11:08

Hi! I have a problem: sometimes when I write files, the whole folder I wrote them to becomes ‘Read Only’ and some of the files disappear. This happens very frequently and I don’t know why.
It is 100% not a hardware issue, as it happens on multiple drives I have and the SMART data on them is perfect. I’m so sick of losing hours and hours of progress, and I don’t understand why this happens. Please help!

stephane · 11 February 2023 11:20

Can you report the output of:

sudo parted -l
df -Th
sudo journalctl -p3

linux-aarhus · 11 February 2023 11:30

Filesystem issues are not an exact science - but usually it boils down to

hardware
filesystem
configuration

Filesystem permissions don’t change just like that. For that to happen the kernel has to have experienced a lot of write errors, which may cause the filesystem to be switched to read-only, but that is rare.

If the files disappear, they may have existed only in cache; if the cache has been corrupted, the cache is cleared and the files are lost.

Speaking of cache: cache is memory, and if your RAM is generating the errors, that may explain why files disappear.
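
To check whether the kernel has switched a whole filesystem to read-only, as opposed to individual permissions changing, one can look at the mount options. A minimal sketch, assuming Linux with /proc/mounts and ext4 (the function name list_ro_ext4 is made up for illustration):

```shell
# Print "<device> <mount point>" for every ext4 mount that is currently read-only.
# Reads /proc/mounts by default; another file can be passed as the first argument.
list_ro_ext4() {
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass
    awk '$3 == "ext4" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "$mounts_file"
}

# Usage:
#   list_ro_ext4
```

An ext4 filesystem mounted with errors=remount-ro is remounted read-only when the kernel detects an error; whether that option is in effect on this system is not shown in the thread.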

Likkez · 11 February 2023 11:32

sudo parted -l

Model: ATA Samsung SSD 860 (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1075MB  1074MB  fat32              boot, esp
 2      1075MB  64.0GB  62.9GB  ext4
 3      64.0GB  1000GB  936GB


Model: ATA WDC WD30EZRZ-00G (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name          Flags
 1      1049kB  2463GB  2463GB
 2      2463GB  3001GB  538GB   ntfs         WindowsFiles  msftdata


Model: ATA WDC WD40EFAX-68J (scsi)
Disk /dev/sdc: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  4001GB  4001GB


Model: ATA WDC WD40EFAX-68J (scsi)
Disk /dev/sdd: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  4001GB  4001GB


Model: Seagate USB (scsi)
Disk /dev/sde: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1000GB  1000GB


Model: ADATA SX8200PNP (nvme)
Disk /dev/nvme0n1: 2048GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2048GB  2048GB  ext4


Error: /dev/md0: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md0: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:

Model: NE-256 (nvme)
Disk /dev/nvme1n1: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name                  Flags
 1      1049kB  256GB  256GB  ntfs         Basic data partition  msftdata

df -Th

Filesystem             Type      Size  Used Avail Use% Mounted on
dev                    devtmpfs   63G     0   63G   0% /dev
run                    tmpfs      63G  2.1M   63G   1% /run
/dev/sda2              ext4       58G   47G  7.8G  86% /
tmpfs                  tmpfs      63G  106M   63G   1% /dev/shm
tmpfs                  tmpfs      63G   25M   63G   1% /tmp
/dev/sda1              vfat     1022M   26M  997M   3% /boot/efi
/dev/mapper/ehome      ext4      858G  597G  218G  74% /home
/dev/mapper/Main2      ext4      2.3T  1.8T  321G  86% /media/manjaro/Main2
/dev/mapper/Storage4tb ext4      3.6T  2.4T  1.1T  70% /media/manjaro/Storage4tb
tmpfs                  tmpfs      13G   64K   13G   1% /run/user/1000
/dev/nvme0n1p1         ext4      1.9T  974G  808G  55% /run/media/manjaro/NVME_2tb

sudo journalctl -p3
just returns internet browser coredump, probably unrelated

sudo journalctl -p1
Dec 27 02:58:12 dmitriy-allseries systemd[1]: Caught <SEGV> from unknown sender process.
Dec 27 02:58:12 dmitriy-allseries systemd[1]: Caught <SEGV>, dumped core as pid 174056.
Dec 27 02:58:12 dmitriy-allseries systemd[1]: Freezing execution.
-- Boot bc19503b6dd9450293f916cbe6044c1d --
Jan 28 21:08:21 dmitriy-allseries kernel: BUG: Bad page map in process qbittorrent  pte:1000000000000 pmd:fbf83e067
Jan 28 21:08:21 dmitriy-allseries kernel: addr:00007ea6a9f7e000 vm_flags:0c0000d1 anon_vma:0000000000000000 mapping:ffff946a1373b630 index:7d37e
Jan 28 21:08:21 dmitriy-allseries kernel: file:module-datasets.tar.zst fault:filemap_fault mmap:ext4_file_mmap [ext4] read_folio:ext4_read_folio [ext4]
Jan 28 21:08:22 dmitriy-allseries kernel: BUG: Bad rss-counter state mm:00000000913fda6e type:MM_SWAPENTS val:-1
-- Boot 78055f1ae1294a55afaa1709cff693ee --
Jan 31 16:21:14 dmitriy-allseries kernel: EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss!  (inode 58735447, error -30)
Jan 31 16:21:14 dmitriy-allseries kernel: EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss!  (inode 58735447, error -30)
Jan 31 16:21:14 dmitriy-allseries kernel: EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss!  (inode 58735447, error -30)
Jan 31 16:21:14 dmitriy-allseries kernel: EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss!  (inode 58735447, error -30)
Jan 31 16:21:14 dmitriy-allseries kernel: EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss!  (inode 58735447, error -30)
-- Boot ccf9b7539a614f88a6cef9c16e8d6dfe --
Feb 09 20:56:21 dmitriy-allseries kernel: Fixing recursive fault but reboot is needed!
Feb 09 20:56:21 dmitriy-allseries kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
-- Boot 2b594b010a534f9e835cd5c6aafe29a8 --
Feb 10 18:08:56 dmitriy-allseries sudo[5500]:  manjaro : 1 incorrect password attempt ; TTY=pts/0 ; PWD=/home/manjaro ; USER=root ; COMMAND=/usr/bin/dmesg
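
The journal above can be boiled down to a per-device count of ext4 messages. On Linux, errno 30 is EROFS ("Read-only file system"), so the "error -30" lines suggest dm-2 had already been switched to read-only when those writes were flushed. A rough sketch (summarize_ext4_errors is a hypothetical helper name; it only parses the text it is given on stdin):

```shell
# Count kernel "EXT4-fs (<device>)" messages per device.
# Prints one "<device> <count>" line per device found in the input.
summarize_ext4_errors() {
    sed -n 's/.*EXT4-fs (\([^)]*\)).*/\1/p' | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (kernel messages of priority "err" and worse, all stored boots):
#   sudo journalctl -k -p err --no-pager | summarize_ext4_errors
```

Only lines containing an "EXT4-fs (...)" tag are counted, so the "Bad page map" and NULL pointer messages, which point at memory rather than at a particular disk, do not show up in this summary.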

linux-aarhus · 11 February 2023 11:34

As you stated, it happens on multiple drives, so the other explanation is RAM.

Likkez · 11 February 2023 11:37

Perhaps it could be RAM yea. Is there any way to tell for sure?

linux-aarhus · 11 February 2023 11:37

Reboot and run Memtest86+ from GRUB.

That is understandable.

Most users think that source code management tools like git are only for coding. They are not: git can be used for all kinds of versioning, especially where disk content changes often.

One does have to consider whether the data one is working with contains proprietary material or personal information.

Likkez · 11 February 2023 12:16

I can’t really use git when it’s terabytes of training data for AI. Also, I think it was the RAM: I ran the test and it says Fail.
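
For data sets too large for git, a plain checksum manifest at least shows which files went missing or changed after an incident. A sketch assuming GNU coreutils and findutils (make_manifest and verify_manifest are made-up names; hashing terabytes takes a long time, and a checksum computed on a machine with faulty RAM is itself not trustworthy):

```shell
# Write a sorted SHA-256 manifest for every regular file under a directory.
# Keep the manifest outside the directory so it is not hashed itself.
make_manifest() {
    dir="$1"; manifest="$2"
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-check the files against the manifest. Prints only the failures and
# returns non-zero if any file is missing or its content has changed.
verify_manifest() {
    dir="$1"; manifest="$2"
    ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}

# Usage:
#   make_manifest /path/to/dataset ~/dataset.sha256
#   verify_manifest /path/to/dataset ~/dataset.sha256
```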

Likkez · 11 February 2023 12:19

It was probably a faulty XMP profile combined with the badly designed NVIDIA heatsink that pushes all the hot air right onto the RAM sticks.

TriMoon · 11 February 2023 12:28

Likkez:

/dev/mapper/ehome      ext4      858G  597G  218G  74% /home
/dev/mapper/Main2      ext4      2.3T  1.8T  321G  86% /media/manjaro/Main2
/dev/mapper/Storage4tb ext4      3.6T  2.4T  1.1T  70% /media/manjaro/Storage4tb
tmpfs                  tmpfs      13G   64K   13G   1% /run/user/1000
/dev/nvme0n1p1         ext4      1.9T  974G  808G  55% /run/media/manjaro/NVME_2tb

It would be nice to know where those /dev/mapper/xxx devices are mounted from, e.g. what their backing file system locations are…

For example, the /dev/nvme0n1p1 line shows he has mounted his SSD inside the /run dir, which is by default a tmpfs, i.e. a directory inside RAM, which is of course a BAD mount point…

Likkez · 11 February 2023 13:38

I unmounted it after it got corrupted. Thunar mounts stuff under /run/media by default if you click on a disk.

TriMoon · 11 February 2023 13:42

No matter if Thunar does that, it’s still bad; it should mount under /media/username/xxx…
Anyhow, how about the info about the /dev/mapper/xxx mounts…
(Plus you never told us on which filesystem / mount point the problem occurred…)
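
One way to answer the /dev/mapper question is to read sysfs, where the kernel lists the name and the underlying devices of each device-mapper node. A sketch, assuming the usual /sys/block/dm-N layout (dm_info and its second argument are made up for illustration; the thread does not say which mapper name dm-2 corresponds to):

```shell
# Print "<dm-N> <mapper name> <backing devices...>" for one device-mapper node.
dm_info() {
    dm="$1"               # kernel name as seen in the journal, e.g. dm-2
    sysfs="${2:-/sys}"    # alternate sysfs root, only useful for testing
    name=$(cat "$sysfs/block/$dm/dm/name")
    backing=""
    # Each entry in "slaves" is a device directly underneath this node.
    for s in "$sysfs/block/$dm/slaves"/*; do
        if [ -e "$s" ]; then
            backing="$backing $(basename "$s")"
        fi
    done
    echo "$dm $name$backing"
}

# Usage:
#   dm_info dm-2
```

This only walks one level down. Running lsblk -s on a /dev/mapper/xxx device prints the whole chain as a tree, including any md RAID layer underneath.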

system · 14 February 2023 03:42

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.