Re: Ntfs3 keeps corrupting my NTFS partitions

Has OP tried ntfs-3g (GitHub - tuxera/ntfs-3g: NTFS-3G Safe Read/Write NTFS Driver)? It is in the Manjaro repo.
I’ve been using it for years, for example to move Linux distro ISOs and text files. I have not noticed any corruption, and it should be pretty obvious if it did happen: a corrupted distro ISO simply wouldn’t work, and corrupted text files would show weird letters or symbols. Hard to miss.
For at least a decade I’ve had NTFS and Ext4 partitions side by side and have been moving files from Ext4 to NTFS.
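Rather than waiting for a corrupted ISO to fail to boot, a checksum round-trip makes corruption detection deterministic. A minimal sketch; all paths are made-up examples, and `./ntfs_copy` stands in for a mounted NTFS volume:

```shell
#!/bin/sh
# Sketch: detect silent corruption after copying to another filesystem.
# ./ntfs_copy is a made-up example path standing in for an NTFS mount.
set -eu
src=./source_dir
dst=./ntfs_copy
mkdir -p "$src" "$dst"
printf 'pretend this is an ISO\n' > "$src/file.iso"

# Record checksums before copying ...
( cd "$src" && sha256sum file.iso ) > sums.txt

# ... copy to the target volume ...
cp "$src/file.iso" "$dst/file.iso"

# ... then verify the copy; any bit flip makes this step fail loudly.
( cd "$dst" && sha256sum -c ../sums.txt )
```

Run periodically against real mounts, this turns “hard to miss” into “impossible to miss”.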

@linux-aarhus thanks for the recommendation but FAT (no matter which of the versions that are covered by vfat) isn’t able to deliver what I want or need. Hardlinks are a hard minimum requirement for me, the ability to have reparse points on the Windows side is nice (but not mandatory, of course), too. Data safety via a journaled FS is also something I expect these days. To the best of my knowledge none of these are offered by any FAT version supported under the moniker vfat. exFAT is barely better.

My main misconception was that the ntfs3 driver would perform at least as well (not just speed-wise) as the commercial counterpart. And when I contacted the Paragon support months ago they practically assured me that the performance of the two (ntfs3 vs. ufsd) is on par. This is clearly not the case. If and when their ufsd driver becomes available for a recent Linux kernel I may try it again; until then I will settle for ntfs-3g.

@zhongsiu yes he has :wink:, and that’s what I am settling for (mentioned here), for now anyway. But I was used to the better performance of ufsd (Paragon’s commercial offering) before and had hoped that the ntfs3 driver contributed by them would perform at the same level. It does not, so I am reverting to the time-proven solution as you suggest, even if it means certain features that I rarely expect on the Linux side won’t be available. Since I also have some Linux setups where I don’t own the machine (and so can’t use my commercial license), I had ntfs-3g in use on some of those machines as well and also never encountered data corruption there. The most obvious shortcoming of ntfs-3g, for me, is the inability to remount, a feature I use a lot. It always takes a umount followed by a mount instead, which also means I have to be able to umount in the first place (remounting something rw or ro, however, works without issue even when the mount point is technically “busy”).

Thanks everyone. I was less looking for a solution (I am by now convinced it’s a defect in the ntfs3 driver) than I wanted to make sure to add what bits of information I had gathered in addition to the info already out there.

Yes - there are some limitations with the FAT file system - but with the exFAT variant (which supports file sizes > 4 GiB) it is quite reliable for cross-platform data exchange.

NTFS drivers outside Windows will always be reverse engineered from a proprietary file system - with all the uncertainty that brings.

But hey - that is your prerogative

Personally, I ditched Windows as my primary system years ago. The only place I use Windows is a Win10 VM running Visual Studio for maintaining a .NET4/MSSQL backend running on IIS10.


Allow me to chip in here: it is not a matter of prerogative! You can’t offer a file system driver that loses data. The ntfs3 driver is faulty, at least according to my own findings and those of Vidarr and other users.

Before other users find out that they lost their data, it would be wise to temporarily disable or remove the ntfs3 driver until the issue has been fixed. Anything else would be deliberately causing data corruption, with all its consequences.


It most certainly is - you chose to use it - NTFS - despite the risks - this is your choice.

May I remind everyone, the Linux kernel is GPL

  • read on to understand the implications
  • in this context you have no right to complain or demand …

https://www.gnu.org/licenses/gpl-3.0.en.html

https://www.gnu.org/licenses/gpl-faq.html


Aside: some torrent applications only move finished file(s) to the target directory. In the interim, while files are downloading, they remain in a subdirectory which is often hidden. This could also explain your experience, based solely on the quoted portions of your post. qBittorrent, for example, does this (and it is manually configurable).

Interestingly, this ‘corporation that made working on file systems for almost 30 years its core business’ most likely grew from the initial efforts of one man working on a driver from scratch.

It’s also possible a driver might not have fundamentally changed in 30 years, and the focus remains on a yearly redesign of the GUI for marketing purposes.

:endofmusings

This might be of related interest:

Cheers.

Apologies, I didn’t initially notice this thread is getting a little long in the tooth.


As of late, so to speak. Depending on the distro (and no, I am not solely using Manjaro where I get the latest stuff), exFAT can be an issue. Not by itself, but once you need to work with a file system that wasn’t cleanly unmounted etc.

However, since Microsoft opened up the specification for exFAT in 2019, I guess it is the safer bet for the future.

While that is true, having worked with the commercially offered driver by Paragon before, I was quite surprised to find that the upstreamed one was causing such issues.

As I understood it, the company did a rewrite for this driver, whereas the commercial one was adapted from their original NTFS for DOS driver. But still, they literally have decades of experience with NTFS and must have accumulated many, many test cases (which they claim both drivers run against). And since they don’t keep pace with the Linux kernel the way Manjaro does, the commercial driver simply won’t build and link against the kernels that I get with the latest Manjaro (stable).

On the other hand, it feels like the “fairly solidly experimental” aspect that Mr. Torvalds so candidly voiced doesn’t exactly come across when consuming the kernel through a distro.

I get your points, and the point about the GPL is also well understood - although not everyone is a techie and not all techies “speak” C etc. (and even techies only get 24 hours per day :wink:) - which is why I posted this as a warning to other users. My “complaint”, if you will, is less about the fact that the driver may not “be there” in terms of data integrity and so on, and more about how this isn’t communicated when you start using it. Btw, the Linux kernel is under GPL2, not 3.

Either way I am grateful you and others have shared their experience and advice. Thank you!

@soundofthunder thank you for pointing that out. Perhaps qBittorrent would then be a good way to reproduce this. I shall give it a try in a VM, perhaps with different versions of the kernel/driver. On that other point: according to Paragon themselves, the drivers don’t share the same lineage (the GPL’d one seems to be a rewrite, whereas the commercial one traces its roots to their NTFS for DOS driver).

I’m aware of the licenses. I am also very aware of the difficulties for developers to develop NTFS drivers under Linux.
What I do not agree with is how laxly this serious bug is being dealt with. The whole world was upside down when the xz package contained malicious code. GitHub went as far as to ban the original maintainer, although he was just a victim. Yet the xz vulnerability hardly affected any user (luckily, it seems to have been caught in time). The original maintainer and the Linux community reacted quickly - kudos!

The ntfs3 kernel driver has been repeatedly reported to lose files. The OP mentioned a bug report on the Ubuntu Launchpad site. After half a year without a response, the person who reported the bug posted the following comment in German, which I translate:
“Apparently. My bug report is more than six months old, has given me an insane amount of work, and doesn’t interest a single specimen of the species porcus. After all, it’s just about data loss; there are much bigger problems. Utterly ridiculous.”

I have not added my bug report to the Ubuntu site since I use Manjaro. Which brings me to the question: where do you report bugs on Manjaro? Is it “anywhere on this forum”, as suggested in the How to join Manjaro-Development section or how to report bugs? thread?

What is of no less concern to me is that I have not seen any announcement by the developer regarding the bug, nor on whether or not it has been addressed / fixed. A two-year-old article in The Register also does not inspire confidence in the ntfs3 kernel driver.

Over at bugzilla.kernel.org there is a list of reported ntfs3 issues, but all are marked as “NEW”, with no “resolution”. This surely builds confidence.

Am I to understand that data loss is a minor issue? I guess we all have so much data, we can lose a little here and there.

As you pointed out so nicely by posting the GPLs, it is my choice to use or not to use the ntfs3 driver. It would, however, be useful to see a sign of life from the developer / maintainer like “yes, I saw the bug report” or “no, I don’t have time/cannot reproduce/lost interest/I’m sick”, in other words some form of acknowledgement and if there is a chance that the bug will be addressed.

Let me emphasize: The xz vulnerability was a mere package issue. The ntfs3 data loss is on the Linux kernel level.

I should point out that I have used the ntfs3 driver without issue for a fair while; the luck of the draw, I suppose. Sooner or later (as with most software) there will be an issue or two, a bug, perhaps. What I’ve noticed is that many of the so-called bug reports stem from ignorance of how ntfs3 actually works.

I sometimes see, for example:

The not-a-bug in this case is that ntfs3 actively prevents mounting when a dirty bit is detected. The best way to solve that is to run chkdsk from within Windows to fix any filesystem errors and clear the dirty bit. In this example, the poster claims they “tried running, chkdsk from Windows, but that did not change anything”.

The likely scenario is that only the most basic scan (chkdsk x: /f) was used, while a deeper (and more time-intensive) scan was likely needed, yet not performed.
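For reference, a sketch of the two invocations (x: is an example drive letter); the /r switch implies /f and additionally scans the surface for bad sectors, which is the “deeper” scan usually meant here:

```
:: basic scan: fix filesystem errors (clears the dirty bit on success)
chkdsk x: /f

:: deeper scan: /r implies /f and also locates bad sectors -- much slower
chkdsk x: /r
```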

The point, though, is that the mere fact of not being allowed to mount and access their NTFS volume translates as a bug to those unaware of the difference. This potentially wastes much time for developers; it is little wonder that many of these reports remain effectively ignored.

Of course, I can’t speak to the intent or otherwise of kernel developers, but at least from my user perspective, confidence is generally satisfactory with regard to the ntfs3 module.

Just my 2 cents. Keep the change.

Cheers.


Thanks for sharing your thoughts. In my case I received “Looks like your dir is corrupt” errors (found by checking dmesg) that I could fix using chkdsk /F within Windows. See Vidarr’s original post and the link to my blog entry.

Losing data is considered a serious thing. Even if there is a way to recover it, you never know when it will actually become irrecoverable.

I used the ntfs3 driver in scripts, so the only thing I had to change was a line in a script.

My NTFS drives typically hold between 5,000 and 260,000 files, so I’m not really inclined to use experimental stuff.

The problem can easily be solved by switching back to the ntfs-3g driver. Until I hear that the file/directory corruption or orphaned-files bug has been fixed, I consider the ntfs3 kernel driver unsafe. Unfortunately I have found neither an acknowledgement of the bug by the maintainer/developer nor a release note stating that the bug has been fixed. Perhaps I have been looking in the wrong place?
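For volumes mounted via /etc/fstab, switching back is typically a one-word change of the filesystem type. A sketch with a made-up UUID and mount point:

```
# /etc/fstab -- UUID and mount point are made-up examples
# before (in-kernel driver):
#UUID=01D4A2B3C4D5E6F7  /mnt/data  ntfs3    defaults,uid=1000,gid=1000  0 0
# after (FUSE driver):
UUID=01D4A2B3C4D5E6F7   /mnt/data  ntfs-3g  defaults,uid=1000,gid=1000  0 0
```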

May I ask what kernel version you use? The developer seems to improve the driver constantly: Commits · torvalds/linux · GitHub. But yeah, the driver being considered stable doesn’t mean it is production-ready. When a kernel dev says “stable”, he means “public beta”. LTS is what is intended for production.

However, I use Linux 6.6 atm and I cannot reproduce any errors on my Win11 partition, which was preinstalled. Could you tell me how to reproduce such “corrupt” errors? Do you still use Linux 5.15? Probably the fixes have not been backported yet.

Thanks for asking.
Kernel: 6.6.26-1-MANJARO x86_64

I use NTFS partitions in two different ways:

  1. Native NTFS partitions on disk, for example all my external backup drives are formatted to NTFS.
  2. NTFS on LVM. My Windows 10 system disk and two data drives are created on logical volumes.

I use a bash script and a desktop launcher to mount my LVM-based NTFS drives under Linux.
Another script creates file hashes for all files (after the initial run only for the new files). This is where “input/output errors” showed up.

sudo dmesg | grep -i ntfs

showed “dm-27: ino 4070 “dir_name” Looks like your dir is corrupt” entries (see below screenshot). These errors appeared when working on NTFS external drives, those NOT using LVM. The explanation for this is simple: I use the aforementioned Linux backup script to mount the external drive in read-write mode to copy to and sometimes delete files on the external target drive. It is rare that I use Linux to copy files to my Windows system disk or the data disks. This probably prevented data corruption on those NTFS volumes.
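For context, the incremental hash pass described above (“after the initial run only for the new files”) can be sketched roughly like this. The script and paths below are hypothetical illustrations, not the actual script:

```shell
#!/bin/sh
# Sketch: hash everything on the first run, then only files changed since
# the previous run. All paths are hypothetical examples.
set -eu
dir=./backup_volume
stamp=./last_hash_run
hashes=./hashes.txt

mkdir -p "$dir"
printf 'raw photo data\n' > "$dir/IMG_0001.raw"

if [ -f "$stamp" ]; then
    # Subsequent runs: hash only files modified after the stamp file.
    find "$dir" -type f -newer "$stamp" -exec sha256sum {} + >> "$hashes"
else
    # Initial run: hash everything.
    find "$dir" -type f -exec sha256sum {} + > "$hashes"
fi
touch "$stamp"
```

A pass like this touches every new file with reads, which is exactly why it surfaces input/output errors that an idle volume would hide.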

Thanks for pointing to the GitHub commits for the ntfs3 driver. Unfortunately I cannot correlate these fixes or updates with the errors I get. It would be good to have the bugzilla.kernel.org site updated to reflect the status of these bugs. Else, how would I - a layman - know if a problem has been addressed?

Is this suggestive of those directories having existed for 12 years?
ntfs3: dm-27: ivo=4070, "20120321" Looks like ...

Another common denominator is not only that the affected volume(s) have been exposed to the ntfs3 driver, but they also reside within LVM2 containers. Perhaps you shouldn’t rule that out as a contributing factor.

LVM on Windows is not LVM on Linux.

It is entirely possible that Microsoft’s implementation is not fully reverse engineered in the ntfs3 kernel driver.

Thanks for the comment:

  1. Most likely some files in these 12-year-old directories were updated on my computer. The rsync backup script or utility would then copy those files to the external backup drive. After that I run the hash script to create or update the file hashes. That is when the errors appeared.
find . -type f -printf '%T+\t%s\t%i\t%p\n' > "$lof-new"

is the command in the script that produced the “input/output errors”.

  2. No, these external backup drives do not use LVM containers. However, the internal NTFS disks use LVM. As I explained, the errors are associated with the external drives. Windows’ chkdsk /F command found some “lost files” (orphaned files) and restored them.

Hope this helps.

I recall encountering similar directory and file corruption in years gone by, even when using the Microsoft driver. In my case, the disk was the culprit, despite it showing no obvious signs of distress. SMART revealed it to be well within its service life, and undamaged, and yet the issue persisted.

Finally I performed a chkdsk scan for bad sectors; after successful completion the condition still existed, so eventually I performed a low-level format (it took several days back then), and afterwards no more issues were apparent.

Of course, that’s a different situation entirely. Frankly, I find it difficult to believe that ntfs3 is actually the cause - though, if it were the Microsoft driver, I might have little doubt.

I’m not using LVM on Windows! I create empty, unformatted LVM volumes under Linux and let Windows format the “partitions” with NTFS. Windows sees the LVM volume just as if it were a normal partition.

My hash scripts use the snapshot feature of LVM and don’t modify the NTFS partitions at all.

I’ve been using rsync or the rsync-based Luckybackup utility for more than a decade, together with the ntfs-3g driver. I never had any issue with that. Only after switching to the ntfs3 driver did I lose files. I have since reverted to the ntfs-3g driver and all looks fine.

Using my hash scripts also allows me to detect bit rot or file corruption, in addition to keeping track of all my files. If all goes well (which it usually does), after I back up my computer to remote or external drive(s), I know for sure that the backup contains an exact copy of the source (same number of files, same file names, same hashes for each file). The hashes, file names etc. are stored in a cloud-backed folder and also copied to the external / remote drives in question.

I usually have at least 3 backups.
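The verification step described above (“same number of files, same file names, same hashes”) can be sketched by diffing two sorted hash lists. Directory names here are made-up examples:

```shell
#!/bin/sh
# Sketch: verify that a backup is an exact copy of the source by comparing
# hash lists. src/ and dst/ are made-up example directories.
set -eu
mkdir -p src dst
printf 'some payload\n' > src/a.txt
cp src/a.txt dst/a.txt

# Hash relative paths so the two lists are directly comparable.
( cd src && find . -type f -exec sha256sum {} + | sort -k 2 ) > src.sums
( cd dst && find . -type f -exec sha256sum {} + | sort -k 2 ) > dst.sums

# diff exits non-zero (and shows the offending entries) on any mismatch,
# which catches missing or extra files as well as changed content.
diff -u src.sums dst.sums && echo "backup verified: identical"
```

Because missing files simply vanish from one list, this also flags the “orphaned files” case, not just bit rot.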

erm… what backup script? I see only a mount script… How do you copy files? By hand with a file manager? Which one?

What I am asking: Please, give me exact instructions how to create such errors. Thank you.

dm-27 must be some sort of fakeraid; it could also be Microsoft’s LVM. So ntfs3 with fakeraid is probably an edge-case scenario and thus not well tested, meaning this edge case is not included in Paragon’s test scenarios, which are used for testing the driver on Linux. Also, what you are showing is not a “basic partition” in Windows terms.

PS: Please stop posting pictures of terminal text. There are code blocks, and you can copy and paste into them. Thanks. Otherwise everyone has to copy your stuff by hand instead of being able to copy & paste it.

This topic is a great example of the importance of information.

  • NTFS

Only when challenged do you provide extra information

  • logical volume - this is also a feature Windows provides

When challenged about the choice of using Windows logical volumes

  • we are informed that an LVM volume was created on Linux, then attached to Windows, and that Windows was allowed to format the volume with NTFS

I think I have mentioned earlier the conflict of interest with Paragon Software.

When any software is reverse engineered there is an important rule - relating to copyright.

The process must be documented so the developers can prove how they reached the result and that the code is not just the result of decompiling parts of the application and then using that source to recreate the functionality.

Both the process and the result are required to be different from the original - otherwise you become a target of intellectual-property-infringement claims.

This means the derived code is likely to behave differently and may therefore create issues.

When you choose to use NTFS as an important filesystem for sharing data between Windows and Linux - you must have your reasons - but when complaining about issues in such a specialised configuration, you should ensure you do not apply updates without thinking of the consequences.

Kernel 5.15, released 2021-10-31, implemented ntfs3 - even so, that does not imply it is production-ready - as also mentioned before.

Thanks for your comments and questions. I didn’t want to hijack the thread, just share similar experiences as the OP. In fact, the OP commented on my website where I posted about the ntfs3 issue.

First, in reply to @megavolt: rereading my own post on my website, I noticed that I made some mistakes in my replies above - thanks to @megavolt for pointing out the dm-27.

Here is the corrected version of events, as described in my post and as far as I can recall now:

  1. I have a LVM volume called media-photo_raw that is formatted to NTFS and used entirely for backing up the picture files as they are imported by Adobe Lightroom (the “make a second copy to…” option on the LR import screen). The drive had been formatted to NTFS within the Windows VM. Of course, the files were copied while running the Windows VM.
  2. After I import the photos from memory card to my NVMe-based work drive called vmvg-workdrive, I go through them and delete those that I don’t want to keep. However, I do NOT touch the files on the media-photo_raw volume holding the backup on import. Obviously the media-photo_raw volume fills up rapidly. Once the photos on the vmvg-workdrive are processed, I move them to long-term storage on large HDDs (LVM volume in RAID1 that’s been formatted to NTFS by Windows 10).
  3. After each step I run a hash script that creates a list of files and the corresponding hashes, for each storage media/volume.
  4. Over time, the original backup files are not needed anymore, as I have multiple backups of the long-term storage drives. So I wrote a script that deletes photos on the media-photo_raw volume based on various criteria to free up space.
  5. When I ran the script the first time, it deleted around 27,000 files on media-photo_raw.
  6. About a month later, after importing more photos, I ran another script to update/add the hashes for the new files. This is where I got the “input/output errors”. I checked it by executing: find . -type f -printf '%T+\t%s\t%i\t%p\n' > "$lof-new"
  7. At first I thought this was a hardware failure of the HDD. I ran an extended SMART test and it came out clean. This is when I started to suspect the ntfs3 driver, and some Internet searching showed that I am not alone.
    (Sorry for the lengthy introduction, but it might be relevant.)
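The listing step in point 6 can be made more diagnostic by splitting the error stream from the file list, so that any “Input/output error” messages are captured instead of scrolling past. Paths below are made-up examples:

```shell
#!/bin/sh
# Sketch: run the listing pass while capturing I/O errors separately.
# ./volume and ./filelist are made-up example paths.
set -u
mkdir -p ./volume
printf 'x\n' > ./volume/file1

lof=./filelist
# stdout -> file list, stderr -> error log ("Input/output error" lands here)
find ./volume -type f -printf '%T+\t%s\t%i\t%p\n' > "$lof-new" 2> "$lof-errors"

if [ -s "$lof-errors" ]; then
    echo "errors during scan:"
    cat "$lof-errors"
else
    echo "scan completed without errors"
fi
```

Keeping the error log alongside the file list also gives you something concrete to attach to a bug report.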

Now, below are the answers to your questions:

What backup script? I use Luckybackup (basically a front-end for rsync), which uses a bash script I wrote to mount the LVM-based NTFS partition. The command that mounts the file system is:

mount -t ntfs3 -o "$rw_mode",iocharset=utf8,dmask=027,fmask=137,uid=$(id -u $user),gid=$(id -g $user),discard "$mount_dev" "$mount_path"
Important: I did not use any backup script on the media-photo_raw in question.

Here is the code snippet of the bash script that deleted the 27,000 files:

cnt=0
# IFS= and -r preserve leading whitespace and backslashes in file names
while IFS= read -r file; do
	echo "Deleting $file"
	rm "$file"
	((cnt++))
done < "$lof-discard"
echo "$cnt files deleted from raw backup media"
echo "Removing $(find -depth -type d -empty | wc -l) empty folders"
find -depth -type d -empty -delete
echo "Removed empty folders"

dm-27: This is the media-photo_raw LVM volume that was mounted by my script using the following mount option:

mount_op="-t ntfs3 -o ${rw},iocharset=utf8,dmask=027,fmask=137,uid=$user,gid=$group,discard"
mount ${mount_op} "${lvmount:-$lv_path}" "$mounton"

(About using photos: Totally agree!)

@linux-aarhus

NTFS - Only when challenged you provide extra information: I will gladly help where possible, but didn’t know if this here was the right place. To summarize my different NTFS use cases:

  1. LVM raw volume created by Linux, formatted by the Windows VM to NTFS.
  2. Native NTFS volume, typically a pre-formatted external HDD (like a WD Elements external drive).

Logical volume: I thought I made it clear that I use the Linux LVM. Microsoft doesn’t care much about open standards, so their proprietary version of LVM called “Storage Spaces” is a no-go for me. LVM stands for “Logical Volume Manager” and should be a familiar term to most Linux users.
As mentioned before, I use a mount script and a launcher to mount or unmount my NTFS volumes. The code that selects the right Windows partition to mount is here:

    kpartx -av "$vm_path"
    if [ -b "${vm_path}p1" ]; then
        dev="$(lsblk -no NAME,SIZE ${vm_path}p? | sort -h -k 2 | awk '{ print $(NF-1) }')"
        dev="$(echo $dev | awk '{ print $NF }')"
    else
        if [ -b "${vm_path}1" ]; then
            dev="$(lsblk -no NAME,SIZE ${vm_path}? | sort -h -k 2 | awk '{ print $(NF-1) }')"
            dev="$(echo $dev | awk '{ print $NF }')"
        else
            dev="$vm_volume"
        fi
    fi

As you can see, I use kpartx as the device-mapper front-end (it even supports nested LVM volumes).

I think I have mentioned earlier the conflict of interest with Paragon Software: Of course I understand and appreciate the efforts that are going into the development of the ntfs3 driver. There must be a way to track that development, especially with regards to bugs.

Is my case of using NTFS on LVM an edge case? Perhaps, though I have no clue if the use of LVM or the device mapper kpartx has any bearing on the functionality of the ntfs3 driver. As I repeatedly mentioned, I used the ntfs-3g driver for around 12 years without an issue.

Could the large number of files I deleted be the issue? Perhaps.

I believe that at least one external drive was also affected. As explained before, these external drives don’t use LVM but were purchased pre-formatted with NTFS.

Important: The OP who started this thread does not use LVM! Unfortunately I didn’t screenshot or copy the output of the chkdsk /F command under Windows, but I believe it mentioned “orphaned files”.

Does this answer the questions? If not, I’m happy to add more if and where I can.