What is the current recommendation for a Linux filesystem?

I tried Arch Linux with ZFS in a VM for the first time, but I noticed that rolling back a ZFS snapshot is inflexible.

You cannot roll back to an old snapshot while newer snapshots exist. You either have to delete every snapshot newer than the selected one before rolling back, or use a trick: clone the ZFS filesystem from the old snapshot, which restores the old state while keeping the newer snapshots:
https://stackoverflow.com/questions/40659016/zfs-rollback-snapshot-but-keep-newer-snapshots
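Roughly, the two options look like this; a minimal sketch, where the pool, dataset, and snapshot names are hypothetical:

zfs rollback -r rpool/data@old                 # plain rollback; -r destroys every snapshot newer than the target
zfs clone rpool/data@old rpool/data_restored   # the trick: a writable copy of the old state
zfs promote rpool/data_restored                # detach it from its origin; the newer snapshots survive on the original dataset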

I find BTRFS rollback more flexible than ZFS's.


No wonder: ZFS is good and stable across RAID levels, but its rollback is limited (no full snapshot-tree capability).
BTRFS is not stable at RAID levels 5 or 6, but its rollback is flexible, with full snapshot-tree capability.
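For comparison, a BTRFS rollback is just a subvolume swap. A minimal sketch, assuming a default @-style subvolume layout (device path and snapshot name are hypothetical):

mount -o subvolid=5 /dev/sda2 /mnt                         # mount the top-level subvolume
mv /mnt/@ /mnt/@_broken                                    # set the current root aside
btrfs subvolume snapshot /mnt/@snapshots/root-old /mnt/@   # restore any snapshot, old or new
# no other snapshot has to be deleted; reboot into the restored root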


I ran my own benchmark of ZFS vs. BTRFS in two identical VMs on the same hardware (a single SSD). Both filesystems run on Linux kernel 5.17.

Copy (read and write) speed test with ZFS (default compression lz4) in the real world. (I cannot change lz4 to zstd because it crashes GRUB after reboot, but lz4 should be a bit faster than zstd anyway.)
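For reference, this is roughly how compression is selected on each filesystem (pool and device names are hypothetical):

zfs set compression=lz4 rpool            # ZFS: a per-dataset property, inherited by children
mount -o compress=zstd /dev/sda2 /mnt    # BTRFS: a mount option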

  • Copy speed test of ZFS (Compression LZ4)
❯ sudo rsync -ah --progress /home/test/Desktop/backup /opt/backup_copy
sending incremental file list
backup
          4,19G 100%  456,51MB/s    0:00:08 (xfr#1, to-chk=0/1)

  • ZFS, test again:

❯ sudo rsync -ah --progress /home/test/Desktop/backup /opt/backup_copy1
sending incremental file list
backup
          4,19G 100%  653,51MB/s    0:00:05 (xfr#1, to-chk=0/1)

  • Copy speed test of BTRFS (Compression zstd)
❯ sudo rsync -ah --progress /home/test/Desktop/backup /opt/backup_copy
sending incremental file list
backup
          4.19G 100%  598.44MB/s    0:00:06 (xfr#1, to-chk=0/1)
  • BTRFS, test again:
❯ sudo rsync -ah --progress /home/test/Desktop/backup /opt/backup_copy1
sending incremental file list
backup
          4.19G 100%  844.77MB/s    0:00:04 (xfr#1, to-chk=0/1)

Benchmarking with FIO:

  • ZFS
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --filename=testfile

test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 process
Jobs: 1 (f=1): [m(1)][97.1%][r=110MiB/s,w=36.3MiB/s][r=28.1k,w=9304 IOPS][eta 00m:02s]
test: (groupid=0, jobs=1): err= 0: pid=194869: Sun Apr 24 12:46:52 2022
  read: IOPS=11.8k, BW=46.2MiB/s (48.4MB/s)(3070MiB/66495msec)
   bw (  KiB/s): min=35328, max=120664, per=98.37%, avg=46506.10, stdev=13001.85, samples=132
   iops        : min= 8832, max=30166, avg=11626.46, stdev=3250.46, samples=132
  write: IOPS=3950, BW=15.4MiB/s (16.2MB/s)(1026MiB/66495msec); 0 zone resets
   bw (  KiB/s): min=12064, max=38680, per=98.39%, avg=15545.64, stdev=4259.67, samples=132
   iops        : min= 3016, max= 9670, avg=3886.36, stdev=1064.90, samples=132
  cpu          : usr=1.88%, sys=36.05%, ctx=116626, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=46.2MiB/s (48.4MB/s), 46.2MiB/s-46.2MiB/s (48.4MB/s-48.4MB/s), io=3070MiB (3219MB), run=66495-66495msec
  WRITE: bw=15.4MiB/s (16.2MB/s), 15.4MiB/s-15.4MiB/s (16.2MB/s-16.2MB/s), io=1026MiB (1076MB), run=66495-66495msec
  • BTRFS
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --filename=testfile

test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=139MiB/s,w=45.8MiB/s][r=35.7k,w=11.7k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=24711: Sun Apr 24 12:40:16 2022
  read: IOPS=34.7k, BW=135MiB/s (142MB/s)(3070MiB/22660msec)
   bw (  KiB/s): min=129624, max=149544, per=100.00%, avg=138802.58, stdev=4058.99, samples=45
   iops        : min=32406, max=37386, avg=34700.60, stdev=1014.75, samples=45
  write: IOPS=11.6k, BW=45.3MiB/s (47.5MB/s)(1026MiB/22660msec); 0 zone resets
   bw (  KiB/s): min=42672, max=49464, per=100.00%, avg=46384.73, stdev=1385.39, samples=45
   iops        : min=10668, max=12366, avg=11596.16, stdev=346.36, samples=45
  cpu          : usr=3.85%, sys=87.35%, ctx=3168, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=135MiB/s (142MB/s), 135MiB/s-135MiB/s (142MB/s-142MB/s), io=3070MiB (3219MB), run=22660-22660msec
  WRITE: bw=45.3MiB/s (47.5MB/s), 45.3MiB/s-45.3MiB/s (47.5MB/s-47.5MB/s), io=1026MiB (1076MB), run=22660-22660msec
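(For comparison: both runs push the same workload, 3070 MiB read plus 1026 MiB written, but BTRFS finishes in about 23 s versus about 66 s for ZFS, roughly 3x the random IOPS in this VM.)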

ZFS is for integrity, reliability, record-based replication, and long-term storage. It’s not meant for raw performance. I have no idea where the notion came from that it’s “faster” and better for gaming.

It also requires more of your system, due to everything it does under the hood.

Inline compression. Inline checksums. Redundant checksum data for a single record of user data. The freaking ARC (it’s a beast, and wants to use up as much RAM as is feasible.) And much more.
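If you want to see or cap the ARC, a quick sketch, assuming root and OpenZFS defaults (the 4 GiB limit is just an example value):

grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats      # current and maximum ARC size, in bytes
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max   # cap the ARC at 4 GiB until reboot
# to persist: put "options zfs zfs_arc_max=4294967296" in /etc/modprobe.d/zfs.conf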

:question: Don’t know? Can’t decide? Windows not involved?

  • EXT4
    – Good for home, root, data, and external partitions. It works. It’s tried-and-true.

:question: Want to exploit SSD / NVMe performance? Feeling a bit “on the edge”?

  • F2FS
    – Best kept to root and things like var or opt, maybe not for home (use EXT4 instead)

:question: Got a beast system with a powerful CPU, many cores/threads, and much RAM? Dealing with large files and multimedia more than the typical user? Windows not involved?

  • XFS
    – Good for home, root, data, and external partitions.

:question: Got a grip on filesystems? Want to leverage snapshots, rollbacks, and inline compression? Up to acquiring extra knowledge and learning? Do you understand the concept of subvolumes? Windows not involved?

  • BTRFS
    – Probably best left to root, var, opt, and so on. Maybe keep home as EXT4.

:question: Need something shared between Linux and Windows?

  • exFAT
    – Nothing fancy. Works on Windows and Linux. Fine for internal or external drives. Only if you really need OS-interoperability. Make sure to use exfatprogs (Samsung), rather than exfat-utils (FUSE).

I left out ZFS because it’s best reserved for non-desktop NAS storage, and because of its licensing issues, its maintenance and development happen outside of mainline kernel development. This can (and has) introduced bugs and regressions across kernel hops.
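For completeness, the matching mkfs one-liners for the options above (device paths are placeholders):

mkfs.ext4  /dev/sdX1    # EXT4
mkfs.f2fs  /dev/sdX2    # F2FS
mkfs.xfs   /dev/sdX3    # XFS
mkfs.btrfs /dev/sdX4    # BTRFS
mkfs.exfat /dev/sdX5    # exFAT (from exfatprogs)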


EDIT: The above is a quick “back of a napkin” overview if you can’t decide. It’s not comprehensive, nor is it a “hard recommendation”. It’s really just a “I don’t want to look into this too deeply. Give me the bird’s-eye view.”


Hello, to know that, you need to read the posts, otherwise …

In the first post there is a video from a YouTuber who made a filesystem benchmark, and my question was based on the data he showed: he said that in his tests ZFS was the fastest one, and sometimes EXT4. I’m not able to run the tests myself to check whether the data was wrong or right, so I asked here on the forum.

ZFS is very suitable for enterprise servers or mainframes running databases, where you need your data to be safe, and for RAID levels like 5 or 6. FreeBSD uses ZFS by default.

I don’t know where this idea comes from.
Software and systems are designed to use swap;
blocking it will make them behave in ways they are not designed to.

~]$ free
               total        used        free      shared  buff/cache   available
Mem:           15397        8501        1468         474        5426        6096
Swap:          16383           1       16382

Simple read benchmarking using “cat” in a VM

Read 4GB data:

  • ZFS
time for i in {0..10}; do cat "backup4GB" > /dev/null; done

real    0m43,545s
user    0m0,108s
sys     0m14,215s
  • BTRFS
time for i in {0..10}; do cat "backup4GB" > /dev/null; done

real    0m12,102s
user    0m0,045s
sys     0m9,615s

Read 5MB data:

  • ZFS
time for i in {0..5000}; do cat "data5MB" > /dev/null; done

real    0m21,236s
user    0m2,793s
sys     0m18,673s

  • BTRFS
time for i in {0..5000}; do cat "data5MB" > /dev/null; done

real    0m5,608s
user    0m3,732s
sys     0m2,111s
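One caveat with repeated cat reads: after the first pass the file is largely served from the page cache (or the ZFS ARC), so dropping caches between runs makes the comparison fairer. As root:

sync
echo 3 > /proc/sys/vm/drop_caches   # flush page cache, dentries, and inodes
# note: the ZFS ARC is not governed by drop_caches and may still serve hits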

Simple write benchmarking using “cat” in a VM

  • ZFS
time for i in {0..20}; do cat backup700MB > "test/test${i}"; done

real    0m10,852s
user    0m0,000s
sys     0m9,675s

  • BTRFS
time for i in {0..20}; do cat backup700MB > "test/test${i}"; done

real    0m0,028s
user    0m0,014s
sys     0m0,008s

BTRFS is fast because of deduplication; the copies take no additional space!
ZFS also supports deduplication, and in my testing it took no additional space either, but it stays as slow as with deduplication disabled.
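For reference, ZFS inline deduplication is a per-dataset property (pool/dataset name hypothetical), and it is famously RAM-hungry; the dedup table has to fit mostly in memory to stay fast:

zfs set dedup=on rpool/data    # inline dedup for new writes on this dataset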


I didn’t watch the entire video, but one explanation for his “benchmarks” is that they do not reflect real-world usage (synthetic benchmarks), and they might not account for the ZFS ARC.

The results make me suspicious, especially when ZFS just BLASTS THROUGH the other filesystems. My intuition tells me it’s using the ARC for the benchmarks.

Thanks for the feedback. I’m trying to understand the options for the end user, and as always it is confusing.

Also, looking at Zesko’s tests: he’s running them in a virtual machine rather than on a real disk, so the results may differ somewhat. The other point is compression: I think the benchmark video tested all filesystems under the same conditions, with no compression on any of them, otherwise the baselines are different.

Those results look more realistic, though.


Regardless, the benchmarks you’ve seen, such as in the video you posted, likely do not take the ARC into account. The ARC bypasses disk I/O, which explains the amazing “speeds” of the ZFS tests. However, real-world usage, in which you’re reading, writing, and modifying random data, is not likely to use the ARC as efficiently, and thus ZFS will underperform compared to “simpler” filesystems such as EXT4 and XFS.

If you want raw speed and performance? EXT4 or XFS on Linux.

If you want redundancy, self-healing, integrated checksumming, inline compression, snapshots, etc, then BTRFS and ZFS (particularly on multiple drives to leverage the self-healing and redundancy protection.)
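As a sketch of that multi-drive case on BTRFS (device paths are hypothetical): RAID1 for data and metadata gives scrub a second copy to heal from.

mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc   # mirror data and metadata across two drives
mount /dev/sdb /mnt
btrfs scrub start /mnt                           # verify checksums; repair bad blocks from the mirror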


Thank you again for the feedback.

Yeah, in the end I decided to use BTRFS for the operating system and EXT4 for all the rest (games and personal data).
