Btrfs slow read/write speed on nvme

Add to fstab:

noatime,clear_cache,nospace_cache

and reboot.

space_cache=v2 can improve performance, but to be honest it wears out the SSD/NVMe faster… At least on HDDs it is recommended, in my opinion.

noatime improves latency. On modern systems atime is not needed, especially on desktops.

No idea. Maybe ask upstream (Issues · GNOME / Files · GitLab), but in fact this menu is hardcoded.

Just to be sure, you mean like this?

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=68BD-1D13                            /boot/efi      vfat    umask=0077 0 2
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /              btrfs   subvol=/@,defaults,discard=async,ssd,noatime,clear_cache,nospace_cache 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /home          btrfs   subvol=/@home,defaults,discard=async,ssd,noatime,clear_cache,nospace_cache 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/cache     btrfs   subvol=/@cache,defaults,discard=async,ssd,noatime,clear_cache,nospace_cache 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/log       btrfs   subvol=/@log,defaults,discard=async,ssd,noatime,clear_cache,nospace_cache 0 0
UUID=4371b365-d8bf-430a-ade2-d4fab1a203ac swap           swap    defaults,noatime 0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0


Everything is correct, BUT disabling fsck for Btrfs is a bad idea; anyway, it is your system.


Great thank you!

Now I got this on Btrfs:

time cat 1.zip  > /dev/null
cat 1.zip > /dev/null  0,03s user 17,28s system 89% cpu 19,325 total

I am not sure whether fsck in fstab applies to Btrfs, but it does for Ext4.

In the Arch wiki:

If the root file system is btrfs or XFS, the fsck order should be set to 0 instead of 1

Check cat /usr/bin/fsck.btrfs

# fsck.btrfs is a type of utility that should exist for any filesystem and is
# called during system setup when the corresponding /etc/fstab entries contain
# non-zero value for fs_passno. (See fstab(5) for more.)
#
# Traditional filesystems need to run their respective fsck utility in case the
# filesystem was not unmounted cleanly and the log needs to be replayed before
# mount. This is not needed for BTRFS. You should set fs_passno to 0.
#
# If you wish to check the consistency of a BTRFS filesystem or repair a
# damaged filesystem, see btrfs(8) subcommand 'check'. By default the
# filesystem consistency is checked, the repair mode is enabled via --repair
# option (use with care!).

AUTO=false
while getopts ":aApy" c
do
 case $c in
 a|A|p|y)       AUTO=true;;
 esac
done
shift $(($OPTIND - 1))
eval DEV=\${$#}
if [ ! -e $DEV ]; then
 echo "$0: $DEV does not exist"
 exit 8
fi
if ! $AUTO; then
 echo "If you wish to check the consistency of a BTRFS filesystem or"
 echo "repair a damaged filesystem, see btrfs(8) subcommand 'check'."
fi
exit 0

A real-world copy benchmark using cp.

Let's check: copying 30 GB of game data within the same filesystem
Ext4:

$ time cp Game.7z Game1.7z 

real    0m24,201s
user    0m0,001s
sys     0m21,379s

Btrfs:

$ time cp Game.7z Game1.7z

real    0m0,011s
user    0m0,001s
sys     0m0,003s

Btrfs: copying the data from one subvolume to another (Linux kernel 5.17+ improved deduplication):

$ time sudo cp Game.7z /opt/Game1.7z

real    0m2,141s
user    0m0,038s
sys     0m0,016s

Btrfs takes no additional space, thanks to its copy-on-write (deduplication) ability.
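The instant copy is cp taking a copy-on-write reflink rather than rewriting the data. A minimal sketch to try it yourself (file names are made up; `--reflink=auto` falls back to a normal copy on filesystems without reflink support, so this runs anywhere):

```shell
# Create a 100 MB test file, then request a reflink copy.
# On Btrfs this finishes almost instantly and shares the original's
# extents; on ext4 cp silently falls back to a full copy.
dd if=/dev/zero of=big.img bs=1M count=100 status=none
cp --reflink=auto big.img big-copy.img
cmp big.img big-copy.img && echo "copies identical"
```

On Btrfs you can verify with `df` before and after that the clone consumed essentially no extra space.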

A write benchmark is difficult to run on Btrfs because of deduplication.
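One workaround (my suggestion, not from the thread): force cp to bypass reflink cloning, so `time` measures real writes:

```shell
# Generate 10 MB of incompressible data, then force a byte-for-byte copy.
# --reflink=never makes cp rewrite the data even on Btrfs, so the timing
# reflects actual write throughput instead of a metadata-only clone.
dd if=/dev/urandom of=src.bin bs=1M count=10 status=none
cp --reflink=never src.bin dst.bin
cmp src.bin dst.bin && echo "full copy done"
```

Using /dev/urandom also keeps transparent compression from shrinking the workload.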


Copying data from one NVMe SSD to a Btrfs filesystem on the other NVMe SSD. Both SSDs are similarly fast.

time sudo cp Game.7z /run/media/zesko/Backup/Game1.7z

real    0m19,409s
user    0m0,046s
sys     0m18,632s


Btrfs is slower than other filesystems.

  • But it depends on the circumstances.
  • Btrfs has a lot of safety features. These do not come without cost.

Enable compression for Btrfs, and you may gain something (space and speed) :wink:

When you are looking for speed, Btrfs may not be the best choice!

You can find good information about Btrfs in the wiki.

So I should keep 0 0 on Btrfs?
0 0 is default on btrfs — on ext4 it’s 0 1


My test installation with ext4 has only a 70 GB root partition, so I had to generate another test file (25.6 GB).
(Thanks @megavolt, using pigz and all 16 cores it’s done in seconds)

ext4:

$ time cp Test.tar.gz Test1.tar.gz

cp -i Test.tar.gz Test1.tar.gz  0,00s user 27,48s system 99% cpu 27,760 total

Btrfs:
(unfortunately I have no second NVMe built in)

$ time cp Test.tar.gz Test1.tar.gz

cp -i Test.tar.gz Test1.tar.gz  0,00s user 0,11s system 91% cpu 0,126 total

Btrfs: copying the data from subvolume to another subvolume:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] password for user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 25,88s system 92% cpu 28,093 total


I made the changes, but did they gain any speed?

So even if the changes to the fstab could gain some speed, I guess it is still slower. But how much slower is it, and is this just the normal slowdown? And based on my read/write tests before and after, did the changes to the fstab do anything? Like I said, I don't quite understand the output. Which number shows the actual speed or duration?

But if this is normal in the end, I guess there's nothing I can do… I just wanted to be sure.
And thanks again for your help so far.

Yes

Which Linux kernel do you use?
Did you run sudo btrfs balance start -v /?

My fstab

UUID=bd9ea09d-153e-4e0c--b3e6cbcbab86f0 / btrfs subvol=/@,defaults,noatime,compress=zstd 0 0

Try changing compress=zstd to compress=zstd:1.
There is no need for clear_cache,nospace_cache; space_cache=v2 is already the default.
You do not need discard=async in fstab; just use sudo systemctl enable --now fstrim.timer

I use Linux 5.15.55-1 LTS

Nope, but I did now.

$ sudo btrfs balance start -v /

[sudo] password for user: 
WARNING:

	Full balance without filters requested. This operation is very
	intense and takes potentially very long. It is recommended to
	use the balance filters to narrow down the scope of balance.
	Use 'btrfs balance start --full-balance' option to skip this
	warning. The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
Done, had to relocate 303 out of 303 chunks

It took a while, but after a reboot I got this (note I haven't made any further changes to the fstab since I changed the 1 for fsck back to 0):

Btrfs read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,04s user 12,96s system 65% cpu 19,940 total

Btrfs write, copying the data from subvolume to subvolume:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] password for user:
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,34s system 90% cpu 25,734 total


So it doesn't matter whether clear_cache,nospace_cache & space_cache=v2 are in fstab, because space_cache=v2 is already the default?
What about discard=async & ssd? They have been there since the installation…

So in your opinion I should enable fstrim.timer, delete discard=async, ssd, clear_cache & nospace_cache from fstab, keep defaults & noatime, and add compress=zstd:1?

It looks like deduplication does not work when copying from one subvolume to another in the same filesystem.

AFAIK, Linux kernel 5.17 or newer improved deduplication so that such a copy takes no additional space.

Yes, you can remove all three from fstab, because space_cache=v2 and ssd (automatically detected by Btrfs) are already defaults.
Check findmnt --types btrfs to see which options are actually in effect.
discard=async is similar to fstrim.

See the options in fstab:

UUID=bd9ea09d-153e-4e0c--b3e6cbcbab86f0 / btrfs subvol=/@,defaults,noatime,compress=zstd:1 0 0

Here is my last copy benchmark:

I have two Nvme SSDs, each SSD has Btrfs and Ext4.

Btrfs: Copying the game “Dota 2” directory (unarchived) from Btrfs SSD A to Btrfs SSD B

$ time cp -r ~/"Desktop/dota 2 beta" /run/media/zesko/Backup/dota2_backup

real    0m33,970s
user    0m0,117s
sys     0m26,751s

Ext4: Copying the same game directory from Ext4 SSD A to Ext4 SSD B

$ time cp -r "/media/Steam/steamapps/common/dota 2 beta" ~/VMs/dota2_backup

real    0m39,838s
user    0m0,105s
sys     0m30,614s

Therefore you cannot trust what kdiskmark showed; it is not representative of real-world use.
Just forget about kdiskmark.

All right now it looks like this:

$ cat /etc/fstab

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=68BD-1D13                            /boot/efi      vfat    umask=0077 0 2
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /              btrfs   subvol=/@,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /home          btrfs   subvol=/@home,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/cache     btrfs   subvol=/@cache,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/log       btrfs   subvol=/@log,defaults,noatime,compress=zstd:1 0 0
UUID=4371b365-d8bf-430a-ade2-d4fab1a203ac swap           swap    defaults,noatime 0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0

After rebooting I got this on Btrfs read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,07s user 12,80s system 64% cpu 19,918 total

And this on Btrfs write:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] password for user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,72s system 89% cpu 26,570 total



After updating to kernel 5.17.15-1 I got this on read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,04s user 12,91s system 65% cpu 19,857 total

Write:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] password for user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,43s system 47% cpu 49,327 total


So it seems the read speed is more or less the same, but what happened to the write speed? It's even worse.
I noticed some warnings during kernel update:

==> Starting build: 5.17.15-1-MANJARO
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [autodetect]
-> Running build hook: [modconf]
-> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: xhci_pci
-> Running build hook: [keyboard]
-> Running build hook: [keymap]
-> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
-> Running build hook: [plymouth]
-> Running build hook: [resume]
-> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: /boot/initramfs-5.17-x86_64.img
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux517.preset: 'fallback'
-> -k /boot/vmlinuz-5.17-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-5.17-x86_64-fallback.img -S autodetect
==> Starting build: 5.17.15-1-MANJARO
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [modconf]
-> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: bfa
==> WARNING: Possibly missing firmware for module: qed
==> WARNING: Possibly missing firmware for module: qla1280
==> WARNING: Possibly missing firmware for module: qla2xxx
==> WARNING: Possibly missing firmware for module: xhci_pci

Unfortunately I can't reproduce your last test, because I don't have two NVMe drives.

Try upgrading to Linux kernel 5.18 and check whether deduplication works.

Do you mean 5.17 is worse than 5.15 in the write benchmark?
But I do not see that 5.17 is worse.

5.17:

0,02s user 23,43s system 47% cpu 49,327 total

5.15:

0,02s user 23,72s system 89% cpu 26,570 total

Just ignore these warnings.

You don't have to reproduce my test. I can only share what my benchmark looks like in real-world use, unlike what kdiskmark reported.

Now it seems to work:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] Passwort für user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,01s user 0,01s system 0% cpu 2,248 total


Oh, I thought the last number was the total duration…

You can open bash instead of zsh as the default shell. The two show different output formats for time:

$ bash
$ time sudo cp Test.tar.gz /opt/Test1.tar.gz
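To illustrate the difference (a sketch, assuming bash is available): bash's time keyword prints three labelled lines on stderr, while zsh condenses everything into one line ending in "total". The wall-clock duration is "real" in bash and "total" in zsh.

```shell
# bash's `time` keyword prints three labelled lines (real/user/sys) to
# stderr; "real" is the wall-clock duration. zsh's `time` instead prints
# a single line ending in "... cpu <seconds> total", where "total" is
# the wall-clock duration.
bash -c 'time sleep 0.2' 2>&1
```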

Well, I guess updating the kernel and editing the fstab did gain some speed, so thanks for that!

It's not exactly what I had expected, because I thought PCIe 3.0 does something like 3 GB or even up to 3.5 GB per second. In my case it takes about 12 seconds to read a 25 GB file and 23 seconds to write it. That works out to roughly 2.1 GB/s read and 1.1 GB/s write.
So it still seems to me like something is wrong… But at least it's not 0.6 GB/s, which is what I thought it would be when I measured with kdiskmark at the very beginning. And it's not like I have a slow system; actually there is no real problem, this wrong value from kdiskmark just made me suspicious.

EDIT: on ext4 it seems to be 2.5 GB/s read and 1.1 GB/s write, so reading a file on ext4 is a bit faster than on Btrfs, but still less than I thought this machine could do…
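For reference, the arithmetic behind those figures (using the ~25.6 GB test file and the quoted times):

```shell
# throughput = file size / wall-clock time:
# 25.6 GB in ~12 s of reading and ~23 s of writing.
awk 'BEGIN { printf "read: %.1f GB/s  write: %.1f GB/s\n", 25.6/12, 25.6/23 }'
# prints: read: 2.1 GB/s  write: 1.1 GB/s
```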

A short info:

Linux Kernel 6.1 supports Btrfs async buffered writes, which improves write performance a lot.


Btrfs updates coming in Linux kernel 6.1:

Performance:

  • outstanding FIEMAP speed improvement

    • algorithmic change how extents are enumerated leads to orders of
      magnitude speed boost (uncached and cached)
    • extent sharing check speedup (2.2x uncached, 3x cached)
    • add more cancellation points, allowing to interrupt seeking in files
      with large number of extents
    • more efficient hole and data seeking (4x uncached, 1.3x cached)
    • sample results:
      256M, 32K extents: 4s → 29ms (~150x)
      512M, 64K extents: 30s → 59ms (~550x)
      1G, 128K extents: 225s → 120ms (~1800x)
  • improved inode logging, especially for directories (on dbench workload
    throughput +25%, max latency -21%)

  • improved buffered IO, remove redundant extent state tracking, lowering
    memory consumption and avoiding rb tree traversal

  • add sysfs tunable to let qgroup temporarily skip exact accounting when
    deleting snapshot, leading to a speedup but requiring a rescan after
    that, will be used by snapper

  • support io_uring and buffered writes, until now it was just for direct
    IO, with the no-wait semantics implemented in the buffered write path
    it now works and leads to speed improvement in IOPS (2x), throughput
    (2.2x), latency (depends, 2x to 150x)

  • small performance improvements when dropping and searching for extent
    maps as well as when flushing delalloc in COW mode (throughput +5MB/s)

https://lore.kernel.org/linux-btrfs/cover.1664798047.git.dsterba@suse.com/


What is the meaning of real, user and sys?
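A short answer, in case it helps: real is wall-clock time from start to finish (including time spent waiting on the disk), user is CPU time spent in the program's own code, and sys is CPU time the kernel spent on its behalf (e.g. inside read/write syscalls). A quick sketch that makes the split visible:

```shell
# `sleep` barely touches the CPU, so real is ~0.5 s while user and sys
# stay near zero. The `cat`/`cp` runs above show the opposite pattern:
# a large sys figure, because most of the copying happens in the kernel.
bash -c 'time sleep 0.5' 2>&1
```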