Btrfs slow read/write speed on nvme

I use Linux 5.15.55-1 LTS

Nope, but I did now.

$ sudo btrfs balance start -v /

[sudo] Passwort für user: 
WARNING:

	Full balance without filters requested. This operation is very
	intense and takes potentially very long. It is recommended to
	use the balance filters to narrow down the scope of balance.
	Use 'btrfs balance start --full-balance' option to skip this
	warning. The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
Done, had to relocate 303 out of 303 chunks

It took a while, but after a reboot I got this (note: I haven't made any further changes to the fstab since I changed the 1 for fsck back to 0):

Btrfs read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,04s user 12,96s system 65% cpu 19,940 total

Btrfs write, copying the data from subvolume to subvolume:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] Passwort für user:
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,34s system 90% cpu 25,734 total


So it doesn’t matter whether clear_cache, nospace_cache or space_cache=v2 is in fstab, because it is already the default?
What about discard=async and ssd? They have been there since the installation…

So in your opinion I should enable fstrim.timer, delete discard=async, ssd, clear_cache and nospace_cache from fstab, keep defaults and noatime, and add compress=zstd:1?

It looks like deduplication does not work when copying from one subvolume to another subvolume in the same filesystem.

AFAIK, Linux kernel 5.17 or newer improved deduplication so that this kind of copy does not need additional space.
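A quick way to test copy-on-write copying explicitly is cp's --reflink option; a minimal sketch (the file paths are placeholders, not from this thread):

```shell
# Create a small test file (placeholder path).
echo "hello reflink" > /tmp/reflink_src.txt

# --reflink=auto makes a CoW copy where the filesystem supports it
# (Btrfs, XFS) and silently falls back to a normal copy elsewhere.
# Use --reflink=always to fail instead of falling back.
cp --reflink=auto /tmp/reflink_src.txt /tmp/reflink_dst.txt

# Either way, both files end up with identical content.
cmp /tmp/reflink_src.txt /tmp/reflink_dst.txt && echo "identical"
```

On Btrfs, a reflinked copy shares the extents of the original, so it consumes almost no additional space until one of the copies is modified.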

Yes, you can remove all 3 from fstab, because space_cache=v2 and ssd (which Btrfs detected automatically) are already defaults.
Check findmnt --types btrfs to see which mount options are in effect.
discard=async is similar to what fstrim does.
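To verify which options actually ended up active after editing fstab, one way (assuming util-linux is installed) is:

```shell
# Show mounted btrfs filesystems with their effective options;
# implicitly applied defaults (ssd, space_cache=v2, ...) appear
# here even if they are not listed in /etc/fstab.
findmnt --types btrfs -o TARGET,OPTIONS

# The periodic alternative to mounting with discard=async:
# systemctl enable --now fstrim.timer
```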

See the options in fstab:

UUID=bd9ea09d-153e-4e0c--b3e6cbcbab86f0 / btrfs subvol=/@,defaults,noatime,compress=zstd:1 0 0

Here is my last copy benchmark:

I have two Nvme SSDs, each SSD has Btrfs and Ext4.

Btrfs: Copying the game “Dota 2” directory (not archived) from Btrfs SSD A to Btrfs SSD B

$ time cp -r "~/Desktop/dota 2 beta" /run/media/zesko/Backup/dota2_backup       

real    0m33,970s
user    0m0,117s
sys     0m26,751s

Ext4: Copying the same game directory from Ext4 SSD A to Ext4 SSD B

$ time cp -r "/media/Steam/steamapps/common/dota 2 beta" ~/VMs/dota2_backup

real    0m39,838s
user    0m0,105s
sys     0m30,614s

Therefore you cannot trust what kdiskmark showed; it is not representative of real-world use.
Just forget about kdiskmark.

All right now it looks like this:

$ cat /etc/fstab

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=68BD-1D13                            /boot/efi      vfat    umask=0077 0 2
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /              btrfs   subvol=/@,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /home          btrfs   subvol=/@home,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/cache     btrfs   subvol=/@cache,defaults,noatime,compress=zstd:1 0 0
UUID=af6ff93b-3809-4e38-9acd-9031213e0a5b /var/log       btrfs   subvol=/@log,defaults,noatime,compress=zstd:1 0 0
UUID=4371b365-d8bf-430a-ade2-d4fab1a203ac swap           swap    defaults,noatime 0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0

After rebooting I got this on the Btrfs read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,07s user 12,80s system 64% cpu 19,918 total

And this on the Btrfs write:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] Passwort für user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,72s system 89% cpu 26,570 total



After updating to kernel 5.17.15-1 I got this on read:

$ time cat Test.tar.gz  > /dev/null

cat Test.tar.gz > /dev/null  0,04s user 12,91s system 65% cpu 19,857 total

Write:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] Passwort für user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,02s user 23,43s system 47% cpu 49,327 total


So it seems the read speed is more or less the same, but what happened to the write speed? It’s even worse.
I noticed some warnings during kernel update:

==> Starting build: 5.17.15-1-MANJARO
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [autodetect]
-> Running build hook: [modconf]
-> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: xhci_pci
-> Running build hook: [keyboard]
-> Running build hook: [keymap]
-> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
-> Running build hook: [plymouth]
-> Running build hook: [resume]
-> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: /boot/initramfs-5.17-x86_64.img
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux517.preset: 'fallback'
-> -k /boot/vmlinuz-5.17-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-5.17-x86_64-fallback.img -S autodetect
==> Starting build: 5.17.15-1-MANJARO
-> Running build hook: [base]
-> Running build hook: [udev]
-> Running build hook: [modconf]
-> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: bfa
==> WARNING: Possibly missing firmware for module: qed
==> WARNING: Possibly missing firmware for module: qla1280
==> WARNING: Possibly missing firmware for module: qla2xxx
==> WARNING: Possibly missing firmware for module: xhci_pci

Unfortunately I can’t reproduce your last test, because I don’t have two NVMe drives.

Try upgrading to Linux kernel 5.18 and check whether the deduplication works.

Do you mean 5.17 is worse than 5.15 in the write benchmark?
But I do not see that 5.17 is worse.

5.17:

0,02s user 23,43s system 47% cpu 49,327 total

5.15:

0,02s user 23,72s system 89% cpu 26,570 total

Just ignore these warnings.

You don’t have to reproduce my test. I can only share what my benchmark looks like in real-world use, unlike what kdiskmark reported.

Now it seems to work:

$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

[sudo] Passwort für user: 
sudo cp Test.tar.gz /opt/Test1.tar.gz  0,01s user 0,01s system 0% cpu 2,248 total


Oh, I thought the last number was the total duration…

You can open bash instead of zsh as your default shell. The two shells format the output of time differently:

$ bash
$ time sudo cp Test.tar.gz /opt/Test1.tar.gz

Well, so I guess updating the kernel and editing the fstab gained some speed, so thanks for that!

It’s not exactly what I had expected, because I thought PCIe 3.0 delivers about 3 GB or even up to 3.5 GB per second. And in my case it takes about 12 seconds to read a 25 GB file and 23 seconds to write it. That would be roughly 2.1 GB/s read and 1.1 GB/s write.
So it still seems to me like something is wrong… But at least it’s not the 0.6 GB/s I thought it would be when I measured with kdiskmark at the very beginning. And it’s not like I have a slow system; actually there is no real problem, this wrong value from kdiskmark just made me suspicious.

EDIT: On ext4 it seems to be 2.5 GB/s read and 1.1 GB/s write, so reading a file on ext4 is a bit faster than on Btrfs, but still less than I thought this machine could do…
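The throughput numbers above can be recomputed from file size and elapsed time; a quick sketch with the figures from this thread (25 GB file, 12 s read, 23 s write):

```shell
# GB/s = size in GB divided by elapsed seconds.
awk 'BEGIN {
  size_gb = 25
  printf "read:  %.1f GB/s\n", size_gb / 12   # ~2.1 GB/s
  printf "write: %.1f GB/s\n", size_gb / 23   # ~1.1 GB/s
}'
```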

A short note:

Linux Kernel 6.1 supports Btrfs async buffered writes, which improves write performance a lot.


Btrfs updates coming in Linux kernel 6.1:

Performance:

  • outstanding FIEMAP speed improvement

    • algorithmic change how extents are enumerated leads to orders of
      magnitude speed boost (uncached and cached)
    • extent sharing check speedup (2.2x uncached, 3x cached)
    • add more cancellation points, allowing to interrupt seeking in files
      with large number of extents
    • more efficient hole and data seeking (4x uncached, 1.3x cached)
    • sample results:
      256M, 32K extents: 4s → 29ms (~150x)
      512M, 64K extents: 30s → 59ms (~550x)
      1G, 128K extents: 225s → 120ms (~1800x)
  • improved inode logging, especially for directories (on dbench workload
    throughput +25%, max latency -21%)

  • improved buffered IO, remove redundant extent state tracking, lowering
    memory consumption and avoiding rb tree traversal

  • add sysfs tunable to let qgroup temporarily skip exact accounting when
    deleting snapshot, leading to a speedup but requiring a rescan after
    that, will be used by snapper

  • support io_uring and buffered writes, until now it was just for direct
    IO, with the no-wait semantics implemented in the buffered write path
    it now works and leads to speed improvement in IOPS (2x), throughput
    (2.2x), latency (depends, 2x to 150x)

  • small performance improvements when dropping and searching for extent
    maps as well as when flushing delalloc in COW mode (throughput +5MB/s)

https://lore.kernel.org/linux-btrfs/cover.1664798047.git.dsterba@suse.com/


What is the meaning of real, user and sys?

real is the wall-clock time between the start and the end of the process.

user is the amount of CPU time spent in user-mode code (outside the kernel) within the process.

sys is the amount of CPU time spent in the kernel within the process.

See the explanation.


Note: the bash, zsh and fish shells format the result of time differently.

Check time in 3 different shells:

bash:

$ time sleep 1

real    0m1,001s
user    0m0,000s
sys     0m0,001s

zsh:

$ time sleep 1                                                                                                                                                                                               
sleep 1  0.00s user 0.00s system 0% cpu 1.001 total

fish:

$ time sleep 1

________________________________________________________
Executed in    1.00 secs      fish           external
   usr time  787.00 micros  276.00 micros  511.00 micros
   sys time    0.00 micros    0.00 micros    0.00 micros


@Zesko thanks for all the information