Poor encrypted ZFS performance (only with repository packages)

I am running a RAIDZ2 across 5 hard drives and noticed very poor performance and high CPU usage (~50%) when reading (or writing) files on the encrypted portion of the pool (only about 170 MiB/s), while the plain, unencrypted portion performs fine (about 515 MiB/s at ~4% CPU usage).

I investigated/tested/played around a bit and noticed that both the linux516-zfs and zfs-dkms packages from the extra repository suffer from this performance degradation, while the self-compiled zfs-dkms from the AUR manages speeds of nearly 550 MB/s for me at one fifth the CPU load (below 10%).

linux516-zfs and zfs-dkms both include ZFS v2.1.2, while the AUR build is ZFS v2.1.3. However, judging from the changelog, there was no “encryption performance fix” included: Release zfs-2.1.3 · openzfs/zfs · GitHub

It seems like the prebuilt kernel module doesn’t rely on hardware acceleration (AES-NI) and does it all in software, but even then it seems rather slow for a six-core AMD Ryzen 5 5600G…
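
For reference, here is a quick sanity check one can run (a rough sketch; icp_aes_impl is the OpenZFS icp module parameter, and I’m assuming icp_gcm_impl is exposed the same way):

# “aes” in the CPU flags means hardware AES (AES-NI) is available
grep -wo aes /proc/cpuinfo | sort -u

# the entry in brackets is the implementation the ZFS icp module selected
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl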

Specs of the system:
CPU: AMD Ryzen 5 5600G
RAM: DDR4 32GB 3600MT/s
ZFS RAIDZ2: 3x 18TB & 2x 8TB

That’s hard to pinpoint. I’ve never had performance issues like you’re describing with any version of OpenZFS.

Regarding your results, I thought it might have to do with differences in your ARC during the benchmarks, but I take it you went back and forth multiple times and rebooted between each test?

But you rightly noted a dead giveaway that it may in fact be solely about AES/encryption/acceleration: your observation that the prebuilt module doesn’t seem to use hardware AES at all.

So it will be interesting to see whether this behavior continues with ZFS 2.1.3 once it hits the official repositories.

Yeah, I will test the repo’s OpenZFS 2.1.3 once there’s an update.

ARC shouldn’t make much of a difference: I rebooted between each test, tested everything twice (compressed/uncompressed), and each unique file is 40 GiB in size, incompressible (generated from /dev/urandom), and wouldn’t even fit in the ARC (32 GB of RAM total, 8 GB of which is dedicated to the iGPU, so the ARC is about 12 GB). Though for repeatability I mostly tested reads, not writes. Writes behaved basically the same, but as I said, I didn’t test that part multiple times…
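
For context, the read test was essentially just streaming the file back after a reboot, roughly like this (a sketch; /tank/encrypted is a placeholder path, not my actual layout):

# one-time: 40 GiB of incompressible data from /dev/urandom
dd if=/dev/urandom of=/tank/encrypted/testfile bs=1M count=40960 status=progress

# after a reboot (cold ARC): read it back while watching CPU usage
dd if=/tank/encrypted/testfile of=/dev/null bs=1M status=progress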

Yeah that’s really weird. By all accounts it makes no sense.

If 2.1.3 from the repository (when it lands) demonstrates the same performance hit, then it may in fact come down to different compile options between the two packages.

But then it doesn’t explain why I have no such performance impact on 2.1.2 from the repository (I’m not using the one from the AUR).

I only ever use mirror vdevs (not RAIDZ 1 or 2).

I always use AES-GCM 256-bit for encrypted datasets, which I believe is the default if not specified, anyway.
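
For clarity, that’s the kind of dataset I’m talking about (a sketch; the pool/dataset names are placeholders):

# explicitly request AES-256-GCM; plain encryption=on should resolve to the same cipher on current OpenZFS
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure

# confirm what the dataset actually ended up with
zfs get encryption,keyformat tank/secure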

Off topic, but how is that possible? All drives in a vdev need to be the same capacity; otherwise it will size storage/parity based on the smallest-capacity drive.

I’ll test encryption performance later today with some file-based vdevs and pool types. Maybe it only applies to certain combinations :thinking:

P.S. I started out with 5x 8 TB and have been migrating everything to 5x 18 TB. Still lacking the funds for the last two drives :sweat_smile: but yeah, currently my RAIDZ2 pool still only has a volume of about 18 TB.

So here are some file-based ZFS zpools. All files were on the same SATA SSD. Reading off that SSD directly results in about 9% CPU usage and 554 MiB/s, so this should be the very best case in this test.

zpools:
mirror: 2x10GB (10GB)
raidz: 3x5GB (10GB)
raidz2: 5x5GB (15GB)
draid: 4x5GB (15GB)
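
Roughly, the pools can be recreated from sparse files like this (a sketch; /mnt/ssd is a placeholder for the SSD mountpoint):

# sparse backing files on the SSD
truncate -s 5G /mnt/ssd/f1 /mnt/ssd/f2 /mnt/ssd/f3 /mnt/ssd/f4 /mnt/ssd/f5

# example: the raidz2 layout, with an encrypted dataset on top for the test
zpool create testpool raidz2 /mnt/ssd/f1 /mnt/ssd/f2 /mnt/ssd/f3 /mnt/ssd/f4 /mnt/ssd/f5
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase testpool/enc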

zfs-dkms 2.1.3 (AUR):
mirror: 353MiB/s, 13% CPU
raidz: 364MiB/s, 13% CPU
raidz2: 318MiB/s, 13% CPU
draid: 477MiB/s, 15% CPU

zfs-dkms 2.1.2 (extra repo):
mirror: 200MiB/s, 65% CPU
raidz: 202MiB/s, 65% CPU
raidz2: 182MiB/s, 58% CPU
draid: 189MiB/s, 60% CPU

linux516-zfs 2.1.2 (extra repo):
mirror: 198MiB/s, 64% CPU
raidz: 202MiB/s, 65% CPU
raidz2: 181MiB/s, 58% CPU
draid: 190MiB/s, 60% CPU

So yeah, same bad results…

Is this even the right place to report this bug?

HOLY CANNOLI!

This entire time I’ve been using Linux kernel 5.15.28 (technically, the 5.15.x series, since I prefer to stick with an “LTS” kernel train).

Guess what? I booted into kernel 5.16.14, redid my tests, and lo and behold! I get the same high CPU usage and slow speeds! :scream:

It must be a regression with kernel 5.16.x (and possibly 5.17.x as well).


Phrased another way:

  • :white_check_mark: Linux kernel 5.15.28 + linux515-zfs (2.1.2-17) = faster speeds, lower CPU usage

  • :warning: Linux kernel 5.16.14 + linux516-zfs (2.1.2-19) = slower speeds, higher CPU usage


Which makes me wonder whether compiling it from scratch (via zfs-dkms, for the user’s specific kernel, every time) circumvents this regression.

Hello,

You can actually see that the hardware AES (AES-NI) implementation is missing from the icp module in the 5.16 build:

❯ uname -a
Linux manjaro 5.15.28-1-MANJARO #1 SMP PREEMPT Fri Mar 11 14:12:57 UTC 2022 x86_64 GNU/Linux
❯ cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni %
❯ uname -a
Linux manjaro 5.16.14-1-MANJARO #1 SMP PREEMPT Fri Mar 11 14:12:18 UTC 2022 x86_64 GNU/Linux
❯ cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 %

It’s related to this issue and is fixed in OpenZFS version 2.1.3.
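
For anyone checking their own system, the same parameter can (as far as I know) also be set at runtime, but only to implementations that are actually listed, which is exactly what’s missing on the affected 5.16 builds:

# the value in brackets is the active implementation
cat /sys/module/icp/parameters/icp_aes_impl

# select a specific implementation (only works if “aesni” appears in the list above)
echo aesni | sudo tee /sys/module/icp/parameters/icp_aes_impl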


Good catch!

Another reason to stick with the LTS kernel train, and in this case 5.15! :trophy: