Luks makes SSD slow

I have a notebook (Schenker Vision 14) with a 1 TB M.2 Samsung 980 PRO (PCIe 4.0 x4, NVMe).
It seems that disk encryption with LUKS makes reads and writes very slow. I tested with fresh installations to compare:

Manjaro Gnome with LUKS:

Manjaro Gnome without LUKS:

Is this a normal loss of speed? The fan also seems to turn on more often with LUKS. That makes sense because the CPU has more work to do, but that much?

A performance hit is expected, but that’s actually still reasonably fast.

VeraCrypt on a Windows host is even worse than that (GitHub issue)

Of course reading and writing are slower, because every block has to be encrypted or decrypted.
And while this benchmark runs, your CPU is much busier with all these operations.

I don’t think you’ll notice the difference in actual use of your system, only in a benchmark.

Take a look at the encryption manual:

 man cryptsetup

There you can see several options for changing the performance/security balance. That balance depends on the CPU and RAM, which can become the bottleneck, so the raw read speed of a very fast SSD stops being the limiting factor. Every read from or write to storage has to pass through the decryption or encryption step, and that step is independent of the storage device.

This is about choosing your trade-off: either stronger disk encryption or faster read/write speed to and from the drive.

If you want to experiment, please do it on a test machine first. You are just starting to learn this, and it is very easy to break something or to leak secret information that could compromise your current encrypted installation (for example, by saving the header of a LUKS device somewhere that is not strictly private and fully trusted).
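As a hedged illustration of the header point: `cryptsetup luksHeaderBackup` writes a copy of the LUKS header to a file, and that file deserves the same protection as the disk itself. The sketch below only prints the command unless `APPLY=1` is set. The device path and the backup location are hypothetical placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: back up a LUKS header to a private location.
# DEVICE and BACKUP are hypothetical placeholders; adjust them for your system.
DEVICE="${DEVICE:-/dev/nvme0n1p2}"
BACKUP="${BACKUP:-/mnt/offline-usb/luks-header.img}"

header_backup_cmd() {
    # Print the cryptsetup invocation for device $1 and backup file $2.
    printf 'cryptsetup luksHeaderBackup %s --header-backup-file %s\n' "$1" "$2"
}

if [ "${APPLY:-0}" = "1" ]; then
    # Needs root. Keep the resulting file offline or on encrypted media.
    sudo cryptsetup luksHeaderBackup "$DEVICE" --header-backup-file "$BACKUP"
else
    # Dry run: only show what would be executed.
    header_backup_cmd "$DEVICE" "$BACKUP"
fi
```

Run it once without `APPLY=1` to check the printed command before letting it touch a real device.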

I think your 1.5x performance drop is very affordable for general PC use. You can select stronger algorithms and lose more performance but gain security, or weaker ones and gain performance but lose security. That trade-off is up to you.

Try to figure out for yourself: does it matter whether your PC boots in 20 or 30 seconds? Does that make a difference to you?
If the drive has plenty of free space and you have set up the TRIM schedule properly, you will get write/read speeds of several GB/s. How often do you copy Blu-ray files or 8K@120fps video footage of 50, 100 or 200 GB? Every day, every week? And does it then matter whether it is written in 1.5 minutes or 2 minutes? Also, NVMe SSDs usually get hot. I do not know your model or its cooling, but sequential speed at queue depth 1 can drop to 0.7-1 GB/s.
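To put numbers on the "1.5 minutes or 2 minutes" question, here is a small sketch that estimates copy time from file size and sustained rate. It assumes decimal units (1 GB = 1000 MB) and a constant rate, which real drives rarely hold; the example rates are hypothetical, not measurements from this thread.

```shell
#!/bin/sh
# Sketch: estimate how long a sequential copy takes at a given sustained rate.
# Assumes 1 GB = 1000 MB and a constant rate (no throttling, no cache effects).
transfer_seconds() {
    # $1 = size in GB, $2 = sustained rate in MB/s
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate }'
}

# Hypothetical example: a 100 GB file at 1500 MB/s and at 1000 MB/s.
transfer_seconds 100 1500
transfer_seconds 100 1000
```

With these example rates the difference is about half a minute on a 100 GB copy, which is the order of magnitude being discussed here.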

You are lucky to have a well-performing Core i7-11370H.
I only have an i5-8250U, and it overheats very quickly under heavy load because of the passively cooled case if I leave Turbo Boost on. I turned it off, so all cores run at 1.6 GHz max., but without overheating even during 10-15 minutes of uninterrupted 100% CPU usage.
I have a speedy SSD too.

$ inxi -CDmzy1
Memory:
  RAM:
    total: 31.11 GiB
    used: 9.39 GiB (30.2%)
  RAM Report:
    permissions: Unable to run dmidecode. Root privileges required.

CPU:
  Info: Quad Core
    model: Intel Core i5-8250U
    bits: 64
    type: MCP
    cache:
      L2: 6 MiB
  Speed: 700 MHz
    min/max: 400/1600 MHz
    Core speeds (MHz):
      1: 700
      2: 701
      3: 701
      4: 715

Drives:
  Local Storage:
    total: 465.76 GiB
    used: 73.53 GiB (15.8%)
  ID-1: /dev/nvme0n1
    vendor: Samsung
    model: SSD 970 EVO Plus 500GB
    size: 465.76 GiB
$

According to the specs, the SSD has read/write speeds of 3,500 / 3,200 MB/s.

Moreover, I changed the defaults and set up a more secure key algorithm, and currently I have

And I have been working under these conditions for about half a year. It is the right trade-off for me. Of course I would like more security or more performance, but that takes more money invested in CPU and RAM speed.
So the trade-off is yours to choose.


And… what was it you asked again? Yes, a performance drop is typically to be expected in probably any encrypted system, including LUKS.

The title:

Luks makes SSD slow

The CPU and RAM can’t process such data rates when encrypting and decrypting on the fly, so your SSD has idle periods, which you realized later in

exactly

According to your tests, yes, but note that you are comparing slightly different tests: 3 vs 5 rounds and, more critically, data batches of 1/8 GB vs 1 GB.

Set up your trade-off (on a test PC first), and do not forget to back up all your data before changing anything on the PC you use daily.


The SSD is most likely not slowing down - but you see it as such - because the test works by counting the number of ‘ticks’ from the moment you initiate the write of a given data set until the data set has been flushed to disk.

When you employ encryption the data set is run through an algorithm - in memory - then the computed data set is written to disk.

The process of computing the new data set is delaying the write - not the SSD.
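A rough way to picture this, assuming each block is first encrypted and only then written, with no overlap between the two stages (a simplification, since the kernel can overlap them to some degree): the times per block add up, so the combined rate is lower than either stage alone. The figures in the sketch are illustrative, not measurements from this thread.

```shell
#!/bin/sh
# Sketch: combined throughput of two strictly serial stages (encrypt, then write).
# Time per unit of data is 1/cipher + 1/disk, so the combined rate is its inverse.
serial_rate() {
    # $1 = cipher rate, $2 = disk rate (same unit, e.g. MB/s)
    LC_ALL=C awk -v c="$1" -v d="$2" 'BEGIN { printf "%.0f\n", 1 / (1 / c + 1 / d) }'
}

# Illustrative values: a cipher and a disk that each manage 3000 MB/s alone.
serial_rate 3000 3000
```

In this simplified model two equally fast stages halve the result, even though the SSD itself is no slower than before.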


Hi all,
thank you for all your answers, very cool!

I know that the SSD (@linux-aarhus: or rather the writing/reading process including the CPU work) is fast compared to many other disks, but it’s still a big loss, so I wondered whether there is a way to improve it. It seems there could be one (see below).

@alven you’re right, I compared different data batches. I repeated it with identical batches and got nearly the same result.

I found something very interesting: it seems that the encryption/decryption path is burdened with several processing queues from the good old HDD days, and it’s possible to bypass them (https://blog.cloudflare.com/speeding-up-linux-disk-encryption/). This even found its way into the kernel and can be activated with the GRUB parameters no_read_workqueue and no_write_workqueue. According to some posts I found, it can speed things up by about 30% to 40%. But… in my case it slows things down massively. Really confusing. Any idea?
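For reference, a sketch of how these flags can be set with cryptsetup itself rather than through the boot loader. It assumes cryptsetup 2.3.4 or newer, kernel 5.9 or newer and a LUKS2 container (`--persistent` stores the flags in the LUKS2 header). The mapping name `cryptroot` is a hypothetical placeholder, and the script only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: enable the dm-crypt workqueue bypass on an already opened mapping.
# NAME is a hypothetical mapping name; find yours with `lsblk` or `dmsetup ls`.
NAME="${NAME:-cryptroot}"

no_workqueue_cmd() {
    # Print the refresh command that enables both flags for mapping $1.
    printf 'cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh %s\n' "$1"
}

if [ "${APPLY:-0}" = "1" ]; then
    sudo cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue \
        --persistent refresh "$NAME"
    # The active flags should then be listed in the mapping table.
    sudo dmsetup table "$NAME"
else
    # Dry run: only show what would be executed.
    no_workqueue_cmd "$NAME"
fi
```

Running the refresh again with `--persistent` but without the two `--perf-*` options should drop the flags again, which makes it easy to compare both settings with the same benchmark.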

The only thing I can think of is the LUKS version used to create the container.

I don’t know exactly where Calamares or GRUB stand on this, but I recently learned that GRUB has been improved so that it can use LUKS2 containers.

Unless something has changed recently - within the past months - I believe a Calamares install defaults to LUKS version 1.
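One way to check which version a container actually uses is the `Version:` line of `cryptsetup luksDump`. A minimal sketch follows; the helper name `luks_version` and the device path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the LUKS version from `cryptsetup luksDump` output on stdin.
luks_version() {
    # The dump contains a line such as "Version:        2"; print its value.
    awk '/^Version:/ { print $2; exit }'
}

# Usage (needs root; the device path is a hypothetical placeholder):
#   sudo cryptsetup luksDump /dev/nvme0n1p2 | luks_version
```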

The defaults for a luks container - taken from cryptsetup --help

Default compiled-in key and passphrase parameters:
	Maximum keyfile size: 8192kB, Maximum interactive passphrase length 512 (characters)
Default PBKDF for LUKS1: pbkdf2, iteration time: 2000 (ms)
Default PBKDF for LUKS2: argon2id
	Iteration time: 2000, Memory required: 1048576kB, Parallel threads: 4

Default compiled-in device cipher parameters:
	loop-AES: aes, Key 256 bits
	plain: aes-cbc-essiv:sha256, Key: 256 bits, Password hashing: ripemd160
	LUKS: aes-xts-plain64, Key: 256 bits, LUKS header hashing: sha256, RNG: /dev/urandom
	LUKS: Default keysize with XTS mode (two internal keys) will be doubled.

Because every system is different there is no one-size-fits-all answer.

If you haven’t done so yet - you should really take a look at the Arch Wiki pages on encryption. This link jumps into the section on using cryptsetup

The man page is also very informative

       benchmark <options>

              Benchmarks ciphers and KDF (key derivation function).  Without parameters, it tries to measure few common configurations.

              To benchmark other ciphers or modes, you need to specify --cipher and --key-size options or --hash for KDF test.

              NOTE: This benchmark is using memory only and is only informative.  You cannot directly predict real  storage  encryption
              speed from it.

              For  testing block ciphers, this benchmark requires kernel userspace crypto API to be available (introduced in Linux
              kernel 2.6.38).  If you are configuring kernel yourself, enable "User-space interface for symmetric key  cipher  algorithms"
              in "Cryptographic API" section (CRYPTO_USER_API_SKCIPHER .config option).

              <options> can be [--cipher, --key-size, --hash].

The result of running cryptsetup benchmark on my system

➜  ~ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      2202890 iterations per second for 256-bit key
PBKDF2-sha256    2792479 iterations per second for 256-bit key
PBKDF2-sha512    1949026 iterations per second for 256-bit key
PBKDF2-ripemd160 1134822 iterations per second for 256-bit key
PBKDF2-whirlpool  858081 iterations per second for 256-bit key
argon2i       9 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      9 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1484,5 MiB/s      4573,7 MiB/s
    serpent-cbc        128b       124,5 MiB/s       962,7 MiB/s
    twofish-cbc        128b       284,2 MiB/s       516,4 MiB/s
        aes-cbc        256b      1144,3 MiB/s      3645,9 MiB/s
    serpent-cbc        256b       128,2 MiB/s       963,7 MiB/s
    twofish-cbc        256b       290,6 MiB/s       520,3 MiB/s
        aes-xts        256b      4394,8 MiB/s      4427,0 MiB/s
    serpent-xts        256b       829,7 MiB/s       846,3 MiB/s
    twofish-xts        256b       482,9 MiB/s       487,8 MiB/s
        aes-xts        512b      3620,4 MiB/s      3612,7 MiB/s
    serpent-xts        512b       842,3 MiB/s       847,0 MiB/s
    twofish-xts        512b       486,4 MiB/s       487,6 MiB/s
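To compare such tables between machines, a small filter can pick out the row with the highest encryption rate. This is only a sketch: the function name `fastest_cipher` is made up here, and it accepts both comma and dot decimals because the output format depends on the locale.

```shell
#!/bin/sh
# Sketch: read `cryptsetup benchmark` output on stdin and print the cipher row
# with the highest encryption rate. Accepts "1484,5" as well as "1484.5".
fastest_cipher() {
    LC_ALL=C awk '
        # Cipher rows look like: aes-xts 256b 4394,8 MiB/s 4427,0 MiB/s
        $2 ~ /^[0-9]+b$/ && $4 == "MiB/s" {
            rate = $3
            sub(/,/, ".", rate)              # normalise comma decimals
            if (rate + 0 > best) { best = rate + 0; name = $1 " " $2 }
        }
        END { if (name != "") print name, best, "MiB/s" }
    '
}

# Usage:
#   cryptsetup benchmark | fastest_cipher
```

As the man page excerpt says, these are memory-only figures, so the result only indicates which cipher is cheapest for the CPU, not the speed you will see on disk.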

Another option would be to utilise the disk device’s own encryption engine.

Even that could have other implications, as implied in this topic on StackExchange/Security

I don’t know how to do it, but from what I can see from a few searches it uses TPM, and I recall something about TPM being linked to Secure Boot - and this is where it gets hairy on Arch-based systems.

You should also know that the nature of flash devices makes it difficult to ensure that data is never recoverable by forensic tools.

I read an academic paper on how to reliably erase data from flash-based devices - it is quite interesting.


TY for the info. That “my” could be anything: a 10-year-old 2-core Celeron, a modern 16-core AMD… How useful is that without any mention of the actual system info?

  1. CPU model, its modes (overclocked or not, Turbo Boost on/off, Hyper-Threading on/off), cache sizes.
  2. RAM modes (number of channels, DDRx generation, frequency, CL (CAS latency) value).
  3. Is a GPU involved? Then its specs.

Without that, it looks like “some” system has that performance. Which “some”, a phone or a supercomputer, is unknown.

What I got:

$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       716240 iterations per second for 256-bit key
PBKDF2-sha256     905506 iterations per second for 256-bit key
PBKDF2-sha512     648871 iterations per second for 256-bit key
PBKDF2-ripemd160  367663 iterations per second for 256-bit key
PBKDF2-whirlpool  278580 iterations per second for 256-bit key
argon2i       4 iterations, 998181 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 1003769 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       474.7 MiB/s      1395.3 MiB/s
    serpent-cbc        128b        41.3 MiB/s       304.5 MiB/s
    twofish-cbc        128b        94.5 MiB/s       167.0 MiB/s
        aes-cbc        256b       363.1 MiB/s      1134.9 MiB/s
    serpent-cbc        256b        41.4 MiB/s       304.6 MiB/s
    twofish-cbc        256b        94.5 MiB/s       167.0 MiB/s
        aes-xts        256b      1385.3 MiB/s      1382.5 MiB/s
    serpent-xts        256b       268.8 MiB/s       270.2 MiB/s
    twofish-xts        256b       157.2 MiB/s       157.4 MiB/s
        aes-xts        512b      1129.4 MiB/s      1127.0 MiB/s
    serpent-xts        512b       268.8 MiB/s       270.5 MiB/s
    twofish-xts        512b       157.1 MiB/s       157.3 MiB/s
$

The system:
  1. CPU model: Core i5-8250U. Its modes are:
    overclocked: no,
    Turbo Boost: off (so the max core frequency equals the base frequency, which is 1.6 GHz),
    Hyper-Threading: off (so the max number of simultaneously processed threads matches the physical core count, which is 4 for this model),
    cache sizes:

$ lscpu | grep cache
L1d cache:                       128 KiB (4 instances)
L1i cache:                       128 KiB (4 instances)
L2 cache:                        1 MiB (4 instances)
L3 cache:                        6 MiB (1 instance)
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
$
  2. RAM modes are:
    channels config: dual,
    DDRx: DDR4,
    frequency: 2400 MHz,
    CL (CAS latency) value: CL 14.
  3. GPU: iGPU (Intel UHD 620 on the CPU die).
$ uname -r
5.15.0-2-MANJARO
$

Also note that:

$ sudo mkinitcpio -P
==> Building image from preset: /etc/mkinitcpio.d/linux515.preset: 'default'
  -> -k /boot/vmlinuz-5.15-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-5.15-x86_64.img
==> Starting build: 5.15.0-2-MANJARO
  -> Running build hook: [base]
...
  -> Running build hook: [keymap]
  -> Running build hook: [encrypt]
==> WARNING: Possibly missing firmware for module: qat_4xxx
  -> Running build hook: [filesystems]
...

The qat_4xxx module is crypto-related, so its missing firmware possibly introduces a crypto performance degradation.

@linux-aarhus, what’s your system info?


PS
Am I the thread hijacker? The title is

Luks makes SSD slow


no storage IO

I posted the comments on the LUKS setup - not to hijack the thread - but to illustrate how many factors are in play when you set up encryption.

And - in my opinion - LUKS does not make the SSD slow; it is the work in between. Because all systems are different, there is no single answer to how you should set up LUKS encryption: the time spent on encryption and decryption differs from system to system and depends heavily on the hardware and the chosen cipher.

$ uname -a
Linux ts 5.15.0-2-MANJARO #1 SMP PREEMPT Tue Nov 2 17:09:55 UTC 2021 x86_64 GNU/Linux
$ pacman-mirrors -G
unstable
$ lscpu | grep -e cache
L1d cache:                       256 KiB (8 instances)
L1i cache:                       256 KiB (8 instances)
L2 cache:                        2 MiB (8 instances)
L3 cache:                        16 MiB (1 instance)
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.15-x86_64 root=UUID=de68eb56-b170-4f5f-a34d-484afb646d1e rw mitigations=off intel_iommu=on quiet udev.log_priority=3
$ inxi -CGm
Memory:
  RAM: total: 62.49 GiB used: 4.32 GiB (6.9%)
  RAM Report:
  permissions: Unable to run dmidecode. Root privileges required.
CPU:
  Info: 8-Core model: Intel Core i9-9900K bits: 64 type: MT MCP cache:
  L2: 16 MiB
  Speed: 800 MHz min/max: 800/5000 MHz Core speeds (MHz): 1: 800 2: 4671
  3: 3702 4: 2662 5: 2130 6: 932 7: 800 8: 800 9: 800 10: 917 11: 800
  12: 800 13: 800 14: 800 15: 800 16: 800
Graphics:
  Device-1: Intel CoffeeLake-S GT2 [UHD Graphics 630] driver: i915 v: kernel
  Device-2: NVIDIA GP106GL [Quadro P2000] driver: nvidia v: 495.44
  Display: x11 server: X.Org 1.20.13 driver: loaded: nvidia resolution:
  1: 1920x1080~60Hz 2: 1920x1080~60Hz
  Message: Unable to show advanced data. Required tool glxinfo missing.

Your benchmark is not correct. Your two pictures are not comparable. The first one uses a 3x128 MB test file size, the second one 5x1 GB.

And the size should equal your RAM size to prevent caching effects. Only then do you get the real read/write speed of the device. If you have 16 GB of RAM you could go for 1x16 GB or 4x4 GB, for example.
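A sketch of how the test size could be derived from the installed RAM. It only prints a suggested `fio` command instead of running it; the job name and the test file path are hypothetical placeholders, and the helper `ram_gib` is made up for this example.

```shell
#!/bin/sh
# Sketch: derive a benchmark file size (whole GiB, rounded up) from MemTotal.
ram_gib() {
    # Reads /proc/meminfo-style text on stdin; MemTotal is given in kB (KiB).
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }'
}

if [ -r /proc/meminfo ]; then
    SIZE_GIB="$(ram_gib < /proc/meminfo)"
    # Print, rather than run, a sequential read test sized to RAM.
    # /path/to/testfile is a placeholder on the filesystem under test.
    echo "fio --name=seqread --filename=/path/to/testfile --size=${SIZE_GIB}G --rw=read --bs=1M --direct=1"
fi
```

Note that `--direct=1` asks fio to bypass the page cache in the first place, so with that option the RAM-sized file is an extra safeguard rather than a strict requirement.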


Hi,
thank you very much for all your replies! I know that the benchmark was not correct. I ran it again the suggested way, and the result was much the same.
I think I found a way that should help, but in my case it leads to another problem, so I will open another thread.