I have spent countless hours on multithreading tar over the years. tar itself isn’t multithreaded, but at least it’s usually not the limiting factor. So I have only done multithreading on the compression side.
The popular compression formats all have their parallel counterparts. (Which do not come with Manjaro by default.)
gz → pigz
bzip2 → pbzip2
xz → pxz
But one that does…
zstd → zstd
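If you want the parallel tools anyway, pulling them in is a one-liner. (The package names below are my assumption for the Manjaro/Arch repos; pxz in particular may only be available from the AUR, so check before relying on it.)
sudo pacman -S pigz pbzip2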
So you can just tar cvf dest.tar.zstd /src --zstd, and it will use multiple cores (same with extract).
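For completeness, the matching extract is just the x counterpart; /dest here is only my example target directory (it has to exist already):
tar xvf dest.tar.zstd --zstd -C /dest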
With tar, for simplicity, you would normally just use the corresponding reserved letter for that compression algorithm:
z = gz
j = bzip2
J = xz
--zstd = zstd (Sorry zstd, I guess they finally ran out of letters)
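Spelled out as full commands, that looks something like this (file names are just examples):
tar czvf dest.tar.gz /src
tar cjvf dest.tar.bz2 /src
tar cJvf dest.tar.xz /src
tar cvf dest.tar.zstd /src --zstd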
But apart from zstd, tar only uses one core with each of these. So you would have to rework the command to something like:
tar cvf - /src | pigz -c > dest.tar.gz
The -c (write to stdout) and -d (decompress) switches work the same across all four tools, so this works with anything if you swap out pigz for zstd, for example. The threading option is the one place they differ: pigz and pbzip2 already default to using all detected cores (with -p to change the count), while zstd wants -T0 to mean “use all detected cores”.
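Going the other way, decompression through a pipe looks like this (again, swap pigz for whichever tool matches the archive):
pigz -dc dest.tar.gz | tar xvf -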
Bonus Content
(Possibly completely irrelevant)
I use this when I want to send a whole folder structure to another host. When I am dealing with a server where I can ONLY send files through SFTP (ssh/scp), this is a massive bottleneck. So I will compress the @#$#! out of it before it hits that 1 core ssh bottleneck.
Note: This will copy the local folder /src and everything in it to the remote directory /target (and overwrite duplicate file names).
Without compression:
tar cf - /src | ssh user@host 'dd bs=64M | (cd /target; tar xf -)'
(Replace src, user, host, target with proper values)
With parallel zstd compression (zstd comes with Manjaro by default):
tar cf - /src --zstd | ssh user@host 'dd bs=64M | (cd /target; tar xf - --zstd)'
Or with any type of compression you want:
tar cf - /src | pigz -c | ssh user@host 'dd bs=64M | pigz -dc | (cd /target; tar xf -)'
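Not part of the commands above, but the same idea works in the other direction too, if you ever need to pull from the remote host instead of pushing to it (a rough sketch, same placeholder names):
ssh user@host 'tar cf - /src | pigz -c' | pigz -dc | (cd /target; tar xf -)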