Can I make tar multithreaded?

I have spent countless hours on multithreading tar over the years. tar itself isn't multithreaded, but it's usually not the limiting factor anyway; the compressor is. So the only thing I have ever parallelized is the compression.

All the popular compression formats have parallel counterparts (which don't come with Manjaro by default):

  • gz → pigz
  • bzip2 → pbzip2
  • xz → pxz

But there is one that does come with Manjaro and can already use multiple cores on its own:

  • zstd → zstd

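So you can just run tar cvf dest.tar.zst /src --zstd, and extract with tar xvf dest.tar.zst --zstd. How many threads the plain --zstd option compresses with depends on your zstd version and settings; to be sure every core is used, hand tar the command yourself: tar -I 'zstd -T0' -cvf dest.tar.zst /src. (zstd decompression is single-threaded either way, but it is fast.)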

For simplicity, tar has a reserved option letter for each compression algorithm:

z = gz
j = bzip2
J = xz
--zstd = zstd (Sorry zstd, I guess they finally ran out of letters)
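A quick way to see the letters in action is to build one archive per option. This is a sketch assuming GNU tar (1.31 or newer for --zstd); the sample directory is made up, and any format whose compressor isn't installed is simply skipped:

```shell
#!/usr/bin/env bash
# Sketch: create and list an archive with each of tar's compression options.
# Assumes GNU tar (1.31+ for --zstd); a format is skipped when its
# compressor is not installed. The sample directory and data are made up.
set -euo pipefail

work=$(mktemp -d)
mkdir "$work/src"
seq 1 1000 > "$work/src/data.txt"

# Each entry: tar option, file suffix, program tar calls behind the scenes
for spec in "-z gz gzip" "-j bz2 bzip2" "-J xz xz" "--zstd zst zstd"; do
  read -r opt ext prog <<< "$spec"
  if ! command -v "$prog" >/dev/null 2>&1; then
    echo "skipping $opt: $prog is not installed"
    continue
  fi
  tar -C "$work" "$opt" -cf "$work/dest.tar.$ext" src      # create
  count=$(tar "$opt" -tf "$work/dest.tar.$ext" | wc -l)      # list members
  printf '%s -> dest.tar.%s (%s entries)\n' "$opt" "$ext" "$count"
done
```

Each run should list two entries (src/ and src/data.txt) for every format that was available.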

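Apart from zstd, though, each of those letters just runs the regular single-threaded compressor, so only one core does the work. To use them all, pipe tar's output through the parallel tool yourself, something like:

tar cvf - /src | pigz -c > dest.tar.gz

-c (write to stdout) and -d (decompress) work the same for all four tools, so the pipe works with anything: swap pigz for pbzip2, pxz or zstd. The thread option is what differs. pigz and pbzip2 already use every core by default (pigz -p N and pbzip2 -pN limit them), while zstd only goes wide with -T0 (use the detected cores). Don't pass -T0 to pigz: there -T means --no-time, and the 0 is read as compression level 0, i.e. no compression at all.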
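Because the thread option is the part that differs between tools, one way to keep the swap painless is a small lookup function. This is a minimal sketch; pick_compressor is a made-up name, and it assumes pigz and pbzip2 already use all cores by default while xz (5.2 or newer) and zstd need -T0:

```shell
#!/usr/bin/env bash
# Minimal sketch: map a format to a command that uses every core, falling
# back to the single-threaded tool when the parallel one is missing.
# pick_compressor is a hypothetical helper name, not part of any package.
set -euo pipefail

pick_compressor() {
  case "$1" in
    gz)  # pigz uses all online cores by default (-p N would limit it)
         if command -v pigz >/dev/null 2>&1; then echo pigz; else echo gzip; fi ;;
    bz2) # pbzip2 autodetects the core count (-pN would limit it)
         if command -v pbzip2 >/dev/null 2>&1; then echo pbzip2; else echo bzip2; fi ;;
    xz)  echo "xz -T0" ;;    # xz 5.2+ can thread on its own; 0 = all cores
    zst) echo "zstd -T0" ;;  # 0 = detect and use the available cores
    *)   echo "unknown format: $1" >&2; return 1 ;;
  esac
}

# Round trip through pipes, shaped like the tar | pigz command above.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/out"
echo "hello" > "$work/src/sub/file.txt"

read -ra comp <<< "$(pick_compressor gz)"   # split e.g. "zstd -T0" into words
tar -C "$work" -cf - src | "${comp[@]}" -c > "$work/dest.tar.gz"
"${comp[@]}" -cd < "$work/dest.tar.gz" | tar -C "$work/out" -xf -
diff -r "$work/src" "$work/out/src" && echo "round trip OK with ${comp[0]}"
```

Reading the result into an array keeps a flag like -T0 attached to its program without relying on unquoted word splitting; changing gz to zst (and the file name) runs the same pipeline through zstd -T0.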




Bonus Content
(Possibly completely irrelevant)

I use this when I want to send a whole folder structure to another host. When SSH (sftp/scp) is the ONLY way to get files onto a server, that single SSH stream, which runs on one core, is a massive bottleneck. So I compress the @#$#! out of the data before it hits that 1-core ssh bottleneck.

Note: This copies the local folder /src and everything in it to the remote directory /target. Because GNU tar strips the leading /, it arrives as /target/src, and existing files with the same names are overwritten.

Without compression:

tar cf - /src | ssh user@host 'dd bs=64M | (cd /target && tar xf -)'

(Replace src, user, host, target with proper values)
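Before pointing this at a real server, the pipeline can be tried locally by swapping the ssh hop for a subshell. A sketch, assuming GNU tar and GNU dd; the temp directories are placeholders for /src and /target, and the && makes sure tar never unpacks in the wrong place if the cd fails:

```shell
#!/usr/bin/env bash
# Local dry run of the tar-over-ssh copy: the ssh user@host '...' part is
# replaced by a plain subshell, and temp dirs stand in for /src and /target.
set -euo pipefail

base=$(mktemp -d)
mkdir -p "$base/src/docs" "$base/target"
echo "report" > "$base/src/docs/a.txt"

# Sender: tar writes the archive to stdout ("-"). -C "$base" mimics archiving
# /src, whose members GNU tar stores as src/... after stripping the leading /.
# Receiver: dd relays the stream (status=none is GNU dd and hides its stats),
# then tar unpacks it only if the cd succeeded.
tar -C "$base" -cf - src | (dd bs=64M status=none | (cd "$base/target" && tar xf -))

# The folder lands as target/src/..., not directly in target/
find "$base/target" -type f
```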

With parallel zstd compression (comes with Manjaro, by default):

tar cf - /src --zstd | ssh user@host 'dd bs=64M | (cd /target && tar xf - --zstd)'

Or with any type of compression you want:

tar cf - /src | pigz -c | ssh user@host 'dd bs=64M | pigz -cd | (cd /target && tar xf -)'
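To get a feel for how much less data has to squeeze through the single-core ssh stream, you can measure both pipelines locally. A rough sketch with made-up, very compressible sample data; it uses pigz if installed and falls back to gzip otherwise:

```shell
#!/usr/bin/env bash
# Sketch: count the bytes each pipeline would push into ssh, run locally.
# Uses pigz if installed, else gzip. The sample data is made up and highly
# compressible, so real folders may see a much smaller gain.
set -euo pipefail

src=$(mktemp -d)
seq 1 200000 > "$src/numbers.txt"    # roughly 1.3 MB of plain text

gz=gzip
if command -v pigz >/dev/null 2>&1; then gz=pigz; fi

raw=$(( $(tar -C "$src" -cf - . | wc -c) ))                # what tar cf - sends
packed=$(( $(tar -C "$src" -cf - . | "$gz" -c | wc -c) ))  # after compression
echo "uncompressed: $raw bytes"
echo "with $gz: $packed bytes"
```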