Can I make tar multi threaded?

Premise - The intent of this post may be unclear because it was moved from another thread, so I will explain.

To begin with, the thing that seemed to hang was the following:

Open an editor and paste the script below into it - save the file as ~/backup-config.sh

#! /bin/bash
#
# Script for backing up configuration and package lists
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the Affero GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
#
#    @linux-aarhus - root.nix.dk
#
# ##########################################################
# Modify as necessary
# NOTE: some files and folders contain sensitive information

# example filelist=('.bash_profile' '.bashrc' '.netrc' '.profile' '.zshrc')
filelist=('.bash_profile' '.bashrc' '.netrc' '.profile' '.zshrc')

# example folderlist=('.config' '.local' '.gnupg' '.mozilla' '.ssh' '.thunderbird')
folderlist=('.config' '.local' '.gnupg' '.mozilla' '.ssh' '.thunderbird')

# configuration file name
archive_file="dotconf.tar.gz"

# official repo package list
repo_pkg_file="repo-pkglist.txt"

# custom package list
cust_pkg_file="cust-pkglist.txt"

# Do not edit below this line - unless you know what you are doing.
# ##########################################################

SCRIPTNAME=$(basename "$0")
VERSION="0.2"
if [[ -z $1 ]]; then
    echo ":: $SCRIPTNAME v$VERSION"
    echo "==> missing argument: PATH"
    echo "Usage:"
    echo "  $SCRIPTNAME /path/to/backup"
    echo "  Path to store output"
    echo "  e.g. $SCRIPTNAME /home/$USER/backup"
    echo ""
    exit
fi

set -e

if ! [[ -d $1 ]]; then
    mkdir -p "$1"
fi

conf_archive="$1/$archive_file"
repo_pkg_list="$1/$repo_pkg_file"
cust_pkg_list="$1/$cust_pkg_file"

# create an archive of common hidden files and folders

if [[ -e "$conf_archive" ]]; then
    # remove archive if exist
    rm -f "$conf_archive"
fi

todo=""
for file in ${filelist[@]}; do
    if [[ -f $file ]]; then
        todo+="${file} "
    fi
done

for folder in ${folderlist[@]}; do
    if [[ -d ${folder} ]]; then
        todo+="${folder} "
    fi
done

tar -zcvf "$conf_archive" $todo

# list packages from official repo
pacman -Qqen > "$repo_pkg_list"

# list foreign packages (custom e.g. AUR)
pacman -Qqem > "$cust_pkg_list"

echo " ==> Packagelists created"
echo "   --> $repo_pkg_list"
echo "   --> $cust_pkg_list"
echo " ==> Config archive created"
echo "   --> $conf_archive"
echo " ==> To install packages from lists"
echo "   --> sudo pacman -Syu --needed - < $repo_pkg_file"
echo " ==> To restore the configuration files run"
echo "   --> tar --overwrite -xzf $archive_file -C $HOME"
echo ""

I’m not good with tar. I don’t use it at all.

How can I make it work in multi-threading?

I’m only using a single CPU CORE.

PCIe4.0 NVMe to PCIe3.0 NVMe.

time tar -zcvf "$conf_archive" $todo
real 12m18.132s

-rw-r--r-- 1 13017630861 2024-10-17 04:31 dotconf.tar.gz

The .config dir/files is really big. I have about 10 browser profiles.

Do you have any good ideas?

For example, something like this, using pigz:

sudo partclone.ext4 -c -s /dev/nvme0n1p7 | pigz --processes 24 > "$TARGETDIR/nvme0n1p7_🌿🌿🌿Minjaro_$(date +%m%d_%H%M)_pigz.pcl.gz" #ManjaroCinnamon

What?

What do you mean?

… you don’t use it, you say
but then you go on to say you do:

time tar -zcvf "$conf_archive" $todo

no one knows what the contents of your variable are …
($conf_archive)

I know another man who can help you with this.

The original post in this thread was made two years ago.

Whatever problem the man (or woman …) :nerd_face: has got,
should warrant his own post about the issue.

TARs can do many things for ya :slight_smile:


I agree.

@ehhen - Your question isn’t related to the Tutorial content and would be better served in a new thread where it should attract a wider response.

Cheers.


Now, I better understand the warning; that tar buildup in the lungs can be injurious to one’s health.

it refers to a modifiable backup script:

Packagelist and Configuration backup | root.nix.dk


The script is not written with performance in mind - it was written to facilitate moving common configuration - not to backup your data :slight_smile:

Refactor the script to use 7z - see the manpage and look for the compression method options - it allows setting the number of CPU threads

Backup and limitations
DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
- 7-zip does not store the owner/group of the file.

   On Linux/Unix, in order to backup directories you must use tar :
    - to backup a directory  : tar cf - directory | 7za a -si directory.tar.7z
    - to restore your backup : 7za x -so directory.tar.7z | tar xf -

   If  you want to send files and directories (not the owner of file) to others Unix/MacOS/Windows users,
   you can use the 7-zip format.

     example : 7za a directory.7z  directory

   Do not use "-r" because this flag does not do what you think.

Example (untested) created from reading the 7z -h output

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on -mmt=$(($(nproc)+1)) "$conf_archive" $todo
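The man page advice quoted above (let tar keep owner/group, let 7-Zip compress) can be combined with the thread switch. A rough sketch, assuming a 7-Zip binary named 7za, 7z or 7zz is on the PATH and that it accepts -mmt=N the same way the untested example does. The demo tree and file names are made up, and the whole thing is skipped if no binary is found.

```shell
#!/usr/bin/env bash
# Sketch: tar keeps the metadata, 7-Zip compresses the stream with -mmt threads.
set -euo pipefail

# Look for whichever 7-Zip command line binary is installed (assumed names).
sevenzip=""
for candidate in 7za 7z 7zz; do
    if command -v "$candidate" >/dev/null 2>&1; then
        sevenzip=$candidate
        break
    fi
done

workdir=$(mktemp -d)
mkdir -p "$workdir/src/.config" "$workdir/restore"
echo "demo setting" > "$workdir/src/.config/app.conf"
archive="$workdir/dotconf.tar.7z"

roundtrip="skipped"
if [[ -n $sevenzip ]]; then
    # -si reads the tar stream from stdin; -mmt sets the thread count.
    tar -C "$workdir/src" -cf - .config \
        | "$sevenzip" a -si -mmt="$(nproc)" "$archive" >/dev/null
    # -so writes the stored tar stream to stdout for tar to unpack.
    "$sevenzip" x -so "$archive" 2>/dev/null | tar -C "$workdir/restore" -xf -
    if cmp -s "$workdir/src/.config/app.conf" "$workdir/restore/.config/app.conf"; then
        roundtrip="ok"
    else
        roundtrip="mismatch"
    fi
fi
echo "7z round trip: $roundtrip"
```

The archive is a 7z container holding a single tar stream, so restoring always goes back through tar.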

A simpler approach may be to pipe through zstd instead of gz as described in the man page.

man tar

https://unix.stackexchange.com/questions/608207/how-to-use-multi-threading-for-creating-and-extracting-tar-xz
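A minimal sketch of the zstd route, assuming GNU tar and the zstd command line tool are installed. The temporary directory and file names are invented for the demo, and -T0 asks zstd to choose the thread count from the cores it detects.

```shell
#!/usr/bin/env bash
# Sketch: pipe tar through zstd with all detected cores (-T0).
# The demo tree under $workdir is hypothetical, not part of the backup script.
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/home/.config"
echo "demo setting" > "$workdir/home/.config/app.conf"

archive="$workdir/dotconf.tar.zst"
listing=""

if command -v zstd >/dev/null 2>&1; then
    # tar only bundles the files; zstd does the (threaded) compression.
    tar -C "$workdir/home" -cf - .config | zstd -T0 -q -c > "$archive"
    # Read the archive back to confirm that it lists cleanly.
    listing=$(zstd -dc "$archive" | tar -tf -)
    echo "$listing"
else
    echo "zstd is not installed, nothing was archived" >&2
fi
```

In the backup script this would roughly mean replacing the tar -zcvf line with tar -cf - $todo | zstd -T0 -c > "$conf_archive" and giving archive_file a .tar.zst name.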


xz looks like a good man for this:

-T threads, --threads=threads

Specify the number of worker threads to use

Currently the only threading method is to split the input into blocks and compress them independently from each other. The default block size depends on the compression level and can be overridden with the --block-size=size option.

But there is also man pigz

Pigz compresses using threads to make use of multiple processors and cores
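To make the xz excerpt concrete, here is a small sketch that puts xz behind tar with -T0 and an explicit block size, then restores the result to check the round trip. It assumes GNU tar and an xz new enough to support threads; the 1MiB block size and the file names are arbitrary demo values.

```shell
#!/usr/bin/env bash
# Sketch: threaded xz behind tar, then a restore to check the round trip.
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/restore"
# 4 MiB of compressible demo data (hypothetical file name).
head -c 4194304 /dev/zero > "$workdir/src/profile.bin"

roundtrip="skipped"
if command -v xz >/dev/null 2>&1; then
    # -T0: let xz pick the thread count from the detected cores.
    # --block-size: size of the independently compressed blocks.
    tar -C "$workdir" -cf - src \
        | xz -T0 --block-size=1MiB -c > "$workdir/src.tar.xz"
    xz -dc "$workdir/src.tar.xz" | tar -C "$workdir/restore" -xf -
    if cmp -s "$workdir/src/profile.bin" "$workdir/restore/src/profile.bin"; then
        roundtrip="ok"
    else
        roundtrip="mismatch"
    fi
fi
echo "xz round trip: $roundtrip"
```

Because threads work on separate blocks, very small inputs will not keep many cores busy regardless of the -T value.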

Doctor: How long have you been having these nightmares about turning into a pig?
Patient: About 2 or 3 weeeks!


This is barely related to the topic, however, may be useful to someone; I used the following method sometime in the middle ages to backup my /home/$USER directories:

Backup/Restore using TAR

1. Logout.
2. Login via TTY
3. Backup /home/user directory:

tar -zcvf /mnt/backups/user_archive.tar.gz /home/user

4. Restore /home/user directory:

tar -zxvf /mnt/backups/user_archive.tar.gz -C /
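Before relying on such an archive for a restore, it can be checked first. A small sketch, assuming GNU tar; the temporary tree stands in for /home/user and none of the paths are real.

```shell
#!/usr/bin/env bash
# Sketch: verify a tar.gz backup before trusting it for a restore.
# Assumes GNU tar; the demo tree stands in for /home/user.
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/home/user/.config"
echo "demo setting" > "$workdir/home/user/.config/app.conf"

archive="$workdir/user_archive.tar.gz"
tar -C "$workdir" -zcf "$archive" home/user

# -t lists the members; a truncated or corrupt archive makes tar fail here.
members=$(tar -tzf "$archive" | wc -l)
echo "archive members: $members"

# -d (--compare) reports members that differ from what is on disk.
if tar -C "$workdir" -dzf "$archive"; then
    verified="yes"
else
    verified="no"
fi
echo "matches source: $verified"
```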

Cheers.

I have spent countless hours on multithreading tar over the years. tar isn’t multithreaded, but at least it’s usually not the limiting factor. So I have only done multithreading with compression.

The popular compression formats all have their parallel counterparts. (Which do not come with Manjaro by default.)

  • gz → pigz
  • bzip2 → pbzip2
  • xz → pxz

But one that does…

  • zstd → zstd

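So you can just tar cvf dest.tar.zstd /src --zstd (same with extract). Note that older zstd releases compress with a single worker thread unless told otherwise, so to be sure it uses multiple cores, pass -T0 to zstd (for example by piping tar into zstd -T0).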

With tar, for simplicity, you would always use the corresponding reserved letter for that compression algorithm.

z = gz
j = bzip2
J = xz
--zstd = zstd (Sorry zstd, I guess they finally ran out of letters)

But for the formats older than zstd, these flags use only one core each. So you would have to rework the command to something like:

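tar cvf - /src | pigz -c > dest.tar.gz

The parameters -c = stdout and -d = decompress are the same for all four, so this works with anything if you swap out pigz, for zstd for example. The thread option is not the same: xz and zstd take -T0 = use detected cores, while pigz and pbzip2 use -p for the thread count and already default to all detected cores.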
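As a rough sketch of the "swap the compressor" idea, the helper below picks whichever parallel compressor is installed and pipes tar into it. The function name pick_compressor and the demo paths are made up for illustration. Note that pigz sets its thread count with -p, while xz and zstd use -T0.

```shell
#!/usr/bin/env bash
# Sketch: pick whichever parallel compressor is installed and pipe tar into it.
set -euo pipefail

# Fills the global array "compress" and the string "suffix" (hypothetical helper).
pick_compressor() {
    if command -v zstd >/dev/null 2>&1; then
        compress=(zstd -T0 -q -c); suffix="zst"
    elif command -v pigz >/dev/null 2>&1; then
        # pigz takes its thread count with -p.
        compress=(pigz -p "$(nproc)" -c); suffix="gz"
    elif command -v xz >/dev/null 2>&1; then
        compress=(xz -T0 -c); suffix="xz"
    else
        # Single-threaded fallback so the sketch still runs anywhere.
        compress=(gzip -c); suffix="gz"
    fi
}

workdir=$(mktemp -d)
mkdir -p "$workdir/src"
echo "hello" > "$workdir/src/file.txt"

pick_compressor
archive="$workdir/dest.tar.$suffix"
tar -C "$workdir" -cf - src | "${compress[@]}" > "$archive"

# GNU tar detects the compression format itself when reading a file.
listing=$(tar -tf "$archive")
echo "compressor: ${compress[0]}"
echo "$listing"
```

Keeping the command in an array avoids quoting trouble when the compressor line grows extra options.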




Bonus Content
(Possibly completely irrelevant)

I use this when I want to send a whole folder structure to another host. When I am dealing with a server where I can ONLY send files through SFTP (ssh/scp), this is a massive bottleneck. So I will compress the @#$#! out of it before it hits that 1 core ssh bottleneck.

Note: This will copy the local folder /src and everything in it, to the remote directory of /target (and overwrite duplicate file names).

Without compression:

tar cf - /src | ssh user@host 'dd bs=64M | (cd /target; tar xf -)'

(Replace src, user, host, target with proper values)

With parallel zstd compression (comes with Manjaro, by default):

tar cf - /src --zstd | ssh user@host 'dd bs=64M | (cd /target; tar xf - --zstd)'

Or with any type of compression you want:

tar cf - /src | pigz -c | ssh user@host 'dd bs=64M | pigz -cd | (cd /target; tar xf -)'
