In case you want to join the discussion.
How did you create that benchmark? Are you sure you're doing a full rebuild between tests? Is that the speed of the whole build or just the compression? What hardware are you testing with?
What is your setting for COMPRESSXZ in /etc/makepkg.conf?
A 10-year-old Intel Core 2 Quad with an old SATA-2 SSD.
BUILD is a script that removes any old built files before proceeding to build new ones.
- Single thread: (xz -c -z -)
- Multi thread: (xz -c -z - --threads=0)
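For anyone who wants to reproduce the comparison, a quick sketch (the file name and size are arbitrary; `head -c 8M` is GNU coreutils syntax):

```shell
# Create a throwaway test file and time both xz invocations.
head -c 8M /dev/urandom > /tmp/sample.bin
time xz -c -z - < /tmp/sample.bin > /tmp/single.xz
time xz -c -z - --threads=0 < /tmp/sample.bin > /tmp/multi.xz
```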
It's an interesting topic, but note that it's not related to Pamac itself, because we are talking about makepkg configuration. It has to be discussed by the whole @manjaro-team.
Changing the package compression algorithm doesn't affect build speed.
Users can edit /etc/makepkg.conf if they want to change the default build settings.
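For example, to make the default xz invocation multi-threaded, a user only needs to change one line (a user-side tweak, not a suggested distro default):

```shell
# In /etc/makepkg.conf (or a per-user ~/.makepkg.conf override):
# --threads=0 lets xz spawn one worker per available core.
COMPRESSXZ=(xz -c -z - --threads=0)
```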
AUR is not enabled by default, so users should know what it is and what they are doing before enabling it (this is an old discussion).
If anything, this could be addressed by having Pamac provide more detailed output, i.e. "Compressing package" would make it obvious which stage of the process is currently running.
Pamac shows the makepkg output in its terminal view, which includes a "Compressing package" step.
Great. So - as far as I'm concerned nothing needs to be done.
It could be even a topic for Arch devs.
Is /etc/makepkg.conf inherited from Arch or altered by Manjaro?
These are the values from my archlinux32 install:
COMPRESSGZ=(gzip -c -f -n)
COMPRESSBZ2=(bzip2 -c -f)
COMPRESSXZ=(xz -c -z -)
COMPRESSLRZ=(lrzip -q)
COMPRESSLZO=(lzop -q)
COMPRESSZ=(compress -c -f)
I assume it is the same on x86_64 Arch.
@jonathon, I don't know if it should be up to the user. If a distro can improve something, why not? Or is there any downside to multi-threaded compression?
I think the overwhelming majority of users will have a multi-threaded CPU. If not, will the command exit with an error or fall back to single threaded compression?
Some will still have only two threads though, they won't profit much, I guess.
A test on a two-threaded CPU would be welcome! Any volunteers?
This should be tested or researched. The main downside of multithreaded default setting is that we don't know how many threads user has available.
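The thread count itself is easy to probe at run time; a minimal sketch of a wrapper idea (not something makepkg does today — `nproc` is part of GNU coreutils):

```shell
#!/bin/sh
# Probe the number of available CPU threads and hand it to xz explicitly.
threads=$(nproc)
echo "xz -c -z - --threads=$threads"
```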
Using gz instead of xz would make sense. The compression step takes the majority of the build time on many big packages. However, then everyone who builds official packages would need to reconfigure it on every reinstall. If the compression could be overridden at runtime by pamac, it would be great, because nobody builds repo packages with pamac.
So, I think that the idea has merit and could improve the user experience especially on low end hardware, but I'm not sure about the best way to execute it.
It is annoying for packages which only re-package a .deb or .rpm or an upstream binary package. There it really is the majority of build time.
But for building from source from my experience it is not the majority of time.
I admit, my perception is skewed because most of my aur packages are either repackaged debs, python/bash scripts or themes/fonts, none of which require any actual compiling.
I have tested this in the past for hours, with several different algos and settings, and also posted about it.
As a matter of compatibility, we should only use xz.
Why shouldn't we change the number of threads by default?
Because the more threads are in use, the more memory is used.
It also depends heavily on the dictionary size used for the compression.
The default xz compression level is -6, which uses an 8 MiB dictionary.
With 4 threads it uses e.g.:
With 8 threads it is already doubled e.g.:
So users with low-memory systems could run into problems when building larger packages.
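Anyone can check this on their own machine: xz reports its memory situation directly, and adding -v during compression prints how much RAM the chosen preset/thread combination will need.

```shell
# Print total RAM and the memory-usage limits xz will respect:
xz --info-memory
```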
If we do want to make it faster, it would be enough to use a smaller dictionary, e.g.
(N.B.: in the example above I used a rather large 700M test file, and normally AUR packages are much smaller. The defaults are fine IMHO and can be changed easily by the user)
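One possible makepkg.conf tweak along those lines (my own suggestion, not from the post above; per the xz(1) man page, preset -2 uses a 2 MiB dictionary):

```shell
# Keep xz but trade compression ratio for speed and lower memory use:
COMPRESSXZ=(xz -c -z - -2)
```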
Worse compression, much greater RAM use, and introducing non-reproducibility.
Given this discussion is moving to "is XZ the best compression method" then we should look at using Zstd instead. The slightly larger resulting files are outweighed by the extra features and speed.
I think this was discussed not that long ago.
However, if and when pacman gains Zstd support, it would be a case of:
 COMPRESSLRZ=(lrzip -q)
 COMPRESSLZO=(lzop -q)
 COMPRESSZ=(compress -c -f)
+COMPRESSZST=(zstd -19 --rsyncable --noprogress)

#########################################################################
# EXTENSION DEFAULTS
@@ @@
# WARNING: Do NOT modify these variables unless you know what you are
#          doing.
#newbies
-PKGEXT='.pkg.tar.xz'
+PKGEXT='.pkg.tar.zst'
 SRCEXT='.src.tar.gz'
Now that is a change I could see working.
The upstream changes stalled. Support was due to be added but the changesets never made it into pacman.
Zstd compression is also what repo-ck has been using for some time now. http://www.repo-ck.com/x86_64/
If we change, then to something good. I agree.
Zstd has had by far the best compression-to-speed ratio in my tests.
I'll see if I can find the benchmarks...
- zstd -3 (default)
3#ravenfield.tar : 736623104 -> 337593720 (2.182), 186.6 MB/s , 971.2 MB/s 17.88 s
- zstd -9
9#ravenfield.tar : 736623104 -> 320281227 (2.300), 40.5 MB/s , 954.0 MB/s 27.38 s
- zstd -19
19#ravenfield.tar : 736623104 -> 280180468 (2.629), 4.91 MB/s , 787.9 MB/s 160.40 s
- xz -2
100 % 265,5 MiB / 702,5 MiB = 0,378 9,3 MiB/s 1:15 75.52 s
- xz -6 (default)
100 % 238,4 MiB / 702,5 MiB = 0,339 3,5 MiB/s 3:19 199.02 s
Maybe -19 is a bit too much for zstd? Examples are all single thread. (Again, this file is not a good example for AUR)
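For reference, figures in that format come straight from zstd's built-in benchmark mode; the file name below is a placeholder:

```shell
zstd -b3  somefile.tar   # benchmark compression level 3 (the default)
zstd -b19 somefile.tar   # benchmark level 19
```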
Pros and cons discussion XZ vs. ZSTD for package compression from Arch Forum 2017:
Edit: But it's not about implementing it in makepkg.conf by default.
The multi-threaded settings auto-detect the number of threads.
All package managers seem to work with both xz and gz interchangeably.
If that's the case, whether to use multi-threading or not could be selected based on the available system memory.
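A sketch of that idea, assuming Linux's /proc/meminfo and an arbitrary 4 GiB threshold:

```shell
#!/bin/sh
# Enable multi-threaded xz only when the machine has enough RAM.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
if [ "$mem_kb" -gt 4194304 ]; then   # more than 4 GiB
    set -- xz -c -z - --threads=0
else
    set -- xz -c -z -
fi
echo "Would compress with: $*"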
They are notoriously harsh, and even insulting. Save yourselves.
Using all the cores available by default seems like a questionable decision. The priority of compressing packages over other system tasks will depend on the use case and personal preference. It seems like the default should be to prioritize general system performance over compression speed and let the end user alter that if they prefer that trade-off.
That's right, packaging shouldn't slow down the system in any case.
But I think that how much priority a process is taking on the CPU is better handled by setting an appropriate nice level to it, not just defaulting to one core.
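For example (nice is POSIX; 19 is the lowest scheduling priority, and the pipeline is purely illustrative):

```shell
# Use every core for compression, but only when the CPU is otherwise idle:
echo "some data" | nice -n 19 xz -c -z - --threads=0 > /tmp/out.xz
```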
In my experience they are pretty reasonable people. You just need to approach them with the right attitude and manners, and you are more than likely to receive the help you need.