Pacman-mirrors - parallel

Running pacman-mirrors -c all takes a long time.

Any reason this isn’t done in parallel?

Mirrors are written to a file, practically a text file. All written lines are parallel to each other in that file :stuck_out_tongue:
If you mean the way packages are downloaded several at once (parallel downloads), that works because the downloads land as separate files in a folder. Writing a single text file can't be done in bulk the same way; the limit is writing entries one by one. :slight_smile:

Which is why - you can use a custom mirror pool by using either

  • --country Denmark,Germany,France
  • --continent
  • --interactive

There is an experimental switch, --use-async, that runs the mirror testing asynchronously, but it is not reliable.

I guess testing all mirrors at the same time is the worst idea, because it could easily saturate some people’s internet connection or, at minimum, increase latency, which would hurt the accuracy of the results.
You could end up with your best mirror at the end of the list because it got a really high ping while all mirrors were being ‘bulk tested’ at once, whereas testing it alone, without any other network activity, would have given an accurate ping result.

I recommend selecting a few countries around you and testing only those countries’ mirrors, as noted above. Check the wiki for the list of countries.

If using open(O_APPEND) then all writes will go to the end of a file…

All information could be written in separate files, and then collated into one at the end?
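
To make the append idea concrete, here is a minimal sketch in Python. It is not pacman-mirrors code: `append_result` and `write_all` are hypothetical helpers, and the result lines are made up. Each worker opens the file with O_APPEND and writes one complete line with a single `os.write` call, so every line lands at the end of the file without explicit locking.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def append_result(path, line):
    """Append one complete line to path using O_APPEND (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write call per line, so each short line goes out as a whole.
        os.write(fd, (line + "\n").encode("utf-8"))
    finally:
        os.close(fd)


def write_all(path, lines, workers=4):
    """Append every line from a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda line: append_result(path, line), lines))
```

The lines end up in arrival order, not ranked order, so the file would still need sorting afterwards.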

If I had more time, I’d write a PR, but I’m currently working on a few for git, python and vscode.

Ping packets are tiny…

Why not give the user the ability to choose the maximum number to test in parallel, with a default to something sensible, like 20?
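
Something like the following is what I have in mind: a rough sketch, assuming Python's standard `concurrent.futures` and a caller-supplied `probe` function. `rank_mirrors`, `probe`, and `DEFAULT_MAX_PARALLEL` are hypothetical names, not anything that exists in pacman-mirrors.

```python
from concurrent.futures import ThreadPoolExecutor

# The default proposed in this thread; not an existing pacman-mirrors setting.
DEFAULT_MAX_PARALLEL = 20


def rank_mirrors(mirrors, probe, max_parallel=DEFAULT_MAX_PARALLEL):
    """Call probe(mirror) for each mirror, at most max_parallel at a time.

    probe is assumed to return a number where lower is better (e.g. seconds).
    Returns (mirror, result) pairs sorted fastest first.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    mirrors = list(mirrors)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps input order, so results line up with mirrors.
        results = list(pool.map(probe, mirrors))
    return sorted(zip(mirrors, results), key=lambda pair: pair[1])
```

Setting `max_parallel=1` gives back the current one-at-a-time behaviour, so the user decides how much of their connection the test may use.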

It still has to have an order based on ranking or geolocation/country …

Probably

Well, that applies to @guinux too: limited time for a project he works on alone, and IMHO he is doing a tremendous job.

The default and maximum of 4 may have been chosen for a reason I don’t know or don’t fully understand. Probably requests are limited to one per server, so 4 simultaneous downloads actually use 4 mirrors that work without hiccups; 20 fully synced mirrors might be “hard” to find at the moment someone runs the update.
Hopefully somebody else from the @Manjaro-Team can answer that.

That’s right, but does pacman-mirrors just send a ping to an address, or does it do other things as well? I tried to find out from its code, but so far I don’t have the answer. If you run a simple real-life test and watch the network activity, it is definitely not just ping :wink: (up to 350 KB/s download and 15-20 KB/s upload).

The tool downloads a 170 KB file.
Downloading 20 copies of this file in parallel can, in some cases, no longer measure the server speed, because our own connection is too small and becomes the bottleneck.
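
A toy model shows the effect. It is only a sketch: it assumes the link is shared equally between parallel downloads, and the link and server rates below are invented numbers, not measurements.

```python
def observed_rate(server_rate_kbs, link_rate_kbs, parallel):
    """Rate seen for one download, in KB/s, under an equal-share model.

    Assumption: the local link is split evenly between all parallel
    downloads, and a download can never go faster than its server.
    """
    return min(server_rate_kbs, link_rate_kbs / parallel)


LINK = 2000   # hypothetical local connection, KB/s
SLOW = 500    # hypothetical slow mirror, KB/s
FAST = 5000   # hypothetical fast mirror, KB/s

# One at a time: the two mirrors are clearly different.
sequential = (observed_rate(SLOW, LINK, 1), observed_rate(FAST, LINK, 1))

# Twenty at once: each download gets 100 KB/s, so both look the same.
parallel_20 = (observed_rate(SLOW, LINK, 20), observed_rate(FAST, LINK, 20))
```

In this model the sequential test sees 500 KB/s against 2000 KB/s, while the 20-way test sees 100 KB/s for both mirrors, so the ranking between them is lost.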

Yep so my theory is right :slight_smile: thanks for confirming.

The code is at https://gitlab.manjaro.org/

Yes - there is a reason - as I mentioned earlier - we have tried - but it is unreliable and may cause the process to hang. @papajoke has contributed greatly to my understanding of why it does not always behave.

I learned Python by coding pacman-mirrors, so you will probably find parts that could have been done better or differently. The async code was contributed by another community member. I have had good help and suggestions from @papajoke.

You are correct. It does download a small file - and it is documented - just check pacman-mirrors.conf or read through the changelog at gitlab.

Moreover, it is a command we use very rarely (unless we travel a lot). It is also launched in the background once a week by pamac, and in that case speed is not a problem.

Apologies for my attitude earlier. I really love the flavour of this community and I feel like I was detracting from it with an entitled attitude.

Thanks all for your patience in explaining.

@linux-aarhus was there a reason that Arch’s reflector couldn’t have its server list swapped for Manjaro’s servers?

I really don’t know. I picked up the current version of pacman-mirrors several years ago, and it has developed along with my increasing knowledge of Python 3.

While I have been coding for many years - at the time - I knew absolutely nothing of Python and it took a while to get familiar with the language.

I think reflector relies on the list of mirrors provided by archlinux. I don’t know the inner workings of reflector but I think it is like pacman-mirrors which also relies on information provided by the main repo server in terms of last sync and branch status.

The actual responsiveness of any given mirror will depend on the user’s location and the network congestion between the user and the mirror.

This is true for any mirror, no matter the distribution.

My current coding project is a contract job - coding the backbone of a custom crm and document handling system (Currently at: Number of lines = 29.873, number of code files = 288)