Pamac fails to synchronise due to unacceptable TLS certificate

There are too many requests for those AUR databases. Here is the last 24h of traffic you guys generate just for that:

[image: 24-hour AUR DB traffic graph]

Remember that those DBs are just 8 MB in size.

Here is a traffic snapshot of the last 30 days, just for the AUR DB files:

[image: 30-day AUR DB traffic graph]

Certificates are all valid:

AUR, Download and Mirrors use the same certificate as our main homepage.

So why do I get those errors?

The CDN works like this (a sketch of the purge trigger follows the list):

  • we have one storage server which receives a new DB file from the Arch server every 10 minutes
  • we purge those files from all CDN nodes every 15 minutes
  • the CDN nodes then fetch the updated DB files from the storage server and cache them again
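
To make the purge step concrete, here is a minimal sketch of such a trigger in Python. The endpoint URL, token, and payload shape are assumptions for illustration only; the thread does not document the CDN provider's actual API.

```python
import requests

# Hypothetical endpoint and token -- the real CDN API is not shown in this thread.
CDN_API = "https://api.example-cdn.com/v1/purge"
TOKEN = "secret-api-token"

DB_FILES = [
    "https://aur.manjaro.org/packages-meta-ext-v1.json.gz",
]

def purge_db_files():
    """Ask the CDN to drop its cached copies of the DB files.

    Nodes re-fetch from the storage server on the next request --
    the window in which a client could land on an error page.
    """
    for url in DB_FILES:
        resp = requests.post(
            CDN_API,
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"url": url},
            timeout=30,
        )
        resp.raise_for_status()

if __name__ == "__main__":
    purge_db_files()  # e.g. run from cron every 15 minutes
```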

If you hit a node which is in the middle of purging the file, you get an error page. pamac should in that case ignore the error and try to fetch the file again; I assume that on a retry the error would be gone. Instead you only wonder why you got the error and complain.
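
For reference, a minimal sketch of that retry behavior, written in Python purely for illustration (pamac itself is not a Python program); the attempt count and backoff are arbitrary choices:

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

AUR_DB = "https://aur.manjaro.org/packages-meta-ext-v1.json.gz"

def fetch_with_retry(url, attempts=3, delay=2.0):
    """Retry transient failures (purge-window error pages, TLS hiccups)
    a few times before surfacing the error to the user."""
    last_err = None
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except (HTTPError, URLError) as err:
            last_err = err
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_err

data = fetch_with_retry(AUR_DB)
```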


The certificates of the domains the CDN serves might all be correct, but what about the certificate of the CDN’s own default error pages?
Because if there are no wrong certificates at all, why the error message about a certificate?
“Simple logical deduction, detective Holmes” :wink:

We don’t maintain that error page, nor its cert. If you can capture the source of the error page you get, I can ask the CDN provider about it.

Incomplete/in-progress synchronization.
Or that’s my guess anyway.

That was exactly my idea as well…
But unfortunately, I’m not on Manjaro any longer…
You can still ask them to check theirs, to be sure.

I documented it more in the issue tracker. Failed to synchronize AUR database (#1305) · Issues · Applications / pamac · GitLab

Not so easy if you have to do it manually. I can check whether their API supports something, so we could use cron jobs on our servers and upload the certificates that way; otherwise it is a paid service to let them maintain the CERTs. However, that is not the problem here. More or less, our strict domain rules reject CERTs from the CDN provider that don’t match our domain, which is what a user runs into when hitting an error page while a node is in purge mode.

This updating seems like it’s a solved problem within CDNs. The updates should be atomic: serve the old file as long as the new one isn’t completely available.

Well, I only assume that the non-existing file might lead to an error page. CDNs are designed for files which have a unique filename and are cached at the nodes. However, since the DB files never change their names, only their content, we have to purge them on a regular basis. Even a gap of just a millisecond might trigger the error.

There is also a way to pre-fetch important files from the file server. However, due to the high request volume, the CDN explicitly told us to stop pre-fetching and let the system fetch on demand, otherwise our requests DDoS the nodes. That has already happened many times with the Arch AUR server in the past, due to the huge number of Manjaro users fetching that file.

Arch updates the file every 5 minutes, we fetch it every 10 minutes and redistribute it through the CDN network. Even then we create a lot of traffic, as already documented. So I can check whether there is a solution on the backend side, or pamac has to jump through hoops.

That’s why people use a “create a temporary file and rename it over the old one” approach in those cases… :stuck_out_tongue_winking_eye:
(But yeah, true, it’s a lot harder to do that on a CDN distribution from the mirror side, I guess; a sketch of the local pattern follows below.)
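
For illustration, a minimal Python sketch of that write-then-rename pattern on an ordinary filesystem (which, as the next reply points out, doesn’t by itself solve the CDN caching layer):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write to a temp file in the same directory, then rename it over
    the old file. Readers always see either the complete old file or
    the complete new one, never a partial write."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write("packages-meta-ext-v1.json.gz", b"...new DB contents...")
```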

Won’t work, as you have to purge the cached file to get the updated one. The file server works as a regular storage system; there we have no issue distributing the normal files via regular rsync. This is the regular way, as we also do it with normal mirrors. Whatever happens within the CDN network is out of our control.

  • We provide the files to the storage via rsync
  • We keep the CERTs up-to-date
  • We maintain the cron jobs on our servers that instruct the API to purge the DB files, and other files as needed, which keep the same name in the server structure

All the magic and quirks happen on the CDN side, which we can only trigger with API commands.


Just trying to help, keep that in mind :wink:

Nice, rsync should already be using the rename-over-old-file technique :+1:
Do you use the --delay-updates option? If not, try it to see if it makes a difference. :woman_shrugging:
(If you already use that option on your end, check with the CDN whether they do too.)
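
For reference, the kind of invocation meant here, wrapped in Python; the paths are made up. --delay-updates is a real rsync flag that stages every transferred file and renames them all into place only at the end of the transfer:

```python
import subprocess

# Assumed source/destination paths; the real sync layout isn't shown in this thread.
SRC = "/srv/aur-db/"
DEST = "storage.example-cdn.com:/srv/manjaro/aur/"

# With --delay-updates, a half-copied DB file is never visible
# at its final path on the destination.
subprocess.run(
    ["rsync", "-a", "--delay-updates", SRC, DEST],
    check=True,
)
```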

I’m just wondering what you actually mean by that:
Are you removing the DB files before you upload a new version?
If so, why?

Maybe switch to systemd timers, and stop the timer/service while your side sets things up, restarting it afterwards when you are done.
That way the job won’t have any chance to fire in between your work.
Just like inhibiting sleep on a system :wink:
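
A rough sketch of that inhibit pattern, assuming a hypothetical cdn-purge.timer unit drives the purge job:

```python
import subprocess

TIMER = "cdn-purge.timer"  # hypothetical unit name for the purge job

def do_maintenance():
    """Placeholder for the actual upload / certificate work."""

# Stop the timer so the purge can't fire mid-maintenance,
# then restart it no matter what happened in between.
subprocess.run(["systemctl", "stop", TIMER], check=True)
try:
    do_maintenance()
finally:
    subprocess.run(["systemctl", "start", TIMER], check=True)
```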


Again, I’m just trying to help and find out where things go wrong :v:

Well, the issue is this: if the file doesn’t change its name, like the AUR DBs, you have to purge it from the CDN nodes so they cache the new version. Otherwise the nodes keep the old file, since from their point of view nothing changed. After chatting a little with their support, we gained the following information:

  • they will never show an error page for a file that’s being purged; instead the CDN will either serve the old version or, once the purge is done, fetch the new one from the origin. There are no more than two outcomes in this situation
  • the error page is only triggered when the file is not cached and the origin returns 404 when fetching it
  • 404 responses are returned through the CNAME in use, with our own certificate, the same one we also use for manjaro.org

They are pretty confident that whatever happened has nothing to do with error pages. That leaves us with changes on the SSL side, so we are checking any recent changes to the SNI …

Searching the forums for "https://aur.manjaro.org/packages-meta-ext-v1.json.gz" returned several threads. I see the tag "unstable" is applied here; I am on Manjaro stable (KDE).

The error messages thrown by pamac upgrade --force-refresh are not consistent; results vary (at least) between Socket I/O timed out and Unacceptable TLS certificate.

A direct download of the package via browser is fast and fine, no issues or warnings.

Checking the SSL certificate (SSL Server Test: aur.manjaro.org (Powered by Qualys SSL Labs)) takes a long time, more than 85 seconds.

Hope this info can help with finding the cause.

There might be some issue with libsoup3, which was introduced with GNOME 43, and sha256 CERTs, as @codesardine once pointed out. Similar to this one: PBB radio: Handle SSL certificate error with untrusted certificate (#128) · Issues · Goodvibes / Goodvibes · GitLab

(I have changed your bullets into numbered ones for ease of responding)

I would check what they claim in (1), because what they intend might not be the reality.
(They are human system admins like all of us :wink: )

Maybe your server returns a proper 404 page (3), but they present the user with their own 404 page (2) once they see your 404 code.
In other words their code:

  1. Tries to fetch the requested resource from their cache, which fails because you purged it (1).
    [Not serving the old version]
  2. Then tries to fetch it from the source but gets a 404 code from your server (3).
  3. Then returns their own 404 page with the bad certificate (2).
    [Not proxying your 404 page]

Sure, they can be “pretty confident” and try to wave away their own fault as a first defensive reaction, but that happens with all problems any consumer has, right? :smile_cat:

So I would suggest checking what they claim for yourself using test files (a probe sketch follows the list): :wink:

  1. Upload test files, then purge them on the CDN without uploading new ones.
  2. Remove the test files on your side after they have been uploaded to the CDN, then purge them on the CDN again.
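
A small probe script for those two experiments could look like this; the test URL is hypothetical, and the point is to tell an origin 404 apart from a CDN-generated error page (or a TLS failure on that page):

```python
import requests

# Hypothetical test file uploaded to the CDN just for this experiment.
TEST_URL = "https://aur.manjaro.org/cdn-purge-test.txt"

def probe(url):
    """Fetch the URL and report status, Server header and a body snippet,
    so an origin 404 can be told apart from a CDN error page."""
    try:
        resp = requests.get(url, timeout=30)
        print(resp.status_code, resp.headers.get("server"), resp.text[:120])
    except requests.exceptions.SSLError as err:
        # A certificate error here would point at the CDN's own
        # error-page certificate rather than the manjaro.org one.
        print("TLS error:", err)

probe(TEST_URL)  # run right after the purge, and again after deleting the origin copy
```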

I just checked (visually, in the browser) the certificate on manjaro.org, did some digging on the internet, and found:

Maybe it is related to your main domain name being put in the CN instead of Manjaro GmbH & Co KG, which trips up the implementation that checks the certificates? :woman_shrugging:
(Just thinking out loud here :smiling_face: )
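
One way to check what actually sits in the certificate’s subject, using only the Python standard library (the handshake itself will raise if client-side validation already rejects the chain):

```python
import socket
import ssl

HOST = "aur.manjaro.org"

# Pull the certificate the server actually presents and print its
# subject and SANs, to see what sits in the CN field.
ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443), timeout=30) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

print(cert["subject"])
print(cert.get("subjectAltName"))
```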

No. We only sync from our server to their storage server, from which the nodes fetch the files. The purges also happen within the CDN network; we only trigger them from our end via the API. The main domain name shouldn’t be an issue when put into the CN; we have used the same certificate on other servers too, and it worked just fine. There are some nodes @guinux found issues with when using pamac as a client. One node is in London, the other I can’t ping …

OK, then the hunt continues, I guess :wink:

There has to be something more to it. When I posted the message, pamac was consistently giving the error (while curl and Firefox retrieved the same file without complaints). I also knew of three more people around the globe with the same problem. I still have the “problem”, so I don’t think this is just bad luck hitting a server that is in “transition”.

Interestingly, my laptop presents the same behavior at the same time, so it is probably something connection-related. I also did some tests with tcpdump while running pamac, and at some point I got this error from my router:
ICMP6, time exceeded in-transit for 2a02:6ea0:c500::3, length 92

That address is one of the IPv6 addresses for aur.manjaro.org. I’m not sure if this means something, or is just the normal behavior for this TLS error.