Manjaro quality assurance (QA)

mbod · 24 September 2020 08:13

The last Manjaro testing update from today, 24-09-2020, is actually creating unbootable PCs because of wrong kernel image names.

Now I am wondering how this can happen. I thought that packages hitting the testing repos went through unstable before. But this can hardly be the case here. kernel 5.8.11 was just released yesterday and now it is already in Testing.

I know that users who use Manjaro Testing have to live with the occasional issue. But this update today is too much from my point of view. The kernel packages were obviously pushed to testing without any prior quality assurance. How can that happen? Is Testing now the new Unstable?

eugen-b · 24 September 2020 09:38

The issue was in the mkinitcpio package [fix]. If there were more users on Unstable who rebooted after the update and reported back, the issue would have been fixed quickly before reaching Testing. Of course, the maintainer should have done that, but maybe forgot.

However, issues sometimes slip through Unstable to Testing. So Testing is not the new Unstable; it has always had this kind of instability risk.

mbod · 24 September 2020 09:55

No reboot is needed to see the issue. Just installing the new kernels gives errors like:

==> Building image from preset: /etc/mkinitcpio.d/linux54.preset: 'default'
  -> -k /boot/vmlinuz-5.4-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-5.4-x86_64.img
==> ERROR: specified kernel image does not exist: `/boot/vmlinuz-5.4-x86_64'

Anybody on Unstable installing the new kernel should have seen that. So that is not a hidden issue.

With regard to “quality assurance”, I am wondering if errors like this could be found by scripts or test bots. It should be fairly easy to set up a virtual environment, automate a regular installation or update process, and check for errors in the output.
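As a rough illustration of the "check for errors in the output" part, here is a minimal sketch in Python. The two patterns are assumptions: one matches the mkinitcpio error quoted above, the other matches lines that start with "error:". The function name find_errors is made up for this example, and a real checker would need a longer pattern list.

```python
import re

# Assumed patterns: the "==> ERROR:" prefix comes from the mkinitcpio output
# quoted above; the plain "error:" prefix is a guess at other tools' output.
ERROR_PATTERNS = [
    re.compile(r"^==> ERROR:"),
    re.compile(r"^error:", re.IGNORECASE),
]


def find_errors(log_text):
    """Return (line_number, line) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in ERROR_PATTERNS):
            hits.append((number, stripped))
    return hits


# The update output quoted earlier in this thread, used as sample input.
SAMPLE = """\
==> Building image from preset: /etc/mkinitcpio.d/linux54.preset: 'default'
  -> -k /boot/vmlinuz-5.4-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-5.4-x86_64.img
==> ERROR: specified kernel image does not exist: `/boot/vmlinuz-5.4-x86_64'
"""

if __name__ == "__main__":
    for number, line in find_errors(SAMPLE):
        print(f"line {number}: {line}")
```

A bot could fail the run whenever find_errors returns a non-empty list and attach the matching lines to its notification.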

sawdoctor · 24 September 2020 10:20

Surely this is the point of the unstable and testing branches? We are supposed to catch the bugs before the stable release. With the best will in the world, it’s impossible for the Manjaro team to catch every bug; there are just too many PC variables and variants for them to check them all.

I may not agree with the Manjaro team on everything, but I definitely think their approach of the 3 branches is the correct way to do it rather than the Arch way. The community can get most issues fixed before the official stable release.

mbod · 24 September 2020 10:39

I surely agree with you. The Manjaro QA process is a positive differentiator to other distros.

But that doesn’t mean that there isn’t room for improvement. The current QA process purely depends on people. You need enough people and they need to do the right things.

Automated software testing, on the other hand, has been customary in the industry for many years. Compilation and installation processes in particular are easily automated and evaluated by robots.

This is a continuous improvement discussion. Nothing more.

philm · 24 September 2020 11:03

The Manjaro-Unstable branch is where we publish packages without any testing. When we feel ready as developers, mostly me, we do a snap to the testing branch. Then the community normally kicks in and tells us developers if we missed anything. The update to the testing branch was done at Thu Sep 24 08:42:09 CEST 2020. A cleanup of outdated overlays was done at Thu Sep 24 08:40:14 CEST 2020 in unstable.

Then I did my normal development work after the announcement of the testing-update. The community found an issue and solved it on their own as documented in the thread.

A proper fix was uploaded at Thu Sep 24 10:16:13 CEST 2020, which was forwarded with other fixes to testing at: Thu Sep 24 10:16:37 CEST 2020.

I updated my system at 2020-09-24T10:18:43+0200 and verified the fix. So most QA happens in Testing by our great community. And it showed that this process works.

To fix this particular issue:

update to latest mkinitcpio
reinstall your kernel packages

[phil@development unstable]$ pacman -Qo /boot/vmlinuz-5.4-x86_64
error: No package owns /boot/vmlinuz-5.4-x86_64

The file in question gets created by mkinitcpio during installation of the kernel.

mbod · 24 September 2020 11:18

Let’s briefly talk about your feelings.

A step in between, before you do the snap to testing, could be that you install the snap to a dedicated testing environment first. And then check that the installation runs without errors and a subsequent boot is successful. This could be automated.

sawdoctor · 24 September 2020 11:44

You can’t win, @philm. If you hold off updates to make sure everything is tested thoroughly, then people complain about the lack of updates. If you push updates quickly and bugs get missed, people complain. Who’d be a distro maintainer?

mbod · 24 September 2020 11:46

@sawdoctor:

You are completely missing the point. This is about automated testing. That would help everybody.

luoe · 24 September 2020 11:51

I don’t get your point @mbod. Phil reacted very quickly and fixed the faulty package.

Mistakes can and eventually always will happen, since humans are involved. Manjaro has a ‘stable-staging’ branch, where most updates go after testing to be tested again before rolling out to stable.

You just can’t rely on any branch other than stable to get a perfectly running system.

mbod · 24 September 2020 12:06

I am not talking about the support process.

This is my point: preventing mistakes by using an automated installation and boot test process.
I am surprised that the value is not obvious.

sawdoctor · 24 September 2020 12:21

I get your point, and my post really wasn’t aimed at you; it was just a general observation. There are always posts asking “why so long for updates?”, but when mistakes happen some people jump up and down.
The bot idea is a good one, but I have no idea how difficult it is to implement.

mbod · 24 September 2020 14:35

Let’s get specific. I’ll start.

Other open source projects are using “real” tools to do automated testing. ZFS for example uses buildbot:
http://build.zfsonlinux.org/
http://buildbot.net/

This is probably overkill for what I am proposing here. I was thinking more of a “quick&dirty” solution with VirtualBox, shell scripts, etc.

It could look like this:

Create a virtual image of a current Manjaro Testing installation. This could be one image per main desktop environment (GNOME, KDE, Xfce, etc.).
Make use of snapshots to be able to restore a default state before any tests.
Run the virtual image headless, log in with ssh, and execute an update. The content of this update is most likely just a subset of the Unstable branch. Capture the log information.
Reboot the virtual image. Capture the boot log information.
Forward all log information to a self-made parser script which detects relevant errors and notifies the Manjaro team.
This could be done nightly.
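To make the steps above a bit more concrete, here is a rough Python sketch of how one nightly run could be driven. Everything specific in it is an assumption for illustration: the VM name, snapshot name, and ssh target are hypothetical, the guest is assumed to allow passwordless sudo over ssh, and the waiting that is needed until the guest is reachable again after boot and reboot is left out. build_plan only assembles the commands; run_plan executes them and collects the output for a parser.

```python
import shlex
import subprocess


def build_plan(vm, snapshot, ssh_target):
    """Return the ordered commands for one test run. Nothing is executed here."""
    ssh = ["ssh", ssh_target]
    return [
        # Restore the default state before the test.
        ["VBoxManage", "snapshot", vm, "restore", snapshot],
        # Start the virtual image without a GUI window.
        ["VBoxManage", "startvm", vm, "--type", "headless"],
        # Execute the update non-interactively and capture its output.
        ssh + ["sudo", "pacman", "-Syu", "--noconfirm"],
        # Reboot the guest; the ssh session is expected to drop here.
        ssh + ["sudo", "systemctl", "reboot"],
        # After the reboot, collect error-level messages from the current boot.
        ssh + ["journalctl", "-b", "-p", "err", "--no-pager"],
        # Shut the virtual image down again.
        ["VBoxManage", "controlvm", vm, "poweroff"],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run each command in order and return the combined log as one string."""
    log = []
    for cmd in plan:
        result = runner(cmd, capture_output=True, text=True)
        log.append("$ " + shlex.join(cmd))
        log.append(result.stdout + result.stderr)
    return "\n".join(log)


if __name__ == "__main__":
    # Hypothetical names; print the plan instead of running it.
    for cmd in build_plan("manjaro-testing-xfce", "clean", "tester@127.0.0.1"):
        print(shlex.join(cmd))
```

The runner parameter exists only so the sequence can be tried with a stand-in before pointing it at a real virtual machine. The returned log text is what would be handed to the parser script.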

eugen-b · 24 September 2020 15:19

Are there any people here who have experience with openQA? Or want to learn it?
I think Manjaro Team would welcome such contributors.
http://open.qa/
https://fedoraproject.org/wiki/OpenQA

freggel.doe · 24 September 2020 15:35

My (main) unstable machine didn’t have this issue when updating today but half an hour later my (secondary) testing one was affected.
That package/breakage simply wasn’t present on unstable for a long enough time to spot this error.

flibby.jibbit · 24 September 2020 18:03

I’ve done a good bit of automation over the years and will take a look at OpenQA to see if I can wrap my head around it.

airclay · 24 September 2020 18:49

Why not build out a proof of concept and then contribute to the process…