Systemd won't forget about a failed condition from weeks ago

Let me apologize in advance for asking yet another noob question.

I installed zfs (AUR) on a Raspberry Pi with pools/volumes being mounted by systemd. After an update three weeks ago my system was temporarily missing the zfs kernel module. So, the systemd services zfs-import-cache.service and zfs-mount.service failed as expected.

Now, even though the zfs kernel modules have been back in place for weeks, the systemd services zfs-import-cache.service and zfs-mount.service will not start at boot. The failed condition from three weeks ago is listed as the reason. However, both services start happily when started manually. How can I fix this?

A little background

After rebooting the Raspberry Pi, the two services do not show up in the list of failed units, even though they have not been started and systemctl status reports a failed start condition. See here…

[root@myraspi ~]# systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
[root@myraspi ~]# systemctl list-units --state failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.


[root@myraspi ~]# systemctl status zfs-import-cache.service
○ zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:31 CEST; 3 weeks 0 days ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zpool(8)

Aug 10 01:29:31 myraspi systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).


[root@myraspi ~]# systemctl status zfs-mount.service
○ zfs-mount.service - Mount ZFS filesystems
     Loaded: loaded (/usr/lib/systemd/system/zfs-mount.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:31 CEST; 3 weeks 0 days ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zfs(8)

Aug 10 01:29:31 myraspi systemd[1]: Mount ZFS filesystems was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).

As mentioned, a manual start succeeds flawlessly and the zfs pools become available right away.

[root@myraspi ~]# systemctl start zfs-import-cache.service
[root@myraspi ~]# systemctl status zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: active (exited) since Wed 2022-08-31 17:55:10 CEST; 9s ago
       Docs: man:zpool(8)
    Process: 839 ExecStart=/usr/bin/zpool import -c /etc/zfs/zpool.cache -aN $ZPOOL_IMPORT_OPTS (code=exited, status=0/SUCCESS)
   Main PID: 839 (code=exited, status=0/SUCCESS)
        CPU: 366ms

Aug 31 17:54:56 myraspi systemd[1]: Starting Import ZFS pools by cache file...
Aug 31 17:55:10 myraspi systemd[1]: Finished Import ZFS pools by cache file.


[root@myraspi ~]# systemctl start zfs-mount.service
[root@myraspi ~]# systemctl status zfs-mount.service
● zfs-mount.service - Mount ZFS filesystems
     Loaded: loaded (/usr/lib/systemd/system/zfs-mount.service; enabled; preset: enabled)
     Active: active (exited) since Wed 2022-08-31 17:56:00 CEST; 4s ago
       Docs: man:zfs(8)
    Process: 997 ExecStart=/usr/bin/zfs mount -a (code=exited, status=0/SUCCESS)
   Main PID: 997 (code=exited, status=0/SUCCESS)
        CPU: 77ms

Aug 31 17:56:00 myraspi systemd[1]: Starting Mount ZFS filesystems...
Aug 31 17:56:00 myraspi systemd[1]: Finished Mount ZFS filesystems.

I tried systemctl reset-failed zfs-mount.service, which did not make any difference. Neither did reinstalling/recompiling zfs-utils (AUR). The failed condition from Aug 10 still comes up after rebooting the Raspberry Pi.

[root@myraspi ~]# systemctl status zfs-import-cache.service
○ zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:31 CEST; 3 weeks 0 days ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zpool(8)

Aug 10 01:29:31 myraspi systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).

How can I get rid of that failed condition from August 10? If you have any hints for me, please do let me know. Not having the volumes mounted after a reboot is somewhat uncomfortable.

Can you check if you have a hook for zfs in /etc/mkinitcpio.conf? It should look like HOOKS=( ... zfs). If it is not there, add the hook and run mkinitcpio -P afterwards.
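In case it helps, here is a tiny sketch of that check as a shell function. The function name and the grep pattern are my own illustration (the \b word boundary assumes GNU grep), not anything shipped with mkinitcpio:

```shell
# Sketch: succeed when zfs is listed inside the HOOKS=( ... ) line of the
# given mkinitcpio.conf. Illustrative helper only; \b is a GNU grep extension.
has_zfs_hook() {
  grep -Eq '^HOOKS=\([^)]*\bzfs\b' "$1"
}

# Usage on the real config:
#   has_zfs_hook /etc/mkinitcpio.conf && echo "zfs hook present" || echo "zfs hook missing"
```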

Thank you for the quick response. I appreciate it.

The hook is in place:

HOOKS=(base udev plymouth autodetect modconf block filesystems keyboard fsck zfs)

Re-running mkinitcpio -P did not change anything. Also the directory /sys/module/zfs is populated now (as in fact it has been ever since that incident on Aug 10).

# ls /sys/module/zfs
coresize	 holders    notes		properties.pool  srcversion  version
features.kernel  initsize   parameters		refcnt		 taint
features.pool	 initstate  properties.dataset	sections	 uevent

So, the failed condition systemd keeps complaining about no longer exists, and systemd constantly referring back to a clearly historical problem from Aug 10 is driving me out of my mind. I have no idea where this incident was recorded (within systemd?) or how to get rid of it.

All ideas are welcome! Please do keep them coming!

Just to make it clear: is it still happening after a reboot now?
If it is okay now, you can remove the old logs with journalctl --vacuum-time=2d (or whatever retention period you prefer instead of 2 days).

Yes, it still happens after every reboot. systemd always tells me about the failed condition on Aug 10. And it does not matter whether I manually started both services in between (which they will happily do - see output in the first post of this thread) or not.

I had actually already tried vacuuming the journal before. So, I used your command just now, to no avail. After a reboot both services are dead with the failed condition from Aug 10 in the status. So, I assume the information about the failed condition must rest somewhere else in systemd.

Additionally, I also tried disabling and enabling both services with a reboot in between (just for good measure). Still, the failed condition on Aug 10 comes up on the status for both of them.

I still need help, please.

We need to go back and check whether you are missing any daemon required for automatic start per the Arch Linux wiki. Check that you have them enabled.

So, I re-read the wiki. Everything seems to check out. I tried disabling, rebooting and re-enabling zfs-import.target and zfs.target. Again, no changes. When the system comes up, the systemd services zfs-import-cache.service and zfs-mount.service fail. It’s always the same output:

# systemctl status zfs-mount.service
○ zfs-mount.service - Mount ZFS filesystems
     Loaded: loaded (/usr/lib/systemd/system/zfs-mount.service; enabled; preset>
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:31 CEST; 3 weeks 1 >
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zfs(8)

A manual systemctl start ... still works flawlessly.

I cannot figure out where the information about that failed condition three weeks ago is retained.

Lemme ping a moderator from the ARM team: @Strit

I have no experience with ZFS or its dependencies/conditions. Sorry.

And I would have guessed that the information of missed conditions would have been stored in the journal. But as you have vacuumed it, I don’t know where it could be.

I haven’t tried ZFS on the Raspberry Pi, but I guess:

  • Try editing /etc/mkinitcpio.conf to add zfs to MODULES=(...), then run mkinitcpio -P

If that does not work, I guess the issue is the order of the systemd services; they are not starting in a clean order.

  • Try editing zfs-import-cache.service to add a sleep 5 or sleep 10:
$ sudo systemctl --full edit zfs-import-cache.service

Example:

...
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/bin/sleep 5
ExecStart=/usr/bin/zpool import -c /etc/zfs/zpool.cache -aN $ZPOOL_IMPORT_OPTS
...

Then reboot, if it works?

First off, thanks for all the assistance. I do appreciate it.

  • Try editing /etc/mkinitcpio.conf to add zfs to MODULES=(...), then run mkinitcpio -P
    If that does not work, I guess the issue is the order of the systemd services; they are not starting in a clean order.

I tried again with no effect on the two services concerned. I am pretty sure loading the modules works as it should. After all, the modules are available.

# lsmod | grep zfs
zfs                  3428352  0
zunicode              331776  1 zfs
zzstd                 475136  1 zfs
zlua                  172032  1 zfs
zcommon                94208  1 zfs
znvpair               106496  2 zfs,zcommon
zavl                   20480  1 zfs
icp                   258048  1 zfs
spl                   114688  6 zfs,icp,zzstd,znvpair,zcommon,zavl

Also, the failed condition always refers to August 10 (i.e. ages ago), when the services actually did fail because I had not re-installed zfs after a kernel upgrade. Ever since then, however, the zfs modules have been in place.

Just to clarify, let me paste this following bit from an output just a few seconds ago (i.e. Sep 3).

○ zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/etc/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:30 CEST; 3 weeks 3 days ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zpool(8)

Aug 10 01:29:30 myraspi systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).

Between August 10 and now I have rebooted that machine dozens of times, always followed by manually starting zfs-import-cache.service and zfs-mount.service. That manual start has never given me any problems.

$ sudo systemctl status zfs-import-cache
● zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/etc/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: active (exited) since Sat 2022-09-03 23:16:59 CEST; 15s ago
       Docs: man:zpool(8)
    Process: 816 ExecStart=/usr/bin/zpool import -c /etc/zfs/zpool.cache -aN $ZPOOL_IMPORT_OPTS (code=exited, status=0/SUCCESS)
   Main PID: 816 (code=exited, status=0/SUCCESS)
        CPU: 275ms

Sep 03 23:16:55 myraspi systemd[1]: Starting Import ZFS pools by cache file...
Sep 03 23:16:59 myraspi systemd[1]: Finished Import ZFS pools by cache file.

Note that there is no mention of Aug 10 following the manual start. Upon reboot, however, the failed condition shown will be the one from Aug 10.

  • Try editing zfs-import-cache.service to add a sleep 5 or sleep 10

So, I tried that with 5 and 10 seconds. I thought that editing the service file might induce systemd to flush any (mysterious) records it keeps on that service. Unfortunately, it did not make any difference whatsoever. Neither did uninstalling and re-installing zfs-utils 2.15 from the AUR.

There must be a place where systemd retains information about failed conditions even across reboots. Where could that be located? Is there a local systemd guru around whom I could ping about this problem?

I am not sure; some people said that ExecStartPre=/usr/bin/sleep 30 works.

OR

echo "zfs" > /etc/modules-load.d/zfs.conf

Try to add:

[Unit]
Requires=systemd-modules-load.service
After=systemd-modules-load.service
...
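As a complete drop-in, that would look like the following (the path is systemd’s standard override location; the file name is up to you, and sudo systemctl edit zfs-import-cache.service creates it for you):

```ini
# /etc/systemd/system/zfs-import-cache.service.d/override.conf
# Drop-in: make the unit wait for the kernel modules to be loaded.
[Unit]
Requires=systemd-modules-load.service
After=systemd-modules-load.service
```

Run systemctl daemon-reload afterwards so systemd picks it up.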

Some information is missing:

  • Which Linux Kernel?
  • Create a boot time diagram:
systemd-analyze plot > ~/Downloads/boot_time.svg

Copy all the code from the file boot_time.svg and paste it into a pastebin to create a link, then share the link here.

Hi @Dubliner,

I do not know if this is the reason or not, please excuse me if I’m missing something.

On my PC (so I’m presuming it would be the same on ARM), after editing any systemd unit file you have to run:

systemctl daemon-reload

…to apply the changes.

Don’t know if it will, but I hope this helps!

Thank you for your ongoing help, Zesko and Mirdarthos.

Allow me to comment.

I am not sure; some people said that ExecStartPre=/usr/bin/sleep 30 works.

OR

echo "zfs" > /etc/modules-load.d/zfs.conf

I am afraid these two options are really an attempt to fix a different problem. The zfs modules load just fine. It is the two systemd services zfs-import-cache.service and zfs-mount.service not starting because they supposedly failed on Aug 10.

Anyway, I added ExecStartPre=/usr/bin/sleep 30 and rebooted (for good measure I also ran a systemctl daemon-reload). Again, I’m getting this message:

○ zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/etc/systemd/system/zfs-import-cache.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Wed 2022-08-10 01:29:31 CEST; 3 weeks 4 days ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zpool(8)

Aug 10 01:29:31 myraspi systemd[1]: Import ZFS pools by cache file was skipped because of a failed condition check (ConditionPathIsDirectory=/sys/module/zfs).

So, this is the output presented to me now (Sep 4). However, it does not say anything about a failed condition on Sep 4 (as I think it should). Apparently, systemd always skips the import of the cache because of the failed condition from Aug 10, i.e. in the past, and does not even try to run the service because of that historical failure.
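As far as I understand it, the condition check is just a directory test that systemd repeats every time the unit is scheduled; roughly like this sketch (my own illustration, not systemd’s actual code):

```shell
# Rough model of ConditionPathIsDirectory= (illustration only):
# the path is tested when the unit is about to start; if the test fails,
# the unit is skipped and ends up "inactive (dead)" rather than "failed".
condition_path_is_directory() {
  [ -d "$1" ]
}

if condition_path_is_directory /sys/module/zfs; then
  echo "condition met - unit would start"
else
  echo "condition not met - unit would be skipped"
fi
```

On my box, where /sys/module/zfs is populated, this prints “condition met”, which makes it all the stranger that systemd keeps reporting the check as failed.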

I am using kernel 5.15.56-1-MANJARO-ARM-RPI #1 SMP PREEMPT Fri Jul 22 13:22:49 UTC 2022 aarch64 GNU/Linux. Yet, I assume the problem to be rooted in systemd. That date Aug 10 just keeps haunting me there.

As I mentioned, I tried vacuuming the journal. Maybe I did not do that right? Is there any advice you might offer on how to make systemd reliably forget about past events including failed conditions?

P.S. Zesko, you mentioned “some people” up there. What was their problem exactly?

What do you get when you run this command?

systemctl status zfs-import-scan.service zfs-mount zfs-import.target zfs-zed zfs.target

Is everything loaded with the same timestamp?

I eventually abandoned the entire system, making room for new hardware. During the re-import of the zfs volumes I noticed a different problem: one of the volumes is encrypted, which made the system hang during boot-up. So, to put this discussion to an end, I have a feeling it was exactly that encrypted volume that caused all the stress in the first place. Maybe this information is helpful if anybody else ends up in a similar situation.

Once again, a huge thank you to everyone so eager to help. I really do appreciate it.
