Systemd timer crashing my system at midnight

I’m not entirely sure but there happens to be 3 systemd timers enabled on the system and I’ve been having problems with it crashing at midnight. I thought I’d disable them one at a time to figure out which might be causing it when it happens… I used stop and disable, yet somehow it’s been enabled again, maybe due to a system update? How can I avoid that from happening?

These crashes are a hard lockup, I cannot change tty, I cannot do anything with the system. My browser doesn’t recover the tabs instead recovering what it did from my previous boot, I think the system might have restored itself in some way where that info after boot was lost?(haven’t verified elsewhere beyond session data) Major pain since I can’t recall if anything lost might have been valuable :\

It might be possible that it’s related to the 2nd suspend/resume cycle bug I have with the same symptoms for kernels 4.9 and higher. Potentially a power management bug?

Is there anything useful in the logs that might indicate what causes the crash? You can search them using journalctl:

https://wiki.archlinux.org/index.php/systemd#Filtering_output

No, journal shows only activity 7 mins prior to midnight. Any other ideas?

So when you had it handled before which systemd timer was it?

I forgot to note it down, I disabled both logrotate and shadow timers tonight, no crash at midnight, will enable shadow and see if I crash or not.

Is there a way to figure out why it crashes or prevent the system enabling it again by update or whatever else caused it to be enabled after I explicity stop and disable via systemctl? The lack of a log makes it difficult :confused:

Good question, you can look in /etc/systemd/system or /etc/systemd/user to see what has been updated recently. I did look at mine and there were some updates in March but nothing that looked like a likely suspect. Did you write your own systemd unit files? Anyway it seems strange that anything would be scheduled at exactly midnight, I can not find any here on my machine. hmmmm

Try using

sudo systemctl list-timers --all

to see which timers are executed at midnight and then use

locate *.timer

to find the *.timer files and their *.service files. I find most of them are located in

/usr/lib/systemd/system/

Have a look inside the timer and service files to see what they are doing. You could try to run the commands manually to see if one of those crashes your system.

1 Like

Had some time to look into this a bit. It’s logrotate service. Seems fine when forced the first time, when done a 2nd time it causes the hard freeze. I guess the timer might fire it off and fail so try a 2nd time? All logs just have two lines for the service of it being started each time.

cat /var/lib/logrotate.status:

logrotate state -- version 2
"/var/log/wtmp" 2017-3-28-20:0:0
"/var/log/btmp" 2017-3-28-20:0:0

ls -l /var/log:

-rw-------  1 root utmp                 0 Mar 26 00:00 btmp
-rw-------  1 root utmp                20 Mar 25 17:37 btmp.1.gz
-rw-rw-r--  1 root utmp             58752 Mar 28 19:58 wtmp
-rw-rw-r--  1 root utmp             18847 Mar 10 15:41 wtmp.1.gz

You can see the dates are a bit inconsistent. Here is the /etc/logrotate.conf:

# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 1 weeks worth of backlogs
rotate 1

# restrict maximum size of log files
size 12M

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
compress

# Logs are moved into directory for rotation
# olddir /var/log/archive

# Ignore pacman saved files
tabooext + .pacorig .pacnew .pacsave

# Manjaro packages drop log rotation information into this directory
# include /etc/logrotate.d

/var/log/wtmp {
    weekly
    create 0664 root utmp
	minsize 512K
    rotate 1
}

/var/log/btmp {
    missingok
    weekly
    create 0600 root utmp
    rotate 1
}

I commented out the include to verify that it was still happening despite the extra config in /etc/logrotate.d/, which it does. Next I ran the unit via a different tty without the DE, instead of locking up after the command a bunch of dmesg output was spat out, unfortunately it freezes afterwards and I can only see what is on screen. I tried to output this over SSH to my phone withdmesg -wH` but the output gets cut out early on. It seems that it might be due to BFQ in the Manjaro kernel

This is what I got on my phone:

[Mar28 20:44] ------------[ cut here ]------------
[  +0.000578] kernel BUG at block/bfq.h:864!
[  +0.000445] invalid opcode: 0000 [#1] PREEMPT SMP
[  +0.000510] Modules linked in: ctr ccm fuse xt_CHECKSUM i
table_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp tun ipt_MA
QUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_
at nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype
xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp ll
 overlay arc4 ebtable_filter ebtables ip6table_filter ip6_t
bles iptable_filter rt73usb rt2x00usb rt2x00lib mac80211 cf
80211 input_leds led_class crc_itu_t mousedev joydev bnep s
d_hda_codec_hdmi sch_fq_codel snd_hda_codec_realtek snd_hda
codec_generic nvidia_drm(PO) nvidia_modeset(PO) uinput inte
_rapl x86_pkg_temp_thermal intel_powerclamp nvidia(PO) core
emp kvm_intel nls_iso8859_1 nls_cp437 vfat iTCO_wdt mei_wdt
fat kvm iTCO_vendor_support irqbypass crct10dif_pclmul evde
 mac_hid crc32_pclmul ghash_clmulni_intel
[  +0.017723]  mxm_wmi pcbc snd_hda_intel aesni_intel snd_h
a_codec aes_x86_64 snd_hda_core hci_uart crypto_simd snd_hw
ep glue_helper btbcm cryptd btqca btintel intel_cstate e100
e bluetooth snd_pcm intel_rapl_perf ptp snd_timer mei_me sn
 i2c_i801 pps_core soundcore mei rfkill intel_pch_thermal s
pchp video i2c_hid battery crc16 wmi fjes button intel_lpss
acpi acpi_als acpi_pad kfifo_buf intel_lpss industrialio tp
_tis tpm_tis_core tpm pci_stub vboxpci(O) vboxnetflt(O) vbo
netadp(O) vboxdrv(O) ttm drm_kms_helper drm syscopyarea sys
illrect sysimgblt fb_sys_fops sg ip_tables x_tables xfs lib
rc32c crc32c_generic hid_generic usbhid hid sd_mod crc32c_i
tel ahci libahci xhci_pci libata xhci_hcd scsi_mod usbcore
sb_common serio

I’ll try get more of the output from the desktop machine with a picture or recording if it helps. While debugging this and another kernel bug(which I think has similar output with BFQ) I disabled legacy USB which referred to it as 2.0, this has prevented me from using keyboard during boot and entering BIOS again :\ Not sure if I will reset the CMOS chip or go buy a PS/2 keyboard yet.

I did a recording, there wasn’t that much missed between the snippet I provided above and what I saw on the screen about BFQ. Changing the IO scheduler to something else like CFQ fixed logrotate service as well as the suspend/resume bug I and others have encountered on kernels greater than 4.8. So I guess this is solved?

Not sure what the systemd service for logrotate was doing that BFQ didn’t like :S It seems calling /usr/bin/logrotate /etc/logrotate.conf doesn’t seem to cause the lockup, the unit file is rather lengthy and unfamiliar for me though. I guess I should get in touch with the Manjaro team or BFQ dev about this?

I’m not entirely sure if I’ve fixed this as checking /var/log/ again the files timestamps don’t seem to change bar wtmp, wtmp.1.gz remains the same timestamp, no wtmp.2.gz generated. The unit/service status also doesn’t have any new entries in it’s log beyond the two starting lines:

● logrotate.service - Rotate log files
   Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
   Active: inactive (dead) since Tue 2017-03-28 23:13:25 NZDT; 5min ago
  Process: 4851 ExecStart=/usr/bin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
 Main PID: 4851 (code=exited, status=0/SUCCESS)
      CPU: 3ms

Mar 28 23:13:25 pola-pc systemd[1]: Starting Rotate log files...
Mar 28 23:13:25 pola-pc systemd[1]: Started Rotate log files.
1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.