Watchdog did not stop

My system sometimes hangs on shutdown or reboot.

This is the error I’m getting:

sep 20 23:00:18 coomerbox3000 systemd[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchd>
sep 20 23:00:18 coomerbox3000 systemd[1]: Set hardware watchdog to 10min.
sep 20 23:00:18 coomerbox3000 kernel: watchdog: watchdog0: watchdog did not stop!

I find this quite odd, because my kernel parameters include:

nowatchdog

The full kernel command line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet nowatchdog pci=noaer cpufreq.default_governor=performance apparmor=1 security=apparmor udev.log_priority=3 acpi_osi=! acpi_osi=Linux acpi_osi=\"Windows 2009\" nogpumanager i915.enable_gvt=1 pcie_port_pm=off iommu=1 intel_iommu=on kvm.ignore_msrs=1 rd.driver.pre=vfio-pci default_hugepagesz=1G hugepagesz=1G hugepages=0 transparent_hugepage=never"
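Editing GRUB_CMDLINE_LINUX_DEFAULT only takes effect after the GRUB config is regenerated and the machine is rebooted, so it can be worth confirming what the running kernel actually received. A minimal Python sketch, assuming the standard /proc/cmdline interface; the function names are made up for illustration:

```python
from __future__ import annotations

from pathlib import Path


def cmdline_tokens(text: str) -> set[str]:
    # Naive whitespace split: a quoted value such as acpi_osi="Windows 2009"
    # gets broken in two, but bare flags like nowatchdog are matched correctly.
    return set(text.split())


def has_param(param: str, cmdline_path: str = "/proc/cmdline") -> bool:
    # True if `param` is one of the tokens the running kernel was booted with.
    return param in cmdline_tokens(Path(cmdline_path).read_text())


if __name__ == "__main__":
    if Path("/proc/cmdline").exists():
        print("nowatchdog on running kernel:", has_param("nowatchdog"))
```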

And the output of sysctl -a | grep watchdog:

kernel.nmi_watchdog = 0
kernel.soft_watchdog = 0
kernel.watchdog = 0
kernel.watchdog_cpumask = 0-11
kernel.watchdog_thresh = 10

Since that is not enough to stop the watchdog from interfering with shutdowns, what would be? Is there a way to make sure the system shuts down forcefully when it is told to? Or to make the watchdog time out after a set amount of time?
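On the timeout question: the "Set hardware watchdog to 10min." line looks like systemd arming the hardware watchdog for the reboot phase, which is controlled by RebootWatchdogSec= in systemd-system.conf (default 10min). If that reading is right, a shorter value gets a hung shutdown force-reset sooner, and 0 disables it. A sketch that writes a drop-in for this setting, assuming your systemd reads /etc/systemd/system.conf.d/*.conf; the file name 90-reboot-watchdog.conf and the helper names are hypothetical:

```python
from pathlib import Path

# Hypothetical drop-in file name; systemd reads every *.conf in this directory.
DROPIN_NAME = "90-reboot-watchdog.conf"


def reboot_watchdog_dropin(timeout: str) -> str:
    # RebootWatchdogSec= belongs to the [Manager] section of systemd-system.conf.
    return f"[Manager]\nRebootWatchdogSec={timeout}\n"


def write_dropin(timeout: str, root: str = "/") -> Path:
    # `root` lets you try this against a scratch directory before touching /etc.
    confdir = Path(root) / "etc" / "systemd" / "system.conf.d"
    confdir.mkdir(parents=True, exist_ok=True)
    path = confdir / DROPIN_NAME
    path.write_text(reboot_watchdog_dropin(timeout))
    return path
```

For example, write_dropin("2min") run as root; the new value should apply from the next boot onwards.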

These are my notes:

Blacklist the following modules:

blacklist iTCO_wdt 
blacklist iTCO_vendor_support

Save this as watchdog.conf in /etc/modprobe.d/.
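This step can also be scripted. A minimal sketch, assuming it is run as root; the helper names are hypothetical, and the root argument only exists so it can be tried against a scratch directory first:

```python
from pathlib import Path

# Modules named in the note above.
WATCHDOG_MODULES = ("iTCO_wdt", "iTCO_vendor_support")


def blacklist_text(modules=WATCHDOG_MODULES) -> str:
    # One "blacklist <module>" line per module, as modprobe.d expects.
    return "".join(f"blacklist {name}\n" for name in modules)


def write_blacklist(root: str = "/", filename: str = "watchdog.conf") -> Path:
    path = Path(root) / "etc" / "modprobe.d" / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(blacklist_text())
    return path
```

A blacklist entry only stops the module from being auto-loaded; a module that is already loaded stays loaded until you reboot.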

In GRUB, add nmi_watchdog=0 and nowatchdog to the kernel boot parameters, then update GRUB.
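Adding the parameters without duplicating ones that are already present can be sketched like this. It assumes the usual single-line GRUB_CMDLINE_LINUX_DEFAULT="..." form in /etc/default/grub; add_kernel_params is a hypothetical helper:

```python
from __future__ import annotations

import re

# Matches the GRUB_CMDLINE_LINUX_DEFAULT="..." line; the greedy (.*) runs to the
# last double quote on that line, so escaped quotes inside the value survive.
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")', re.MULTILINE)


def add_kernel_params(grub_text: str, params: list[str]) -> str:
    def _extend(match: re.Match) -> str:
        current = match.group(2)
        missing = [p for p in params if p not in current.split()]
        if not missing:
            return match.group(0)  # everything already present: leave the line alone
        updated = (current + " " + " ".join(missing)).strip()
        return match.group(1) + updated + match.group(3)

    return _CMDLINE.sub(_extend, grub_text)
```

After saving the result back to /etc/default/grub, regenerate the config: sudo update-grub on Manjaro, or sudo grub-mkconfig -o /boot/grub/grub.cfg.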


Check:

cat /proc/sys/kernel/nmi_watchdog
cat /proc/sys/kernel/watchdog

Both should show 0.
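The same check in Python, e.g. for a script that runs after boot. A sketch assuming the /proc/sys/kernel files named above; the helper names are hypothetical. These knobs only cover the kernel's lockup detectors, not a hardware watchdog driver such as iTCO_wdt.

```python
from __future__ import annotations

from pathlib import Path

KNOBS = ("nmi_watchdog", "watchdog")


def read_knobs(root: str = "/proc/sys/kernel") -> dict[str, int]:
    # Each file holds a single integer followed by a newline.
    return {name: int((Path(root) / name).read_text()) for name in KNOBS}


def lockup_detectors_off(values: dict[str, int]) -> bool:
    # The note's expectation: every knob reads 0 (a missing knob counts as "not off").
    return all(values.get(name) == 0 for name in KNOBS)


if __name__ == "__main__":
    if all(Path("/proc/sys/kernel", name).exists() for name in KNOBS):
        print("lockup detectors off:", lockup_detectors_off(read_knobs()))
```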

Thanks, although the nowatchdog parameter already seems to disable nmi_watchdog (as sysctl shows).

I will try blacklisting iTCO_wdt and iTCO_vendor_support and see if this solves the issue.

It’ll take a couple of days, but once I’m sure this has done the trick I will mark your post as the solution 🙂

watchdog and nmi_watchdog are two separate things with similar names.

Watchdog

The Linux kernel watchdog is used to monitor whether a system is still running. It is supposed to automatically reboot systems that hang due to unrecoverable software errors. The watchdog module is specific to the hardware or chip being used. Personal computer users don’t need a watchdog, as they can reset the system manually. However, it is useful for mission-critical systems that need to reboot themselves without human intervention.
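The kernel log line from the first post fits this model. Under the Linux watchdog device API, opening /dev/watchdog arms the hardware timer and every write to it counts as a keepalive. If the driver supports "Magic Close" and nowayout is not set, writing the character V just before closing stops the timer; closing without it leaves the timer running, which is when the kernel logs "watchdog did not stop!". systemd presumably leaves it armed on purpose around reboot. A minimal sketch of a well-behaved client; the function name is hypothetical, and pointing it at a real /dev/watchdog really does arm the timer:

```python
import os
import time


def pet_then_magic_close(device: str, pets: int = 3, interval: float = 1.0) -> None:
    # Opening a real watchdog device node arms the timer.
    fd = os.open(device, os.O_WRONLY)
    try:
        for _ in range(pets):
            os.write(fd, b"\0")  # any write counts as a keepalive ("pet")
            time.sleep(interval)
        os.write(fd, b"V")  # Magic Close: ask the driver to stop the timer on close
    finally:
        # If we bail out before writing "V", the fd is closed without it and the
        # watchdog keeps running, which is the intended fail-safe behaviour.
        os.close(fd)
```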



nmi_watchdog

A watchdog is usually a timer-like mechanism that generates an interrupt at a specified time interval.

An NMI is a non-maskable interrupt.

So an NMI watchdog is a watchdog that generates a non-maskable interrupt, i.e. its interrupt handler gets executed no matter what state the CPU is in.

This is very useful when you are getting unexplained system freezes: because the NMI watchdog handler still runs on a CPU that is stuck with interrupts disabled, it can detect the lockup and report it (or panic the kernel, if configured to). That way you get a detailed stack trace of why your CPU froze in the first place.
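How quickly these detectors fire is tied to the kernel.watchdog_thresh value from the sysctl output in the first post. Per the kernel's lockup-watchdog documentation, a hard lockup (the NMI watchdog's job) is reported after about watchdog_thresh seconds, a soft lockup after twice that, and a value of 0 disables lockup detection. A small worked sketch; the function name is hypothetical:

```python
from typing import Optional, Tuple


def lockup_thresholds(watchdog_thresh: int = 10) -> Optional[Tuple[int, int]]:
    """Return (hard_lockup_s, soft_lockup_s) for a kernel.watchdog_thresh value."""
    if watchdog_thresh == 0:
        return None  # 0 turns lockup detection off altogether
    hard = watchdog_thresh  # NMI (hard lockup) detector
    soft = 2 * watchdog_thresh  # soft lockup detector uses twice the threshold
    return hard, soft


print(lockup_thresholds(10))  # prints (10, 20)
```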

I have this problem on Arch. With Manjaro it disappeared, for whatever reason, possibly due to the customized kernel.

=> Blacklist the modules in /etc/modprobe.d/blacklist.conf (create this file if you don’t have it).

Appending nowatchdog to your boot parameters disables the kernel's own lockup-detector watchdogs; to keep the hardware watchdog driver from loading, blacklist it as well:

sudo nano /etc/modprobe.d/blacklist.conf

# Disable intel mei (including mei_watchdog).
blacklist intel_pmc_bxt
blacklist iTCO_vendor_support

# Do not load the 'iTCO_wdt' watchdog module on boot.
blacklist iTCO_wdt
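After rebooting, you can confirm that none of the blacklisted modules got loaded anyway (a blacklist entry does not stop a module from being pulled in as a dependency of another one). A sketch that reads /proc/modules, where the first field of each line is the module name; the helper names are hypothetical:

```python
from pathlib import Path

# Modules blacklisted above.
BLACKLISTED = ("intel_pmc_bxt", "iTCO_vendor_support", "iTCO_wdt")


def loaded_modules(proc_modules_text: str) -> set:
    # First whitespace-separated field of each /proc/modules line is the name.
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def still_loaded(proc_modules_text: str, modules=BLACKLISTED) -> list:
    loaded = loaded_modules(proc_modules_text)
    return [name for name in modules if name in loaded]


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        print("still loaded:", still_loaded(proc.read_text()) or "none")
```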

I had my computer on for almost 2 days before rebooting and did not encounter the issue. Usually, running the system for a long time and then rebooting made this issue very likely to occur, so I am assuming it is resolved unless I encounter it again, and I have marked your answer as the solution.

Thanks 🙂

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.