Commit 02353bab authored by Citronalco

Merge branch 'softdog' into 'master'

watchdog: New role

See merge request !30
parents 388a37ad 624aa7d1
# watchdog
This role installs the watchdog daemon.
The server is rebooted when either of the following conditions is met:
- The 15-minute load average exceeds 15 times the number of CPU cores
- Less than 1 MiB of RAM can be allocated (256 pages, assuming a 4 KiB page size)
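
The two limits can be checked by hand. The following is a minimal sketch, assuming the values from the `watchdog.conf.j2` template in this commit (`max-load-15 = 15 * ansible_processor_vcpus` and `allocatable-memory = 256`). The function names are hypothetical, and `os.cpu_count()` only approximates what Ansible reports as `ansible_processor_vcpus`.

```python
import mmap
import os

LOAD_FACTOR = 15         # multiplier from the template line: max-load-15 = 15 * vcpus
ALLOCATABLE_PAGES = 256  # value from the template line: allocatable-memory = 256


def max_load_15(vcpus, factor=LOAD_FACTOR):
    """Return the 15-minute load average limit for a host with `vcpus` CPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return factor * vcpus


def pages_to_bytes(pages, page_size=mmap.PAGESIZE):
    """Convert a page count into bytes for the given page size."""
    return pages * page_size


if __name__ == "__main__":
    vcpus = os.cpu_count() or 1  # stand-in for ansible_processor_vcpus
    print(f"max-load-15 = {max_load_15(vcpus)}")
    print(f"allocatable-memory = {ALLOCATABLE_PAGES} pages "
          f"= {pages_to_bytes(ALLOCATABLE_PAGES)} bytes on this machine")
```

With a 4 KiB page size, 256 pages correspond to 1 MiB. On machines with larger pages the same setting demands proportionally more memory.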

If no hardware watchdog is available, the softdog kernel module is installed and used instead.
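
The fallback decision can be sketched in a few lines. This is an illustration of the order of the role's tasks rather than code from the role itself: `planned_steps` is a hypothetical helper, and the step descriptions are paraphrased from the task names.

```python
import os
from typing import List


def planned_steps(device: str = "/dev/watchdog") -> List[str]:
    """List the steps the role would take, depending on whether `device` exists."""
    steps = []
    if not os.path.exists(device):
        # No device node: no watchdog driver is loaded, so fall back to softdog.
        steps.append("add softdog to /etc/modules")
        steps.append("load softdog kernel module")
    steps.append("install watchdog package")
    steps.append("render /etc/watchdog.conf")
    return steps


if __name__ == "__main__":
    for step in planned_steps():
        print(step)
```

The device node is checked only once, before the module is loaded, which is why both softdog steps depend on the same result.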
- name: restart watchdog
  service:
    name: watchdog
    state: restarted
# FIXME: systemd watchdog? -> No, it only pings. The watchdog package additionally checks RAM and load
# Hardware: /dev/watchdog exists
# Software: /dev/watchdog does not exist -> install the softdog module, then test again
- name: Check whether a watchdog module is loaded
  stat:
    path: /dev/watchdog
  register: stat_watchdogdev
- name: Add kernel module softdog to /etc/modules
  lineinfile:
    path: /etc/modules
    line: softdog
    state: present
  when: not stat_watchdogdev.stat.exists
- name: Load kernel module softdog
  modprobe:
    name: softdog
    state: present
  when: not stat_watchdogdev.stat.exists
  notify: restart watchdog
- name: Install package watchdog
  # Module name assumed: it is missing from the source; /etc/modules suggests a Debian-based host
  apt:
    pkg: "watchdog"
    state: present
- name: Configure watchdog
  template:
    src: watchdog.conf.j2
    dest: /etc/watchdog.conf
  notify: restart watchdog
# {{ ansible_managed }}
#ping =
#ping =
#interface = {{ ansible_default_ipv4.interface }}
#file = /var/log/messages
#change = 1407
# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1 = 24
#max-load-5 = 18
max-load-15 = {{ 15 * ansible_processor_vcpus }}
# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory = 1
allocatable-memory = 256
#repair-binary = /usr/sbin/repair
#repair-timeout = 60
#test-binary =
#test-timeout = 60
# The retry-timeout and repair limit are used to handle errors in a more robust
# manner. Errors must persist for longer than retry-timeout to action a repair
# or reboot, and if repair-maximum attempts are made without the test passing a
# reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
watchdog-device = /dev/watchdog
# Defaults compiled into the binary
#temperature-sensor =
#max-temperature = 90
# Defaults compiled into the binary
#admin = root
#interval = 10
#logtick = 1
#log-dir = /var/log/watchdog
# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime = yes
priority = 1
# Check if rsyslogd is still running by enabling the following line
#pidfile = /var/run/
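
The comment on `retry-timeout` and `repair-maximum` in the template describes a small decision rule. The sketch below is one reading of that comment, not the daemon's actual implementation. The function name and the `has_repair_binary` parameter are hypothetical, and the defaults of 60 and 1 are the commented-out values shown in the template.

```python
def next_action(error_age_s, repairs_done, retry_timeout_s=60,
                repair_maximum=1, has_repair_binary=True):
    """Decide what to do about a failing check, following the template comment.

    error_age_s:  how long the error has persisted, in seconds
    repairs_done: repair attempts already made without the test passing
    """
    if error_age_s <= retry_timeout_s:
        # The error has not yet outlasted retry-timeout: keep retrying the test.
        return "retry"
    if has_repair_binary and repairs_done < repair_maximum:
        # A repair may still be attempted before giving up.
        return "repair"
    # Repair attempts are exhausted (or no repair binary is set): reboot.
    return "reboot"


if __name__ == "__main__":
    print(next_action(error_age_s=30, repairs_done=0))
    print(next_action(error_age_s=61, repairs_done=0))
    print(next_action(error_age_s=61, repairs_done=1))
```

In the template both settings are commented out, so the values compiled into the watchdog binary apply.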