I've been struggling with porting a monitoring check from Nagios to Prometheus. What it does is raise a flag if there's a shutdown scheduled on a server. It does this through this horrendous NRPE check:
we can probably get rid of all the check_procs stuff there and assume systemd, at least that's what we're asserting, which turns this into something like:
and in fact, I wrote a Python script that would extract a metric out of that nicely:
#!/usr/bin/python3

import logging
import shlex
from subprocess import CalledProcessError, PIPE, run


def test_parse_dbus():
    no_sched = '(st) "" 18446744073709551615'
    assert parse_dbus(no_sched) == ("", 0)
    sched_reboot = '(st) "reboot" 1725477267406843'
    assert parse_dbus(sched_reboot) == ("reboot", 1725477267.406843)
    sched_reboot_round = '(st) "reboot" 1725477267506843'
    assert parse_dbus(sched_reboot_round) == ("reboot", 1725477267.506843)
    # theoretical: i've seen the metric "0" with the label "suspend"
    # before adding this test. i couldn't reproduce by suspending my
    # laptop, so i'm not sure wtf happened there.
    sched_suspend = '(st) "suspend" 0'
    assert parse_dbus(sched_suspend) == ("", 0)
    garbage = '(st) "reboot" 1725477267506843 jfdklafjds'
    assert parse_dbus(garbage) == ("", 0)
    assert parse_dbus("(st) ...") == ("", 0)
    assert parse_dbus("") == ("", 0)


def parse_dbus(output: str) -> tuple[str, float]:
    logging.debug("parsing DBus output: %s", output)
    try:
        _, kind, timestamp_str = output.split(maxsplit=2)
    except ValueError as exc:
        logging.warning("could not parse DBus output: %r (%s)", output, exc)
        return "", 0
    kind = kind.replace('"', "")
    try:
        # the property is in microseconds, the metric is in seconds
        timestamp = int(timestamp_str) / 1000000
    except ValueError as exc:
        logging.warning(
            "could not parse DBus timestamp: %r (%s)",
            timestamp_str,
            exc,
        )
        return "", 0
    logging.debug("found kind %r, timestamp %r", kind, timestamp)
    if kind and timestamp:
        return kind, timestamp
    else:
        return "", 0


def main():
    cmd = shlex.split(
        "busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown"  # noqa: E501
    )
    try:
        proc = run(cmd, check=True, stdout=PIPE, encoding="ascii")
    except CalledProcessError as exc:
        logging.warning("could not call command %r: %s", shlex.join(cmd), exc)
        kind, timestamp = "", 0
    else:
        kind, timestamp = parse_dbus(proc.stdout)
    print("# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero")
    print("# TYPE node_shutdown_scheduled_timestamp_seconds gauge")
    if timestamp:
        # label values must be double-quoted in the exposition format
        print(
            'node_shutdown_scheduled_timestamp_seconds{kind="%s"} %s' % (kind, timestamp)
        )
    else:
        print("node_shutdown_scheduled_timestamp_seconds 0")


if __name__ == "__main__":
    main()
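one detail worth being careful about when printing the sample line: in the Prometheus text exposition format, label values are double-quoted strings, with backslash, double quote and newline escaped. A minimal sketch of how that could be done follows; the helper names `escape_label_value` and `format_sample` are made up for illustration and are not part of the script above:

```python
def escape_label_value(value: str) -> str:
    """Escape a string so it can sit inside a double-quoted label value."""
    # Backslashes go first, otherwise the backslashes added for quotes
    # and newlines below would get doubled as well.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line: name{label="value",...} value."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{pairs}}} {value}"
```

the kinds logind reports here are plain words like "reboot", so the escaping is mostly defensive, but the quotes around the value are required either way.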
the problem is there's nowhere to call this thing from: shutdown(8) doesn't have any post hooks, and i don't think systemd will fire any specific service when a shutdown is scheduled... there are some dbus signals sent around though, namely ScheduledShutdown, which we can get with:
... which is essentially what we're doing above.
But i figured a better place to do this would be in the node exporter itself, since it's already a daemon just sitting there.
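in the meantime, one possible stopgap (an assumption on my part, not something tested here) is the node exporter's textfile collector, which reads `*.prom` files from the directory given by `--collector.textfile.directory`. Something would still need to run the script periodically, but the output could be dropped in place atomically so the exporter never reads a half-written file. A rough sketch, where `write_textfile` and the file name used below are hypothetical:

```python
import os
import tempfile


def write_textfile(directory: str, filename: str, content: str) -> str:
    """Atomically write content to directory/filename and return the path."""
    # The temp file lives in the target directory so the final rename stays
    # on one filesystem; its ".tmp" suffix keeps it out of the *.prom glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=filename, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        # mkstemp creates the file with mode 0600; the exporter may run as
        # a different user, so make it world-readable.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, filename)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Best-effort cleanup of the temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

calling it would look something like `write_textfile(directory, "shutdown.prom", metrics_text)`, with `directory` being whatever the exporter was started with. It's still polling, so a shutdown scheduled and cancelled between two runs would be missed, which is part of why doing this inside the exporter seems nicer.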