-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
daemon: Make actually initiating reboot asynchronous #2848
Conversation
Alternative to coreos#2845 which moved the `reboot` invocation into the client (which I think still makes sense in some cases). Previously we were starting the reboot before we're returned a success reply to the client, so it could see the daemon killed by `SIGTERM` and hence emit a spurious error. (Really in this case any 3 of the calling process, the client binary or the daemon could be killed in any order, so it's messy but this just closes one race) For cleanliness we reject starting any new transactions after the daemon has started a reboot. Closes: coreos#1348
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Leaving open for @lucab or @kelvinfan001 since they weighed in on the first iteration. |
*/ | ||
const char *child_argv[] = { "systemctl", "reboot", NULL }; | ||
g_autoptr(GError) local_error = NULL; | ||
if (!g_spawn_sync (NULL, (char**)child_argv, NULL, (GSpawnFlags)(G_SPAWN_CHILD_INHERITS_STDIN | G_SPAWN_SEARCH_PATH), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there is a systemd inhibitor lock that is blocking shutdown? Would the systemctl reboot
command spawned this function fail later when the reboot request is attempted, or error immediately and the daemon exits?
If the former, when used in finalize_deployment()
, would we have an unlocked staged deployment since we'd unlink()
_OSTREE_SYSROOT_RUNSTATE_STAGED_LOCKED
but not actually reboot and return a success message to the client even if we later fail to actually reboot?
So it seems like both of these situations are not ideal. (I may also just have misunderstood systemd-inhibit and these bits of code entirely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! So...Per this point I think really the only correct thing here is for e.g. zincati to not try to finalize if there are inhibitors. I can't think of a way to solve this without adding some sort of new inhibitor API to do something more read/write lock like so that one could do "grab an exclusive lock only when the last (read) locker has dropped".
That said honestly I was just playing with this locally and I assume I'm doing something wrong, but I can't get systemd-inhibit
to block reboots. I'm doing systemd-inhibit sleep 1h
in one shell, and systemctl reboot
in another instantly reboots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(OK some investigation shows I need to use systemctl reboot --check-inhibitors=yes
which...seems like a bug, the man page claims auto
will do that on a tty, but that's not what I'm seeing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so the --check-inhibitors
option was made available in a recent-ish commit, and is not in FCOS yet; but even so, reboots should be inhibited successfully if called on a tty. I tried locally with systemd-inhibit sleep 1h &
, then if I do systemctl reboot
, it is indeed inhibited.
systemctl poweroff
Operation inhibited by "sleep 1h" (PID 2653 "systemd-inhibit", user root), reason is "Unknown reason".
Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl poweroff -i'.
However, another issue I see right now regarding systemd-inhibit
is that I haven't been able to get my inhibitor locks to inhibit rpm-ostree-driven reboots, even if I set the lock as root
. I assume this is because the rpm-ostree daemon is privileged and the systemctl reboot
called by the daemon is non-interactive, and so in e.g. systemd 246
(before systemd/systemd#18210), any inhibitors are ignored by rpm-ostree. So IIUC, systemd-inhibitors won't really work with rpm-ostree today, at least not until systemd 248
, after which we can add the --check-inhibitors=yes
option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah if you follow issue links there's a huge amount of debate about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for solving the daemon/client race!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum. Honestly, it seems to me that inhibitors were pretty "desktop focused" before --check-inhibitors=yes
existed.
That said, we can actually just check the inhibitors in our process even for earlier systemd versions. That's pretty easy to do, it's just a DBus call. Now one might argue that that's racy, but it's also racy to do it in systemctl reboot
so that's fine.
OK merging since I think this will help, and the inhibitor behavior won't be regressed by it (inhibitors need more + orthogonal work). |
Looks like |
Agree, we probably want |
Alternative to #2845
which moved the
reboot
invocation into the client (which I thinkstill makes sense in some cases).
Previously we were starting the reboot before we're returned
a success reply to the client, so it could see the daemon killed
by
SIGTERM
and hence emit a spurious error.(Really in this case any 3 of the calling process, the client
binary or the daemon could be killed in any order, so it's messy
but this just closes one race)
For cleanliness we reject starting any new transactions after the daemon has
started a reboot.
Closes: #1348