node-controller: Support an annotation to hold/prioritize updates #2162
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters
Today the MCO arbitrarily chooses a node to update from the candidates. We want to allow admins to avoid specific nodes entirely (for as long as they want) as well as guide upgrade ordering. This replaces the defunct etcd-specific code with support for a generic annotation `machineconfiguration.openshift.io/update-order` that allows an external controller (and/or human) to do both of these. Setting it to `0` will entirely skip that node for updates. Otherwise, higher values are preferred. Closes: openshift#2059
a0a6192 to 35ffe71
glog.Warningf("Failed to parse %s %s: %v", node.Name, daemonconsts.MachineUpdateOrderingAnnotationKey, err)
continue
}
// order 0 means "skip this node"
This is the simple PoC, but if we go this route I think we need to make this more observable; something like a count of "held/skipped" nodes in the pool as well, etc.
/hold
As mentioned in #2059 (comment), I'm against this direction. Its potential for misuse is too high and it's not clear to me what problem this solves.
As for the implementation, assigning special function to
If the intention of this PR is to provide a mechanism for pausing updates for a particular node, then let's specifically tackle that. I'm in favor of defining an annotation whose presence is the signal to MCO that this node should be skipped.
I replied here on that concern: #2059 (comment) I completely agree that OpenShift should by default be more intelligent about how we upgrade nodes, but I can't imagine we hardcode all of that logic into the node controller. An update ordering system seems like it really needs to be a separate controller with a higher level view (including of machinesets, etc.). And on UPI metal admins are just going to want full control. So I don't see how we can avoid a low-level API like this at least eventually.
That's fair, yeah we can make that separate.
I asked Colin to look at the problem of designing an API to allow explicit pause that wouldn't constrain us too much from doing more in the future. I don't have a strong opinion of how much further to go than just a 'paused/unpaused' bit.
OK, holding only is #2163.
To be clear, this PR is now the prioritize updates PR and #2163 is the hold updates PR?
Yeah, they're related but conceptually orthogonal. Since it seems we want #2163 more, we can rebase this on that when it merges, or close this if we decide to take another direction. (I guess in fact a controller could implement update priority by simply adding a hold to everything it didn't want to update; it would be crude, but...)
@cgwalters: The following tests failed, say
Issues go stale after 90d of inactivity. /lifecycle stale
Stale issues rot after 30d of inactivity. /lifecycle rotten
Rotten issues close after 30d of inactivity. /close
@cgwalters: PR needs rebase.
@openshift-bot: Closed this PR.
Today the MCO arbitrarily chooses a node to update from the candidates.
We want to allow admins to avoid specific nodes entirely (for as long
as they want) as well as guide upgrade ordering.
This replaces the defunct etcd-specific code with support for a generic
annotation `machineconfiguration.openshift.io/update-order` that allows
an external controller (and/or human) to do both of these.
Setting it to `0` will entirely skip that node for updates. Otherwise,
higher values are preferred.
Closes: #2059