-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry and background run to node taint removal #1861
Conversation
Code Coverage Diff
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make verify
needs fixing:
pkg/driver/node.go:859:30: `removeTaintInBackground` - `k8sClient` is unused (unparam)
func removeTaintInBackground(k8sClient cloud.KubernetesAPIClient) {
code changes lgtm, manually testing. Will apply label after manual validation + above is resolved.
Signed-off-by: Connor Catlett <[email protected]>
5801e80
to
555b028
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solves active customer problem.
Offline discussion was had about configuration values for taintRemovalBackoff. Decided that currently this is a two-way door solution, and we may alleviate cx painpoints without introducing additional configuration option. Currently any retry/timeout is better than no retry and an infinite timeout.
/retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: torredil The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Is this a bug fix or adding new feature?
Bug fix
What is this PR about? / Why do we need it?
Sometimes, the node taint removal can fail (for example, due to a temporary networking outage, or if another client tries to remove a taint at the same time as us and gets to the API server first). In that case we should retry the taint removal.
This PR does two things:
What testing is done?
Manually tested on a local cluster (but please at least one reviewer double check my work), as well as CI