Add retry and background run to node taint removal #1861

Merged
merged 1 commit into from
Dec 14, 2023

ConnorJC3
Contributor

Is this a bug fix or adding new feature?

Bug fix

What is this PR about? / Why do we need it?

Node taint removal can sometimes fail, for example because of a temporary network outage, or because another client tries to remove a taint at the same time as us and gets to the API server first. In that case, we should retry the taint removal.

This PR does two things:

  • Moves the taint removal to run in a goroutine in the background so it doesn't block node startup
  • Adds retries with an exponential backoff to the taint removal

What testing is done?

Manually tested on a local cluster (but please have at least one reviewer double-check my work), as well as in CI.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 12, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 12, 2023

Code Coverage Diff

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go | 79.8% | 78.7% | -1.1 |

Member

@torredil torredil left a comment

make verify needs fixing:

pkg/driver/node.go:859:30: `removeTaintInBackground` - `k8sClient` is unused (unparam)
func removeTaintInBackground(k8sClient cloud.KubernetesAPIClient) {

Code changes LGTM; manually testing now. I will apply the label after manual validation and once the above is resolved.

@torredil
Member

/retest

pkg/driver/node.go (review thread resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2023
Contributor

@AndrewSirenko AndrewSirenko left a comment

Solves an active customer problem.

We discussed configuration values for taintRemovalBackoff offline. We decided that this is currently a two-way-door decision, and that we can alleviate customer pain points without introducing an additional configuration option. For now, any retry and timeout is better than no retry and an infinite timeout.

@torredil
Member

/retest

@torredil
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 14, 2023
@k8s-ci-robot k8s-ci-robot merged commit bdeb4a6 into kubernetes-sigs:master Dec 14, 2023
19 checks passed
4 participants