-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change CNI token refresh retry duration to one minute #9539
base: master
Are you sure you want to change the base?
Change CNI token refresh retry duration to one minute #9539
Conversation
Signed-off-by: Minna Färm <[email protected]>
@huutomerkki thanks for raising this PR! I want to push a little bit on the aggressive timer here - for some of our larger scale users, this is going to result in a large increase in the number of TokenRequest API calls that Calico makes. We have users running 5k node clusters (and even larger!), which will result 5k requests per 60 seconds, or 83 requests per second across the entire cluster! I think this is probably just not workable for that scale. Some thoughts (in no particular order):
I think my preferred option would be to find a way to communicate back to the token refresher that it needs to generate a new token, but I think exposing such a channel is risky and has potential for abuse if we're not careful (imagine an attacker getting access and requesting non-stop token refresh attempts. We'd need a backstop to rate limit this for sure). This trigger would likely need to be something like a restricted UDS or file-based trigger that the CNI plugin can write to when needed in response to a permissions error - WDYT? |
I agree, minute as the refresh rate is a lot for clusters of that size. In the case I have, the requested max refresh time is around 5 minutes, which would be around 10x the current refresh rate over the 24h, and I think I also should increase the rate around the expected expiry time if any of this PR would be going in. Refreshing the token after a failed request would be the ideal solution in my mind as well, but I think implementing that might be out of my league without some major support. Adding a config for this would fulfill my goal, and I think it would be something that I could implement. |
Great, lets go with this as an approach for now and I will see about a more proactive refresh token in my spare time if I can manage it :) For a config option, I suspect it would look something like this:
Then we'd also want a corresponding change to the This would be the end-user facing API that would configure the environment variable - we can talk about the specifics of what that looks like when we get there :) |
@huutomerkki would you wish to continue to work on this ? |
Description
This PR changes the dynamic timing to constant once per minute refresh rate for the CNI token, as per the discussion in slack about the possible solution for the linked issue.
I have manually verified that following the reproduction steps with these changes included in the calico-node image fixes the issues described in #9235.
Related issues/PRs
Fixes #9235
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one
docs-*
label.docs-pr-required
: This change requires a change to the documentation that has not been completed yet.docs-completed
: This change has all necessary documentation completed.docs-not-required
: This change has no user-facing impact and requires no docs.Every PR needs one
release-note-*
label.release-note-required
: This PR has user-facing changes. Most PRs should have this label.release-note-not-required
: This PR has no user-facing changes.Other optional labels:
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only.needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.