Don't check every IPAM allocation on every sync. #9654
base: master
Conversation
This isn't perfect - it's possible, for example, that on larger clusters with lots of churn this will still end up iterating a LOT of IPAM allocations. We could probably make this even better if we tracked dirtiness at the allocation level rather than at the node level.
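For illustration only, here is a rough sketch of the difference between the two granularities; the type and field names below are hypothetical, not the controller's actual code:

```go
// Purely illustrative sketch of node-level vs. allocation-level dirtiness
// tracking; none of these types exist in the controller today.
package ipamgc

// nodeDirtyTracker marks whole nodes dirty; the next sync re-checks every
// allocation on a dirty node, even if only one allocation changed.
type nodeDirtyTracker struct {
	dirtyNodes map[string]bool
}

func (t *nodeDirtyTracker) onAllocationEvent(node string) {
	t.dirtyNodes[node] = true
}

// allocationDirtyTracker marks individual allocations (keyed by IPAM handle)
// dirty, bounding the re-check work even on large, busy nodes.
type allocationDirtyTracker struct {
	dirtyHandles map[string]bool
}

func (t *allocationDirtyTracker) onAllocationEvent(handle string) {
	t.dirtyHandles[handle] = true
}
```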
After writing this PR up, I've been thinking a bit more about this and I think I'd like to adjust the strategy here. Here are the 4 main responsibilities of this IPAM controller:
For (1), we need to scan the whole IPAM data set, but we can do this relatively infrequently.
For (2), we don't necessarily need to scan the entire IPAM data set - we could just clean up the known affine blocks. Scanning the whole set has some advantages - we might catch a leak slightly sooner if it was in a borrowed IPAM block - but the full scan should catch that eventually anyway.
For (3), we do need to scan the entire IPAM data set, but we can probably stop doing it so frequently. Right now we do it on every batch of updates, but Prometheus is polling anyway, so there's really no value in doing this more often than the Prometheus poll interval. We could probably change this to every 15s (and only if something changed) without any real UX problem.
For (4), deleting unused IPAM blocks that are affine to existing nodes can be (and already is) done without scanning the entire IPAM data set. We track empty blocks as updates come in, so we know exactly what the set of empty blocks is on demand. We can do an empty-block check on every sync without much cause for concern, so no need to change this.
I think I would propose:
@caseydavenport for (3) do we scan an in-memory cache, or do we refresh the whole list from the API server? Seems like we could certainly use a cache for that less critical work.
@fasaxc we use an in-memory cache for all of the above! So at least we have that going 😅
The problem with (3), I just realized, is that we rely on the full IPAM scan to determine which IPs are GC candidates / leaks, which then feeds into the metrics - so some metrics will end up being gated by the full IPAM scan interval anyway! That's not really a problem per se (it's still accurate and reflects our slower time to detection for IPAM GC), but it does break some assumptions in our tests. I do think this is a good reason to keep the "only scan nodes that have changed" logic, though. It maintains quick updates for our metrics while also reducing the amount of work that happens on each event.
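To make the "only scan when something changed, and no more often than the poll interval" idea concrete, here is a minimal sketch; the field names and the 15s figure are assumptions taken from the discussion above, not actual controller code:

```go
// Hypothetical sketch of gating the expensive full-IPAM metrics scan behind
// an interval and a dirty flag.
package ipamgc

import "time"

type metricsScanner struct {
	interval     time.Duration // e.g. 15 * time.Second
	lastFullScan time.Time
	dirty        bool // set whenever an IPAM update arrives
}

// maybeFullScan runs the whole-IPAM scan only when something has changed and
// at least one interval has elapsed since the previous scan.
func (m *metricsScanner) maybeFullScan(scan func()) {
	if !m.dirty || time.Since(m.lastFullScan) < m.interval {
		return
	}
	scan()
	m.lastFullScan = time.Now()
	m.dirty = false
}
```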
Force-pushed from 278f85f to 1036335
My most recent changes keep the "immediate" sync behavior, where we trigger a sync for each block update, each node deletion update, and each pod deletion update (in order to maintain quick metrics updates). What I've done instead is go back to the original approach:
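As a side note on the "trigger a sync for each update" part, a common pattern for doing that without blocking the event handlers is a buffered kick channel that the sync loop drains; the sketch below is illustrative only and the names are assumptions, not the actual controller code:

```go
// Illustrative sketch of coalescing per-event sync triggers.
package ipamgc

type syncTrigger struct {
	kicks chan struct{} // typically created with a buffer of 1
}

// kick requests an immediate sync; if one is already pending, the extra
// request is dropped (coalesced) so event handlers never block.
func (s *syncTrigger) kick() {
	select {
	case s.kicks <- struct{}{}:
	default:
	}
}

// Block-update, node-deletion, and pod-deletion handlers would each call
// kick(), keeping metrics-relevant syncs immediate while batching bursts.
```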
Description
Fixes #7841
The IPAM GC controller currently iterates every IP address allocation in the cluster every time an IP address is assigned or released. This can result in excessive CPU usage on larger / more active clusters.
Instead, this PR changes the behavior such that we limit the allocations we check to:
The idea is that this set should be much smaller than every node, and will allow us to incrementally scan the cluster instead of checking the entire cluster on every syncIPAM() call.
We will have a periodic sync (by default every 5m) that checks every node as a backstop. It's worth noting that this change on its own means we may rely on the periodic sync slightly more often to determine when IPAM allocations have leaked, since leaks can occur on nodes without the controller receiving the notifications needed to notice them.
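A rough end-to-end sketch of the shape described above: event-driven syncs that only re-check nodes marked dirty, plus a periodic full scan as a backstop for leaks that never produce an event. All names and the exact loop structure are assumptions for illustration, not the controller's actual code:

```go
// Sketch: incremental syncs over dirty nodes, with a 5m full-scan backstop.
package ipamgc

import "time"

type ipamGCLoop struct {
	dirtyNodes map[string]bool
	kicks      chan struct{}
}

func (c *ipamGCLoop) run(stop <-chan struct{}) {
	fullScan := time.NewTicker(5 * time.Minute)
	defer fullScan.Stop()
	for {
		select {
		case <-c.kicks:
			// Incremental path: only allocations on recently-updated nodes.
			c.syncIPAM(c.dirtyNodes)
			c.dirtyNodes = map[string]bool{}
		case <-fullScan.C:
			// Backstop path: every node, so missed events still get caught.
			c.syncIPAM(nil) // nil meaning "all nodes" in this sketch
		case <-stop:
			return
		}
	}
}

// syncIPAM walks the in-memory IPAM cache, restricted to the given nodes when
// the set is non-nil.
func (c *ipamGCLoop) syncIPAM(nodes map[string]bool) {}
```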
We might be able to enhance this further by tracking when individual allocations are dirty (and, for example, only clearing them after the grace period proves they are OK or leaked). But I think that's a larger change, and I hope this one should do the trick.
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.