-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a max failed CNRs threshold to Nodegroups #88
Add a max failed CNRs threshold to Nodegroups #88
Conversation
…to vportella/add-max-failed-cnrs-threshold
…to vportella/add-max-failed-cnrs-threshold
@@ -30,6 +44,10 @@ type Transitioner struct { | |||
|
|||
CloudProviderInstances []*mock.Node | |||
KubeNodes []*mock.Node | |||
|
|||
extrakubeObjects []client.Object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extraKubeObjects
}, | ||
} | ||
|
||
// Failed CNR for the same nodegroup in a different namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are CNRs intended to be namespaced? I can't see much value in separating them by namespace like this but being able to act on the same set of backing nodes provided by a cloud provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would go back to when they were designed. I'm not particularly sure, though changing that is out of scope here.
@@ -536,3 +536,38 @@ func (t *CycleNodeRequestTransitioner) validateInstanceState(validNodeGroupInsta | |||
|
|||
return false, nil | |||
} | |||
|
|||
// deleteFailedSiblingCNRs finds the CNRs generated for the same nodegroup as | |||
// the one in the calling transitioner. It filters for deleted CNRs in the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's not a calling transitioner, it's the parent struct technically. I would just say "the one in the transitioner"
pkg/observer/controller.go
Outdated
} | ||
} | ||
if found { | ||
|
||
if dropNodeGroup { | ||
klog.Warningf("nodegroup %q has an in progress CNR.. skipping this nodegroup", nodeGroup.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log message now doesn't match reality. dropNodeGroup is being set to true when the number of failed CNRs exceeds the threshold, and not when it has an in progress CNR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a second log line there because Failed
is defined as "in progress" in the existing implementation. I'm adding functionality to skip a certain number of them from counting as "in progress" so I will update for that.
maxFailedCycleNodeRequests
attribute to theNodeGroup
spec which defines how many failed CNRs are allowed for a nodegroup before observer will stop generating new ones. The equivalent to the current behaviour ismaxFailedCycleNodeRequests: 0
. This is not a breaking change.Successful
phase.cyclops_cycle_node_requests_by_phase
metric has been update to include a newnodegroup
attribute which is the name of the nodegroup the CNR was generated from. Manually created CNRs that don't match a nodegroup will include the attribute with an empty string.