Avoid picking old helm-delete job #232
base: master
@@ -226,11 +226,31 @@ func (c *Controller) OnRemove(key string, chart *v1.HelmChart) (*v1.HelmChart, e
 		return nil, nil
 	}
 
+	if chart.DeletionTimestamp == nil {
+		return nil, nil
+	}
+
 	expectedJob, objs, err := c.getJobAndRelatedResources(chart)
 	if err != nil {
 		return nil, err
 	}
 
+	// remove old helm-delete job if it was left
+	job, err := c.jobCache.Get(chart.Namespace, expectedJob.Name)
+	if err == nil && job.CreationTimestamp.Before(chart.DeletionTimestamp) {
+		err = c.jobs.Delete(chart.Namespace, expectedJob.Name, &metav1.DeleteOptions{PropagationPolicy: &deletePolicy})
+		if err != nil {
+			if !apierrors.IsNotFound(err) {
+				return nil, fmt.Errorf("fail to delete old helm-delete job %w", err)
+			}
+			// if IsNotFound, continue
+		} else {
+			// wait old job to be removed
+			c.helms.EnqueueAfter(chart.Namespace, chart.Name, 1*time.Second)
+			return nil, nil
+		}
+	}
+
 	// note: on the logic of running an apply here...
 	// if the uninstall job does not exist, it will create it
 	// if the job already exists and it is uninstalling, nothing will change since there's no need to patch
@@ -251,7 +271,7 @@ func (c *Controller) OnRemove(key string, chart *v1.HelmChart) (*v1.HelmChart, e
 	time.Sleep(3 * time.Second)
 
 	// once we have run the above logic, we can now check if the job is complete
-	job, err := c.jobCache.Get(chart.Namespace, expectedJob.Name)
+	job, err = c.jobCache.Get(chart.Namespace, expectedJob.Name)
 	if apierrors.IsNotFound(err) {
 		// the above apply should have created it, something is wrong.
 		// if you are here, there must be a bug in the code.
@@ -269,7 +289,21 @@ func (c *Controller) OnRemove(key string, chart *v1.HelmChart) (*v1.HelmChart, e
 		chartCopy.Status.JobName = job.Name
 		newChart, err := c.helms.Update(chartCopy)
 		if err != nil {
-			return chart, fmt.Errorf("unable to update status of helm chart to add uninstall job name %s", chartCopy.Status.JobName)
+			// if chart is gone, clean resources
+			if apierrors.IsNotFound(err) || strings.Contains(err.Error(), "StorageError") {
Comment:
Based on the error you shared, I think you can use https://pkg.go.dev/k8s.io/apiserver/pkg/storage#IsInvalidObj
Suggested change
It is odd that there's not an apierror for this.
Comment:
I tried; it is not like a normal k8s package for client usage.
Comment:
I don't see anything that prevents you from importing
+				// note: an empty apply removes all resources owned by this chart
+				err = generic.ConfigureApplyForObject(c.apply, chart, &generic.GeneratingHandlerOptions{
+					AllowClusterScoped: true,
+				}).
+					WithOwner(chart).
+					WithSetID("helm-chart-registration").
+					ApplyObjects()
+				if err != nil {
+					return nil, fmt.Errorf("chart is gone, but unable to remove resources tied to HelmChart %s/%s, %w", chart.Namespace, chart.Name, err)
+				}
+				return chart, nil
+			}
+			return chart, fmt.Errorf("unable to update status of helm chart to add uninstall job name %s, %w", chartCopy.Status.JobName, err)
 		}
 		return newChart, fmt.Errorf("waiting for delete of helm chart for %s by %s", key, job.Name)
 	}
@@ -285,7 +319,7 @@ func (c *Controller) OnRemove(key string, chart *v1.HelmChart) (*v1.HelmChart, e
 		WithSetID("helm-chart-registration").
 		ApplyObjects()
 	if err != nil {
-		return nil, fmt.Errorf("unable to remove resources tied to HelmChart %s/%s: %s", chart.Namespace, chart.Name, err)
+		return nil, fmt.Errorf("unable to remove resources tied to HelmChart %s/%s: %w", chart.Namespace, chart.Name, err)
 	}
 
 	return chart, nil
Comment:
I don't think you need to do this here; the job patcher already deletes and re-creates the job as necessary, as part of the Apply logic:
helm-controller/pkg/controllers/chart/chart.go, lines 167 to 173 in 255f905
helm-controller/pkg/controllers/chart/chart.go, lines 126 to 129 in 255f905
helm-controller/pkg/controllers/chart/chart.go, lines 237 to 239 in 255f905
Is there some other bit of logic that the patcher needs to capture?
Comment:
I guess the bug is due to WithOwner(chart). in helm-controller/pkg/controllers/chart/chart.go, line 242 in 255f905
First round of creating and deleting a HelmChart X
Second round of creating and deleting a HelmChart X
But
helm-controller/pkg/controllers/chart/chart.go, line 254 in 255f905
helm-controller/pkg/controllers/chart/chart.go, line 278 in 255f905
That's the gap in the logic.
Comment:
If we manually create a helm-delete job with the same namespace and name before deleting a HelmChart, I guess it will also short-circuit the HelmChart's real deletion.
Comment:
If I remember correctly, the WithOwner stuff uses the owning object's name, namespace, and GVK to track resources across namespaces. As long as the owning HelmChart object shares those same attributes across delete/create cycles, it should track it properly. If you run the controller with --debug, you should get debug logs from the DesiredSet stuff showing you what it's trying to do. Does that shed any light on the situation?
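A rough sketch of the tracking behavior described in this comment, assuming identity is derived only from GVK, namespace, and name. The `ownerRef` type, the `trackingKey` function, and the sample values are hypothetical and do not reproduce wrangler's actual implementation:

```go
package main

import "fmt"

// ownerRef is a hypothetical, simplified view of an owning object.
type ownerRef struct {
	GVK       string
	Namespace string
	Name      string
	UID       string
}

// trackingKey builds an identity from GVK, namespace and name only.
// This is an assumption modeled on the comment, not wrangler's real code.
func trackingKey(o ownerRef) string {
	return fmt.Sprintf("%s/%s/%s", o.GVK, o.Namespace, o.Name)
}

func main() {
	// Two lifecycles of a chart named "x": same name, different UID (illustrative values).
	first := ownerRef{GVK: "helm.cattle.io/v1, Kind=HelmChart", Namespace: "default", Name: "x", UID: "uid-1"}
	second := ownerRef{GVK: "helm.cattle.io/v1, Kind=HelmChart", Namespace: "default", Name: "x", UID: "uid-2"}

	fmt.Println(trackingKey(first) == trackingKey(second)) // true: the key cannot tell the lifecycles apart
	fmt.Println(first.UID == second.UID)                   // false: only the UID differs
}
```

Under that assumption, a resource left over from the first lifecycle looks identical to one belonging to the second, which is the situation the thread is debating.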
Comment:
@brandond
I added some debug output (#177 (comment)) in a Harvester environment. The logs and events clearly show that generic.ConfigureApplyForObject is not able to correctly identify the leftover helm-delete job. From the debug output, it looks essential to remove the old job.
By the way, the created helm-delete job has no Owner field; the apply has an internal owner field.
Comment:
The orphaned job was left behind by this return, and the normal running path has this call.
I will test whether adding this call to the above return works.