The helm-delete job is not cleaned when helm release is deleted by helmchart CR #177
Yep, we are also experiencing the same behavior. |
I'm unable to replicate this. What version of k3s or helm-controller are you on? The code here should delete any objects owned by the HelmChart: helm-controller/pkg/controllers/chart/chart.go Lines 253 to 265 in 57fde46
|
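For readers following along without opening the linked file, the idea brandond is pointing at (removing objects whose ownerReferences point back at the HelmChart) looks roughly like the sketch below. This is a hedged paraphrase assuming client-go's typed clientset; it is not the code at chart.go lines 253 to 265, and the function name is invented.

```go
// Hypothetical sketch: delete Jobs whose ownerReferences point at a given
// HelmChart UID. This paraphrases the cleanup idea described above; it is
// NOT the real helm-controller implementation.
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func deleteJobsOwnedBy(ctx context.Context, client kubernetes.Interface, namespace string, chartUID types.UID) error {
	jobs, err := client.BatchV1().Jobs(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, job := range jobs.Items {
		for _, ref := range job.OwnerReferences {
			if ref.UID != chartUID {
				continue
			}
			// Foreground propagation so the Job's pods are removed with it.
			policy := metav1.DeletePropagationForeground
			if err := client.BatchV1().Jobs(namespace).Delete(ctx, job.Name, metav1.DeleteOptions{PropagationPolicy: &policy}); err != nil {
				return fmt.Errorf("delete job %s: %w", job.Name, err)
			}
		}
	}
	return nil
}
```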
The k3s version I used is: This condition does not happen every time; if you try to replicate it in a short time slot, you may not be able to hit it. I also checked the code and I think the following part causes it, but I am not sure what timing triggers it. helm-controller/pkg/controllers/chart/chart.go Lines 215 to 220 in 57fde46
When the condition happens, not only the delete job but also some configmaps of the helmchart are created again. |
I've run the end-to-end tests quite a few times and have not been able to reproduce it; do you have any circumstances or specific steps that seem to contribute to it? Deleting the HelmChart too quickly after creating it, deleting the namespace before the HelmChart, or so on? |
My setup is a Rocky Linux VM on a VMware ESXi server with 2 vCPU and 8G RAM, with k3s installed on the VM. I didn't do any specific steps to create/delete the HelmChart; I just used the command "kubectl apply/delete -f <HelmChart.yaml>" |
How long did you wait between applying and deleting it? Did the install succeed? Was it still mid-install when deleted? |
The install succeeded; I can see my pod running without problems, and it was not in a mid-install state. |
I suspect we hit a similar issue in Harvester. When creating/deleting the same The previously left delete job will be picked up the next time it is re-installed
This bug causes the
The Harvester addon is on top of
@brandond From my test in Harvester, it is highly likely that this is the cause; both
|
I did the following test and logged the events and jobs. First round:
Second round:
With a workaround in Harvester to forcibly delete the job before triggering
|
@w13915984028 I am curious why you don't see any of the events from the helm controller itself when you delete the chart. Are you filtering these out? What events and log messages do you see? Rather than adding code to Harvester to manually clean up after helm-controller, would you mind making an attempt to fix the unwanted behavior here? It should be somewhere in the OnRemove function at https://github.com/k3s-io/helm-controller/blob/master/pkg/controllers/chart/chart.go#L201. |
@brandond We met the same problem, and it's really weird that the HelmChart delete seems to trigger the delete job twice.
Check the helm-install job, event logs, and pod resource
Delete the helmchart after the pod is running, then check the event logs and job.
From the log events and jobs, we find that after the first delete job completed and was removed, k3s started a second delete job that was not removed, so a delete job is left behind. 2. Second Round:
In the second round, the helm-delete process just removes the job left over from the first round, but the pod is still running. 3. Third Round:
Here, you can see the different results of the test. The helm-delete job has been cleaned up, and the event logs show that after two jobs completed, the RemoveJob was triggered. Here are the helm-controller logs:
It's strange that the same test produced different results, which raises the following questions: |
Any resolution in 2024? |
I reproduced it in my environment, with a high probability of hitting it there. Here are my environment and the steps to reproduce.
Reproduction preparation:
Reproduction steps:
After applying the manifest successfully, you will see the following:
After deleting the manifest successfully, you will see the following:
After deleting the manifest successfully again, you will see the following:
After deleting the manifest successfully again, you will see the following:
|
@brandond This situation is the same as in my test. We are certain that we can reproduce this issue 100% of the time. |
Finally, I think the root cause has been found:
The log below confirms my guess; I will explain it in the following steps.
K8s takes time to delete an object. In the above, the
In my local environment (Harvester cluster), I did see two occurrences of
It can be assumed that this is a sequence issue.
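To make the sequencing concern concrete: because Kubernetes deletes objects asynchronously, one illustrative way to avoid racing against a not-yet-removed helm-delete job is to poll until the old Job is actually gone before creating a new one. A minimal sketch assuming client-go and apimachinery's wait helpers; this is not a proposed patch to helm-controller.

```go
// Hypothetical sketch only: wait until an old helm-delete Job has actually
// been removed before creating a new one, since Kubernetes deletion is
// asynchronous. Illustrative, not a patch to helm-controller.
package sketch

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForJobGone polls until the named Job no longer exists or the timeout hits.
func waitForJobGone(ctx context.Context, client kubernetes.Interface, namespace, jobName string) error {
	return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			_, err := client.BatchV1().Jobs(namespace).Get(ctx, jobName, metav1.GetOptions{})
			if err == nil {
				return false, nil // Job still exists; keep polling
			}
			if apierrors.IsNotFound(err) {
				return true, nil // the old Job is really gone
			}
			return false, err // unexpected error aborts the wait
		})
}
```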
In Harvester, we added a workaround, #177 (comment), to explicitly delete the potentially leftover previous job. The original helm-controller is a lightweight and handy tool to deploy helm charts, and the helm chart is only rarely deleted, or there is a long interval between the creation and deletion of the helm-chart object, so the leftover job is not a problem. But when it is used to handle frequent creation/removal of helm charts, some enhancements are required. |
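The Harvester-side workaround referenced in the previous comment (explicitly removing any leftover helm-delete job before acting on the chart again) could look roughly like the sketch below. The `helm-delete-<chart>` naming comes from the logs quoted in this thread; the rest is an assumption, not the actual Harvester code.

```go
// Hypothetical sketch of the Harvester-style workaround: before (re)processing
// a chart, remove any leftover helm-delete-<chart> Job so a stale one cannot
// be picked up. Not the actual Harvester implementation.
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func removeStaleDeleteJob(ctx context.Context, client kubernetes.Interface, namespace, chartName string) error {
	jobName := "helm-delete-" + chartName // naming seen in this thread's logs
	policy := metav1.DeletePropagationBackground
	err := client.BatchV1().Jobs(namespace).Delete(ctx, jobName, metav1.DeleteOptions{PropagationPolicy: &policy})
	if apierrors.IsNotFound(err) {
		return nil // nothing left over, nothing to do
	}
	return err
}
```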
@w13915984028 Thank you for your reply.
It is useful to me. There is another approach to work around this issue: first, disable the Helm controller option in the k3s service, then run the Helm controller as a separate process.
The command for executing helm-controller is as follows. I downloaded the pre-compiled binary for helm-controller and made sure it is compatible with the k3s version.
|
@up-wei Changing from the system service to a k8s pod / independent process may change the timing of the program and thus dodge the bug, but I doubt it works in all cases. I will add a PR that compares the creation timestamp of the job with the deletion timestamp of the helm-chart to avoid picking up those old jobs. |
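A rough sketch of the timestamp comparison proposed above: a Job whose creationTimestamp predates the HelmChart's deletionTimestamp belongs to an earlier install/delete cycle and should not be reused. The helper below is a guess at the shape of that check, not the contents of the eventual PR.

```go
// Hypothetical sketch of the proposed check: a helm-delete Job created before
// the HelmChart's deletionTimestamp belongs to an earlier cycle and must not
// be reused. A guess at the PR's logic, not the PR itself.
package sketch

import (
	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// jobIsStale reports whether an existing Job predates the chart's deletion.
func jobIsStale(job *batchv1.Job, chartDeletionTimestamp *metav1.Time) bool {
	if chartDeletionTimestamp == nil {
		return false // the chart is not being deleted; nothing to compare against
	}
	return job.CreationTimestamp.Before(chartDeletionTimestamp)
}
```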
I added some debug information and got the logs below. (1) The debug code: w13915984028@c0844ed We ran the test several times; the helm-controller logs are: (1) enable & disable rancher-logging (backed by a helmchart); normal
(2) enable & disable again; a helm-delete job was re-created and left orphaned
(3) enable & disable again, the old helm-delete job was picked, and no true deleting
|
Events: 3 rounds of ApplyJob and RemoveJob
The last event clearly shows that the old job was picked up and reused
The pods are left behind:
enable rancher-logging:
|
The returned error (with the added debug) in the step below is:
It matches our analysis; the error is a https://pkg.go.dev/k8s.io/apiserver/pkg/storage#StorageError
That matches the comment on line 267: helm-controller/pkg/controllers/chart/chart.go Lines 266 to 270 in 255f905
We know that the HelmChart resource has been deleted, but we're trying to update it anyway. We can't update it if it doesn't exist though, so the comment about calling update to "temporarily recreate the chart" doesn't make sense; that's not something you can do. This seems to have come from #158 - @aiyengar2 can you take a look and make sure we're not missing anything? |
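For readers unfamiliar with the failure mode being discussed: the conventional way a controller tolerates updating an object that may already be gone is to treat NotFound as terminal rather than retrying. A generic sketch using apimachinery's error helpers, not helm-controller's actual handling:

```go
// Generic sketch (not helm-controller's code): when updating an object that
// may already have been deleted, treat NotFound as "nothing left to do"
// instead of retrying and re-creating child resources.
package sketch

import apierrors "k8s.io/apimachinery/pkg/api/errors"

// tolerateNotFound swallows NotFound errors so the caller stops reconciling
// an object that no longer exists.
func tolerateNotFound(err error) error {
	if apierrors.IsNotFound(err) {
		return nil
	}
	return err
}
```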
My doubt is: the current OnRemove, shown below, does not know whether a helm-delete job has already been created and completed; it does not use any state to record and control that. Instead, it simply loops. This, by nature, can possibly generate an orphaned job.
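To illustrate the missing-state concern, one hypothetical way to record that a delete job has already been submitted is to stamp the chart (for example via an annotation) before creating the job and to check that marker on re-entry. The annotation key and helper below are invented for illustration; helm-controller does not do this today.

```go
// Hypothetical sketch only: record on the chart that a delete Job has already
// been submitted, so a re-entered removal handler does not create or adopt
// another one. The annotation key is invented; helm-controller does not do this.
package sketch

const deleteJobAnno = "example.invalid/helm-delete-job"

// ensureSingleDeleteJob returns the delete Job to wait on, creating it only if
// the chart's annotations do not already record one. createJob is any function
// that actually submits the Job and returns its name.
func ensureSingleDeleteJob(annotations map[string]string, createJob func() (string, error)) (string, error) {
	if name, ok := annotations[deleteJobAnno]; ok {
		return name, nil // a delete Job was already submitted for this deletion
	}
	name, err := createJob()
	if err != nil {
		return "", err
	}
	annotations[deleteJobAnno] = name
	return name, nil
}
```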
|
When I run the
kubectl delete helmchart <name>
command, most of the time I see that the helm-delete- job still exists, and it seems to be caused by a duplicated helm-delete job created by helm-controller. For example, after running
kubectl delete helmchart nats
the job and pod for the helm delete still exist.
The events show the pod for /helm-delete-nats was generated twice.
And the existing pod logs are as follows; I think this means the job could not find the helm release to delete.