Conflict on weird fields #573
Could you tell us more about how kapp is being used here? Is it via kapp-controller, or the CLI directly? To clarify: I mentioned "clients" because the error suggests that something else might be updating the resource at the same time.
Hi @revolunet! Is this error consistent, or does it only happen sometimes?
Are you certain about this? The error would typically mean that after kapp calculated the diff, and before it started applying those changes, something updated the resource in the background, hence the conflict. Comparing the recalculated diff with the original diff (the latter can be printed with the `--diff-changes` flag) would help us figure out what is getting updated in the background.
In this case, kapp is used in a GitHub Action and applies manifests produced by our tooling. I don't think there is a kapp-controller involved, and yes, many kapp runs can happen in parallel, but in different namespaces. This error happens quite often these days, maybe 5 to 10% of the time, so I can add that flag.
Yeah, comparing the original diff with the recalculated diff would give us an idea of the fields that are getting updated in the background, and we could then try to figure out a way to resolve it (maybe a rebase rule to not update those fields).
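As a side note, since the deploys run from a GitHub Action, the original diff can be captured in the CI logs by enabling kapp's diff output on the deploy step. A minimal sketch, assuming a workflow step roughly like the following (the step name, label value, and manifest path are placeholders; only the `--diff-changes` flag is the point here):

```yaml
# Hypothetical GitHub Actions step; the app label value and manifest path are made up.
# --diff-changes makes kapp print the diff it calculated, so it can later be
# compared against the "Recalculated diff" shown in a conflict error.
- name: Deploy with kapp
  run: |
    kapp deploy -y \
      -a label:kubeworkflow/kapp=my-app \
      -f manifests/ \
      --diff-changes
```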
So here's the full diff for the deployment that fails:
2, 2 metadata:
3, 3 annotations:
4, 4 deployment.kubernetes.io/revision: "1"
5 - field.cattle.io/publicEndpoints: '[{"addresses":["51.103.10.142"],"port":443,"protocol":"HTTPS","serviceName":"xxx-develop-91uqrt:app-strapi","ingressName":"xxx-develop-91uqrt:app-strapi","hostname":"xxx","path":"/","allNodes":false}]'
6, 5 kapp.k14s.io/change-group: kube-workflow/xxx-91uqrt
7, 6 kapp.k14s.io/change-group.app-strapi: kube-workflow/app-strapi.xxx-91uqrt
8, 7 kapp.k14s.io/change-rule.restore: upsert after upserting kube-workflow/restore.env-xxx
9, 8 kapp.k14s.io/create-strategy: fallback-on-update
10, 9 kapp.k14s.io/disable-original: ""
11 - kapp.k14s.io/identity: v1;xxx-develop-91uqrt/apps/Deployment/app-strapi;apps/v1
12 - kapp.k14s.io/nonce: "1660064438212705134"
10 + kapp.k14s.io/nonce: "1660122041210559682"
13, 11 kapp.k14s.io/update-strategy: fallback-on-replace
14, 12 creationTimestamp: "2022-08-09T17:03:48Z"
15, 13 generation: 2
16, 14 labels:
...
221,219 resourceVersion: "247149463"
222,220 uid: cf981ae2-2372-4ab8-961d-ce3155975a86
223,221 spec:
224 - progressDeadlineSeconds: 600
225,222 replicas: 1
226 - revisionHistoryLimit: 10
227,223 selector:
228,224 matchLabels:
229,225 component: app-strapi
230,226 kubeworkflow/kapp: xxx
231 - strategy:
232 - rollingUpdate:
233 - maxSurge: 25%
234 - maxUnavailable: 25%
235 - type: RollingUpdate
236,227 template:
237,228 metadata:
238 - creationTimestamp: null
239,229 labels:
240,230 application: xxx
241,231 component: app-strapi
242,232 kapp.k14s.io/association: v1.9b1e71da08ebc442e6cdc77552cb740a
267,257 name: strapi-configmap
268,258 - secretRef:
269,259 name: pg-user-develop
270 - image: xxx/strapi:sha-3ab94da32cb3b479804c796
271 - imagePullPolicy: IfNotPresent
260 + image: xxx/strapi:sha-6ea5a193875e11b54f4bf333409d1808
272,261 livenessProbe:
273,262 failureThreshold: 15
274,263 httpGet:
275,264 path: /_health
276,265 port: http
277 - scheme: HTTP
278,266 initialDelaySeconds: 30
279,267 periodSeconds: 5
280 - successThreshold: 1
281,268 timeoutSeconds: 5
282,269 name: app
283,270 ports:
284,271 - containerPort: 1337
285,272 name: http
286 - protocol: TCP
287,273 readinessProbe:
288,274 failureThreshold: 15
289,275 httpGet:
290,276 path: /_health
291,277 port: http
292 - scheme: HTTP
278 + initialDelaySeconds: 0
293,279 periodSeconds: 5
294,280 successThreshold: 1
295,281 timeoutSeconds: 1
296,282 resources:
297,283 limits:
298 - cpu: "1"
284 + cpu: 1
299,285 memory: 1Gi
300,286 requests:
301 - cpu: 500m
287 + cpu: 0.5
302,288 memory: 256Mi
303,289 startupProbe:
304,290 failureThreshold: 30
305,291 httpGet:
306,292 path: /_health
307,293 port: http
308 - scheme: HTTP
309,294 periodSeconds: 5
310 - successThreshold: 1
311 - timeoutSeconds: 1
312 - terminationMessagePath: /dev/termination-log
313 - terminationMessagePolicy: File
314,295 volumeMounts:
315,296 - mountPath: /app/public/uploads
316,297 name: uploads
317 - dnsPolicy: ClusterFirst
318 - restartPolicy: Always
319 - schedulerName: default-scheduler
320 - securityContext: {}
321 - terminationGracePeriodSeconds: 30
322,298 volumes:
323,299 - emptyDir: {}
324,300 name: uploads
And the recalculated diff is the same as what you have shared in the first comment? If so, I am seeing these 2 differences:
When kapp initially calculates the diff, it tries to remove these fields, but before it can apply the change, something else removes them. Can you think of anything that might be removing these fields?
No, it's not the same logs, but I can see this on new failures too.
Hi, hmm, maybe the cattle.io annotation comes from our Rancher setup when the ingress is provisioned. Can annotations be the cause of a conflict?
If an annotation is added after the initial diff, it might lead to this error. We could add a rebase rule for that annotation, which would involve adding something like this to your manifests:
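A rough sketch of such a rule, assuming the conflicting field is the Rancher-added `field.cattle.io/publicEndpoints` annotation seen in the diff above, and following kapp's documented `Config` / `rebaseRules` format:

```yaml
# Sketch: an extra kapp Config document included alongside the other manifests.
# Assumption: the field changing underneath kapp is the publicEndpoints
# annotation that Rancher adds to the Deployment.
apiVersion: kapp.k14s.io/v1alpha1
kind: Config
rebaseRules:
- path: [metadata, annotations, field.cattle.io/publicEndpoints]
  type: copy
  sources: [new, existing]   # prefer the manifest's value; otherwise keep the cluster's
  resourceMatchers:
  - apiVersionKindMatcher: {apiVersion: apps/v1, kind: Deployment}
```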
This ensures that the diff remains the same when kapp recalculates it before applying the changes.
Was the value of the label being used to identify the app (`kubeworkflow/kapp`) changed at some point? I can reproduce something similar by changing that label value between two deploys (the deploy itself succeeds!).
It's possible that it changed at some point in the past, yes, but now it's stable.
After trying to overwrite the fields displayed in the diff, we still get a conflict:
Failed to update due to resource conflict (approved diff no longer matches):
Updating resource deployment/app-strapi (apps/v1) namespace: env-xxx-5dc5hx:
API server says:
Operation cannot be fulfilled on deployments.apps "app-strapi": the object has been modified; please apply your changes to the latest version and try again (reason: Conflict):
Recalculated diff:
11, 11 - kapp.k14s.io/nonce: "1660207590418011865"
12, 11 + kapp.k14s.io/nonce: "1660209982534815766"
224,224 - progressDeadlineSeconds: 600
226,225 - revisionHistoryLimit: 10
231,229 - strategy:
232,229 - rollingUpdate:
233,229 - maxSurge: 25%
234,229 - maxUnavailable: 25%
235,229 - type: RollingUpdate
238,231 - creationTimestamp: null
270,262 - image: xxx/strapi:sha-1b7c24b0876fdb5c244aa3ada4d96329eb72e1a4
271,262 - imagePullPolicy: IfNotPresent
272,262 + image: xxx/strapi:sha-dd16295f5e3d620ffb6874184abbf91f2b304cbf
277,268 - scheme: HTTP
280,270 - successThreshold: 1
286,275 - protocol: TCP
292,280 - scheme: HTTP
309,296 - scheme: HTTP
311,297 - successThreshold: 1
312,297 - timeoutSeconds: 1
313,297 - terminationMessagePath: /dev/termination-log
314,297 - terminationMessagePolicy: File
318,300 - dnsPolicy: ClusterFirst
319,300 - restartPolicy: Always
320,300 - schedulerName: default-scheduler
321,300 - securityContext: {}
322,300 - terminationGracePeriodSeconds: 30
Hey @revolunet! Could you also share the top of the diff (the metadata section) for that deployment?
OK, here's the top of the diff for that deployment:
@@ update deployment/app-strapi (apps/v1) namespace: env-xxx-5dc5hx @@
...
8, 8 kapp.k14s.io/change-rule.restore: upsert after upserting kube-workflow/restore.env-xxx-5dc5hx
9, 9 kapp.k14s.io/create-strategy: fallback-on-update
10, 10 kapp.k14s.io/disable-original: ""
11 - kapp.k14s.io/identity: v1;env-xxx-5dc5hx/apps/Deployment/app-strapi;apps/v1
12 - kapp.k14s.io/nonce: "1660207590418011865"
11 + kapp.k14s.io/nonce: "1660209982534815766"
13, 12 kapp.k14s.io/update-strategy: fallback-on-replace
14, 13 creationTimestamp: "2022-08-11T08:49:11Z"
15, 14 generation: 2
16, 15 labels:
...
222,221 resourceVersion: "247917466"
223,222 uid: 2e7466f0-20aa-452c-9f24-b344a4723716
224,223 spec:
225 - progressDeadlineSeconds: 600
226,224 replicas: 1
227 - revisionHistoryLimit: 10
228,225 selector:
229,226 matchLabels:
230,227 component: app-strapi
231,228 kubeworkflow/kapp: xxx
232 - strategy:
233 - rollingUpdate:
234 - maxSurge: 25%
235 - maxUnavailable: 25%
236 - type: RollingUpdate
237,229 template:
238,230 metadata:
239 - creationTimestamp: null
240,231 labels:
241,232 application: xxx
242,233 component: app-strapi
243,234 kapp.k14s.io/association: v1.b90f821a0c6816e919c5ec622aa834cc
...
268,259 name: strapi-configmap
269,260 - secretRef:
270,261 name: pg-user-revolunet-patch-1
271 - image: xxx/strapi:sha-1b7c24b0876fdb5c244aa3ada4d96329eb72e1a4
272 - imagePullPolicy: IfNotPresent
262 + image: xxx/strapi:sha-dd16295f5e3d620ffb6874184abbf91f2b304cbf
273,263 livenessProbe:
274,264 failureThreshold: 15
275,265 httpGet:
276,266 path: /_health
277,267 port: http
278 - scheme: HTTP
279,268 initialDelaySeconds: 30
280,269 periodSeconds: 5
281 - successThreshold: 1
282,270 timeoutSeconds: 5
283,271 name: app
284,272 ports:
285,273 - containerPort: 1337
286,274 name: http
287 - protocol: TCP
288,275 readinessProbe:
289,276 failureThreshold: 15
290,277 httpGet:
291,278 path: /_health
292,279 port: http
293 - scheme: HTTP
294,280 initialDelaySeconds: 10
295,281 periodSeconds: 5
296,282 successThreshold: 1
297,283 timeoutSeconds: 1
...
307,293 httpGet:
308,294 path: /_health
309,295 port: http
310 - scheme: HTTP
311,296 periodSeconds: 5
312 - successThreshold: 1
313 - timeoutSeconds: 1
314 - terminationMessagePath: /dev/termination-log
315 - terminationMessagePolicy: File
316,297 volumeMounts:
317,298 - mountPath: /app/public/uploads
318,299 name: uploads
319 - dnsPolicy: ClusterFirst
320 - restartPolicy: Always
321 - schedulerName: default-scheduler
322 - securityContext: {}
323 - terminationGracePeriodSeconds: 30
324,300 volumes:
325,301 - emptyDir: {}
326,302 name: uploads
I see the only conflicting change is the removal of the `kapp.k14s.io/identity` annotation.
Could you help me understand a bit better what the resource on the cluster looks like? Was it previously deployed by kapp? It might be that we are handling some of our own annotations differently while recalculating the diff; I am trying to verify if that is indeed the case 🤔
So on the previous deploy, made with kapp (currently up on the cluster), we have:
Does the deployment currently have the label that kapp uses to identify the app?
Sorry, I missed the labels:
Thanks for the prompt replies! Gonna take a closer look at this; it's definitely not expected. However, I cannot reproduce the exact issue y'all have been running into :( The closest I could get was the similar reproduction I posted above. Marking this as a bug for now, since it looks like the metadata on the deployment is as expected (assuming the namespace shown is the right one).
Thanks for your help, we're digging on our side too. Yes, the namespace is the right one.
Meanwhile, is there any strategy to force the deployment?
Could the conflict be coming from fields like `resourceVersion`?
kapp already has rebase rules for all these fields (including `resourceVersion`), so I am not sure how that would be causing a conflict. If you run the deploy with the diff flags enabled, we should be able to see whether those fields actually show up in the diff.
Yes, exactly what I thought. As far as I currently know there are no other fields in conflict; I'll keep looking for logs.
Yep, and because of that rebase rule, the `resourceVersion` shouldn't even show up in the diff.
Thank you 🙏 Logs would definitely help.
Here is one. As you can see, many of these fields are defaults that we don't specify in our manifests:
Here is another one. In this one the path is changing as well as the image, but the conflict is only old vs new:
Here is the last one:
Sorry, I don't have the original diff; I will add the flags.
Thank you. Just the original diff along with the error should be enough.
OK, got it, thanks.
Hi @devthejo! Were you able to collect logs of errors with the original diff?
Hi @praveenrewar! Here it is, sorry for the delay, I was on vacation:
Thank you!
Target cluster 'https://rancher.******'
@@ update deployment/simulateur (apps/v1) namespace: egapro-feat-add-index-subrouting-for-declatation-djn8zr @@
...
11 - kapp.k14s.io/identity: v1;egapro-feat-add-index-subrouting-for-declatation-djn8zr/apps/Deployment/simulateur;apps/v1
12 - kapp.k14s.io/nonce: "1664877613047615413"
11 + kapp.k14s.io/nonce: "1664880936933517787"
200 - progressDeadlineSeconds: 600
202 - revisionHistoryLimit: 10
207 - strategy:
208 - rollingUpdate:
209 - maxSurge: 25%
210 - maxUnavailable: 25%
211 - type: RollingUpdate
214 - creationTimestamp: null
241 - - image: harbor.fabrique.social.gouv.fr/egapro/egapro/simulateur:sha-dd68d2376c6a3bc3896578fba4fdf652046a17ad
242 - imagePullPolicy: IfNotPresent
232 + - image: harbor.fabrique.social.gouv.fr/egapro/egapro/simulateur:sha-c4934d8459daf82ab93b3e661f2cd4b8a3353672
248 - scheme: HTTP
251 - successThreshold: 1
257 - protocol: TCP
263 - scheme: HTTP
280 - scheme: HTTP
282 - successThreshold: 1
283 - timeoutSeconds: 1
284 - terminationMessagePath: /dev/termination-log
285 - terminationMessagePolicy: File
286 - dnsPolicy: ClusterFirst
287 - restartPolicy: Always
288 - schedulerName: default-scheduler
289 - securityContext: {}
290 - terminationGracePeriodSeconds: 30
---
10:57:04AM: update deployment/simulateur (apps/v1) namespace: egapro-feat-add-index-subrouting-for-declatation-djn8zr
[2022-10-04 10:57:04] WARN: kapp: Error: Applying update deployment/simulateur (apps/v1) namespace: egapro-feat-add-index-subrouting-for-declatation-djn8zr:
Failed to update due to resource conflict (approved diff no longer matches):
Updating resource deployment/simulateur (apps/v1) namespace: egapro-feat-add-index-subrouting-for-declatation-djn8zr:
API server says:
Operation cannot be fulfilled on deployments.apps "simulateur": the object has been modified; please apply your changes to the latest version and try again (reason: Conflict):
Recalculated diff:
11, 11 - kapp.k14s.io/nonce: "1664877613047615413"
12, 11 + kapp.k14s.io/nonce: "1664880936933517787"
199,199 - progressDeadlineSeconds: 600
201,200 - revisionHistoryLimit: 10
206,204 - strategy:
207,204 - rollingUpdate:
208,204 - maxSurge: 25%
209,204 - maxUnavailable: 25%
210,204 - type: RollingUpdate
213,206 - creationTimestamp: null
240,232 - - image: harbor.fabrique.social.gouv.fr/egapro/egapro/simulateur:sha-dd68d2376c6a3bc3896578fba4fdf652046a17ad
241,232 - imagePullPolicy: IfNotPresent
242,232 + - image: harbor.fabrique.social.gouv.fr/egapro/egapro/simulateur:sha-c4934d8459daf82ab93b3e661f2cd4b8a3353672
247,238 - scheme: HTTP
250,240 - successThreshold: 1
256,245 - protocol: TCP
262,250 - scheme: HTTP
279,266 - scheme: HTTP
281,267 - successThreshold: 1
282,267 - timeoutSeconds: 1
283,267 - terminationMessagePath: /dev/termination-log
284,267 - terminationMessagePolicy: File
285,267 - dnsPolicy: ClusterFirst
286,267 - restartPolicy: Always
287,267 - schedulerName: default-scheduler
288,267 - securityContext: {}
289,267 - terminationGracePeriodSeconds: 30
@revolunet I was going through the above discussion. You are using a labeled app. Also, is there any specific reason that you chose to go with a labeled app rather than a recorded app?
@rohitagg2020 Sorry, I don't know the difference; I only tested with this config.
@revolunet A labeled app is a kapp app with minimal configuration (kapp just asks for a label to identify its resources). A recorded app makes this a bit nicer for common cases, like generating a unique label and being able to find the app later by name. Based on this, it looks like you are using labeled apps. Also, from the above conversation, I understand that you are using a combination of kapp and Reloader. I am trying to understand the current state, as it seems like lots of things were tried.
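To make the distinction concrete, here is a hypothetical pair of CI steps (the app name, label value, and paths are made up) contrasting the two styles:

```yaml
# Labeled app: kapp identifies the app's resources purely by the given label.
- name: Deploy as a labeled app
  run: kapp deploy -y -a label:kubeworkflow/kapp=my-app -f manifests/

# Recorded app: kapp generates its own label and records the app under a name,
# so it can be listed later (e.g. with `kapp ls`).
- name: Deploy as a recorded app
  run: kapp deploy -y -a my-app -n my-namespace -f manifests/
```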
Yes. If I remember correctly, the issue was due to some pod restarts while deploying, maybe related to Reloader.
Reloader has been disabled for a while now to avoid the restarts, but the problem persists.
Hi, I don't really understand some conflict errors, maybe someone can help.
Here's an example; these fields appear in the diff:
- kapp.k14s.io/nonce: sounds legit
- image: legit, as it's a new version
- initialDelaySeconds and cpu: I guess these have been "rewritten" by the kube API
These changes look legit but make kapp fail. Any idea how to prevent this?
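One way to keep such fields out of the recalculated diff, independent of rebase rules, is to write the API server's defaulted and normalized values explicitly in the manifests, so the live object and the manifest agree field for field. A sketch of a container excerpt, with values taken from the diffs above (treat it as an illustration rather than a recommendation):

```yaml
# Container excerpt with server-defaulted/normalized values spelled out, so they
# no longer appear as removals or changes when kapp recalculates the diff.
readinessProbe:
  httpGet:
    path: /_health
    port: http
    scheme: HTTP            # filled in by the API server when omitted
  periodSeconds: 5
  successThreshold: 1       # defaults to 1 when omitted
  timeoutSeconds: 1
  # note: zero-valued fields such as initialDelaySeconds: 0 are dropped by the
  # API server, so declaring them explicitly reintroduces a diff; omit them
resources:
  limits:
    cpu: "1"                # quoted, matching the canonical form stored by the server
    memory: 1Gi
  requests:
    cpu: 500m               # 500m rather than 0.5, matching the stored form
    memory: 256Mi
```

This does not help with fields like `kapp.k14s.io/nonce`, which is expected to change on every deploy.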