feat: Drop unnecessary listing for the sake of watch reinitialization #616
base: master
Conversation
Force-pushed from 0c6045d to 4029ec6.
@@ -666,8 +656,10 @@ func (c *clusterCache) watchEvents(ctx context.Context, api kube.APIResourceInfo
obj, ok := event.Object.(*unstructured.Unstructured)
if !ok {
	resourceVersion = ""
I'd like to see if we can get away without resetting this and using whatever resourceVersion we already had. If we find we get into some nasty loop, we can always re-add it.
We can't drop resetting the locally held RV here either - I put the rationale in a code comment at this spot as well.
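To make the trade-off concrete, here is a minimal sketch (my own simplification, not the actual gitops-engine code) of the pattern under discussion: when the watch stream delivers an object of an unexpected type, the locally held resourceVersion is cleared so the enclosing loop falls back to a full relist to obtain a trustworthy RV.

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/watch"
)

// nextResourceVersion returns the resource version to carry into the next
// watch iteration; an empty string signals that a relist is required.
func nextResourceVersion(event watch.Event) string {
	obj, ok := event.Object.(*unstructured.Unstructured)
	if !ok {
		// The event payload is not the expected unstructured object (for
		// example an error status), so the locally tracked resource version
		// can no longer be trusted and must be refreshed via a relist.
		return ""
	}
	// Normal path: advance to the resource version of the received object.
	return obj.GetResourceVersion()
}
```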
// re-synchronize API state and restart watch if retry watcher failed to continue watching using provided resource version
case <-w.Done():
	resourceVersion = ""
Maybe see if we can get away without this? If we need it, we'll probably want to add a comment exactly why we needed it.
We can't drop it - I put an appropriate comment in the code to explain the rationale behind resetting resourceVersion. This is the place where the retry watcher exits early and we are forced to relist in order to get a fresher RV from the server: https://github.com/kubernetes/kubernetes/blob/0fc167103189a4d462c3cc7a17b360c3d998f4bf/staging/src/k8s.io/client-go/tools/watch/retrywatcher.go#L208-L211.
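For readers following along, here is a hedged sketch (a simplified loop against the typed Pods client, not the gitops-engine implementation) of what resetting the RV buys us: once the RetryWatcher's Done() channel closes - for instance because the server rejected the provided resourceVersion as too old - the only way forward is to relist and pick up a fresher RV.

```go
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

func watchPods(ctx context.Context, client kubernetes.Interface, namespace string) error {
	resourceVersion := ""
	for {
		if resourceVersion == "" {
			// Relist to obtain a fresh resource version from the server.
			list, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
			if err != nil {
				return fmt.Errorf("relist failed: %w", err)
			}
			resourceVersion = list.ResourceVersion
		}

		w, err := watchtools.NewRetryWatcher(resourceVersion, &cache.ListWatch{
			WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
				return client.CoreV1().Pods(namespace).Watch(ctx, options)
			},
		})
		if err != nil {
			return fmt.Errorf("starting retry watcher: %w", err)
		}

		restart := false
		for !restart {
			select {
			case <-ctx.Done():
				w.Stop()
				return ctx.Err()
			case <-w.Done():
				// The retry watcher gave up on its own (e.g. the provided RV is
				// too old), so reset the RV to force a relist on the next outer
				// iteration.
				resourceVersion = ""
				restart = true
			case event, ok := <-w.ResultChan():
				if !ok {
					// Result channel closed; the Done() case will trigger the
					// restart shortly.
					continue
				}
				// Process the event and advance resourceVersion here
				// (omitted for brevity).
				_ = event
			}
		}
		w.Stop()
	}
}
```

As I understand the PR, the point is that this relist now happens only on the rare Done() path rather than after every routine watch expiry.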
+1 - this has to remain here.
This should generally be rare, so a relist from etcd is fine in those situations.
Force-pushed from 4029ec6 to b4fa2a7.
@crenshaw-dev thanks for the review - I'll respond to the comments here later on. FWIW, as agreed offline during yesterday's sync, I split the fix into two PRs: this one just drops the unnecessary listing after watch expiry, and #617 makes the list API calls target the watch cache instead of etcd. I guess we can proceed with the latter one.
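For context on the second half of that split, here is a short sketch (illustrating the general client-go mechanism only, not necessarily the exact diff in #617) of how a list call can be served from the apiserver's watch cache instead of etcd: requesting ResourceVersion "0" asks for "any" version, which the apiserver answers from its in-memory cache rather than with a quorum read.

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listPodsFromWatchCache lists Pods with "any" resource version semantics,
// letting the apiserver serve the response from its watch cache.
func listPodsFromWatchCache(ctx context.Context, client kubernetes.Interface, namespace string) (*v1.PodList, error) {
	return client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		// ResourceVersion "0" => possibly slightly stale data, but no quorum
		// read against etcd.
		ResourceVersion: "0",
	})
}
```

The trade-off is staleness: the cached response may lag etcd slightly, which is usually acceptable when (re)populating a cache.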
Force-pushed from b4fa2a7 to 409990b.
Signed-off-by: Antoni Zawodny <[email protected]>
Force-pushed from 409990b to f7c5e6e.
This LGTM from the k8s perspective.
This change addresses the performance issue in the cluster cache described in argoproj/argo-cd#18838.
kube-apiserver logs for the Pods resource (the supposedly super-low latency logged for the WATCH requests is due to a bug in Kubernetes: kubernetes/kubernetes#125614):