argocd_cluster_connection_status still reporting on old endpoint after modifying cluster settings #20782

Open
nbarrientos opened this issue Nov 13, 2024 · 0 comments
Labels
bug Something isn't working component:application-controller component:metrics version:2.12 Latest confirmed affected version is 2.12

nbarrientos commented Nov 13, 2024

I have observed that if the server key of an existing cluster is modified, the application controller assigned to that cluster keeps reporting on the old endpoint indefinitely (?) until the application-controller pod is re-created.

To Reproduce

  • Declare a cluster called foo pointing to server1
  • Declare an application deploying to cluster foo
  • Change the server of cluster foo and point it to endpoint server2
  • Sync the application so it deploys to the new endpoint of cluster foo (server2).

At this point the application controller metrics endpoint exposes two series, one for each endpoint (some labels removed):

argocd_cluster_connection_status{
  container="application-controller", 
  endpoint="http-metrics", 
  job="argocd-application-controller-metrics", 
  k8s_version="1.31", 
  namespace="argocd", 
  pod="argo-argocd-application-controller-1", 
  server="https://server1:6443", 
  service="argo-argocd-application-controller-metrics"} 0

argocd_cluster_connection_status{
  container="application-controller", 
  endpoint="http-metrics", 
  job="argocd-application-controller-metrics", 
  k8s_version="1.31",
  namespace="argocd",
  pod="argo-argocd-application-controller-1", 
  server="https://server2:6443", 
  service="argo-argocd-application-controller-metrics"} 1

The first sample has 0 as value because in my case that endpoint was immediately disabled after changing Argo CD's configuration.
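
To make the observation above easy to check, here is a minimal Python sketch that extracts the server label and value of each argocd_cluster_connection_status sample from pasted metrics text. The function name parse_connection_status and the SAMPLE string are illustrative assumptions, not part of Argo CD. The regular expression matches across newlines only because the samples in this report are wrapped over several lines; real exposition output puts each sample on a single line, which this also handles.

```python
import re

# Matches one label pair such as server="https://server1:6443".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_connection_status(text, metric="argocd_cluster_connection_status"):
    """Return {server label: value} for every sample of `metric` in `text`."""
    statuses = {}
    # DOTALL lets the label block span several lines, as in the report.
    sample_re = re.compile(re.escape(metric) + r"\{(.*?)\}\s+(\S+)", re.DOTALL)
    for labels_blob, value in sample_re.findall(text):
        labels = dict(LABEL_RE.findall(labels_blob))
        statuses[labels.get("server", "")] = float(value)
    return statuses


# Shortened copy of the two series shown in the report (hypothetical input).
SAMPLE = '''
argocd_cluster_connection_status{
  k8s_version="1.31",
  server="https://server1:6443"} 0

argocd_cluster_connection_status{
  k8s_version="1.31",
  server="https://server2:6443"} 1
'''

statuses = parse_connection_status(SAMPLE)
# Servers whose connection status is 0, i.e. reported as unreachable.
unreachable = sorted(s for s, v in statuses.items() if v == 0)
print(unreachable)
```

With the two series from this report, the only server reported as unreachable is the old endpoint, server1.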

Expected behavior

I think that no status should be reported about server1 as soon as the endpoint is not known anymore to Argo CD (or at least as soon as it's not needed anymore). Deleting the pod (to force a re-creation) of the offending application controller restores the desired behavior and the metric for server1 is immediately gone from the metrics endpoint. In other words, only information about server2 is reported (as expected, IMO).

The impact of this bug is that monitoring argocd_cluster_connection_status and alerting when any value is 0 can trigger false positives: once the cluster configuration is changed, the old endpoint may disappear at any moment, for example because the old cluster is deleted (see server1 reporting 0 above). In other words, argocd_cluster_connection_status cannot be fully trusted because of this behavior.
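
To illustrate the false-positive problem, the following Python sketch separates series that are down for a currently configured endpoint from series that are down for an endpoint Argo CD no longer knows about. It is only a sketch of a possible alert-side filter, not a fix and not anything Argo CD provides: split_alerts is a hypothetical helper, and how the set of currently configured servers is obtained is assumed to be handled elsewhere.

```python
def split_alerts(statuses, configured_servers):
    """Separate genuine outages from stale series.

    statuses: {server URL: 0 or 1}, as reported by
        argocd_cluster_connection_status.
    configured_servers: server URLs currently declared in Argo CD
        (hypothetical input; obtaining it is out of scope here).
    """
    configured = set(configured_servers)
    down = {server for server, value in statuses.items() if value == 0}
    genuine = sorted(down & configured)  # down and still configured
    stale = sorted(down - configured)    # down but no longer configured
    return genuine, stale


# Values from this report: server1 is the old endpoint, server2 the new one.
statuses = {"https://server1:6443": 0, "https://server2:6443": 1}
genuine, stale = split_alerts(statuses, ["https://server2:6443"])
print(genuine, stale)
```

In the scenario from this report, the 0 for server1 lands in the stale list, so an alert keyed on "any value is 0" would fire even though no configured cluster is down.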

Would that metric have been purged without restarting the controller if I had set --metrics-cache-expiration (disabled by default)? If so, could the documentation be updated to explain that cluster endpoint information is also "cached" and that its status keeps being reported? Otherwise, please consider this report a bug.

Version

argocd-server: v2.12.6+4dab5bd
  BuildDate: 2024-10-18T17:39:26Z
  GitCommit: 4dab5bd6a60adea12e084ad23519e35b710060a2
  GitTreeState: clean
  GoVersion: go1.22.4
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.4.2 2024-05-22T15:19:38Z
  Helm Version: v3.15.2+g1a500d5
  Kubectl Version: v0.29.6
  Jsonnet Version: v0.20.0