OCPBUGS-47505: pkg/operator/credentialsrequest: hasRecentlySynced backoff for STS too #812

Open: wking wants to merge 6 commits into master

@wking (Member) commented Dec 20, 2024

e9f9cc6 (#542) created the STS-specific branch here and shifted the pre-existing hasRecentlySynced check to the non-STS branch. But that's leading to hot update loops, as the reconciler bangs away bumping status.lastSyncTimestamp (which we've had since the initial cloud-cred operator pull request). For example, in this recent CI run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
--
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
--
time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too, to avoid flooding the Kube API server with status.lastSyncTimestamp updates.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Dec 20, 2024
@openshift-ci-robot (Contributor)

@wking: This pull request references Jira Issue OCPBUGS-47505, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianping-shu

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot (Contributor) commented Dec 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
Once this PR has been reviewed and has the lgtm label, please assign dlom for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking wking force-pushed the hasRecentlySynced-for-STS branch from 7f2f807 to 5a40e2b on December 20, 2024 21:26

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 6 lines in your changes missing coverage. Please review.

Project coverage is 47.03%. Comparing base (2395cbc) to head (68bd94e).
Report is 7 commits behind head on master.

Files with missing lines        Patch %   Lines
pkg/aws/actuator/actuator.go    57.14%    5 Missing and 1 partial ⚠️

@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
+ Coverage   47.01%   47.03%   +0.02%     
==========================================
  Files          97       97              
  Lines       11873    11876       +3     
==========================================
+ Hits         5582     5586       +4     
  Misses       5676     5676              
+ Partials      615      614       -1     

Files with missing lines                                 Coverage Δ
...redentialsrequest/credentialsrequest_controller.go    44.53% <100.00%> (+0.30%) ⬆️
pkg/aws/actuator/actuator.go                             64.81% <57.14%> (+0.07%) ⬆️

@wking (Member Author) commented Dec 20, 2024

/test e2e-aws-manual-oidc

@wking (Member Author) commented Dec 23, 2024

e2e-aws-manual-oidc:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/812/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1870223724137943040/artifacts/e2e-aws-manual-oidc/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
$ zgrep -h '"resource":"credentialsrequests"' kube-apiserver/*audit*.log.gz | jq -r '.verb + " " + (.responseStatus.code | tostring) + " " + (.objectRef | .resource + " " + .namespace) + " " + .user.username + " " + .userAgent' | sort | uniq -c | sort -n | tail -n3
    147 get 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:kube-system:generic-garbage-collector kube-controller-manager/v1.31.3 (linux/amd64) kubernetes/3c62f73/system:serviceaccount:kube-system:generic-garbage-collector
   1176 get 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:openshift-cluster-version:default cluster-version-operator/v0.0.0 (linux/amd64) kubernetes/$Format
   7761 update 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator cloud-credential-operator/v0.0.0 (linux/amd64) kubernetes/$Format

So there are still lots of update requests going on: 7761 updates from the cloud-credential-operator service account.
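For readers who prefer to script the audit-log tally, here is a rough Go equivalent of the `zgrep | jq | sort | uniq -c | sort -n | tail -n3` pipeline above. It is a sketch assuming decompressed, one-event-per-line JSON on stdin, and it decodes only the `audit.k8s.io/v1` Event fields the `jq` filter reads; `countRequests` and `top` are hypothetical helper names. Unlike `jq`, a missing `responseStatus` is rendered as code `0` rather than `null`.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// auditEvent holds only the audit.k8s.io/v1 Event fields the jq filter uses.
type auditEvent struct {
	Verb           string `json:"verb"`
	UserAgent      string `json:"userAgent"`
	ResponseStatus *struct {
		Code int `json:"code"`
	} `json:"responseStatus"`
	ObjectRef *struct {
		Resource  string `json:"resource"`
		Namespace string `json:"namespace"`
	} `json:"objectRef"`
	User struct {
		Username string `json:"username"`
	} `json:"user"`
}

// countRequests tallies events for one resource, keyed like the jq output:
// "verb code resource namespace username userAgent".
func countRequests(lines []string, resource string) (map[string]int, error) {
	counts := map[string]int{}
	for _, line := range lines {
		if line == "" {
			continue
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, err
		}
		if ev.ObjectRef == nil || ev.ObjectRef.Resource != resource {
			continue
		}
		code := 0 // jq would print "null" here
		if ev.ResponseStatus != nil {
			code = ev.ResponseStatus.Code
		}
		key := fmt.Sprintf("%s %d %s %s %s %s", ev.Verb, code,
			ev.ObjectRef.Resource, ev.ObjectRef.Namespace, ev.User.Username, ev.UserAgent)
		counts[key]++
	}
	return counts, nil
}

// top returns the n most frequent keys, largest last (like `sort -n | tail`).
func top(counts map[string]int, n int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] < counts[keys[j]]
		}
		return keys[i] < keys[j] // deterministic tie-break
	})
	if len(keys) > n {
		keys = keys[len(keys)-n:]
	}
	return keys
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 16*1024*1024) // audit lines can be long
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts, err := countRequests(lines, "credentialsrequests")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, k := range top(counts, 3) {
		fmt.Printf("%7d %s\n", counts[k], k)
	}
}
```

Usage, assuming the file is saved as `main.go`: `zcat kube-apiserver/*audit*.log.gz | go run main.go`. Reading every line into memory keeps the sketch short; a streaming version would fold `countRequests` into the scan loop.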

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 23, 2024
@wking (Member Author) commented Dec 24, 2024

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from 2273526 to 9d55785 on December 26, 2024 18:15
@wking (Member Author) commented Dec 26, 2024

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from 9d55785 to 58662ad on December 26, 2024 22:19
@wking (Member Author) commented Dec 26, 2024

/test e2e-aws-manual-oidc

@wking (Member Author) commented Dec 26, 2024

/test e2e-aws-manual-oidc

@wking (Member Author) commented Dec 27, 2024

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from b7ac447 to 5d83037 on December 27, 2024 02:47
@wking (Member Author) commented Dec 27, 2024

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from 7d7c50a to 166a88b on January 2, 2025 17:41
@wking (Member Author) commented Jan 2, 2025

Getting closer :) Current issue seems to be crSecretExists, e.g.:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/812/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1872474533076668416/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-57d4b598f4-xnsqv_cloud-credential-operator.log | grep 'adding label selector\|ANDed.*ingress' | head -n3
time="2024-12-27T03:48:44Z" level=info msg="adding label selector cloudcredential.openshift.io/credentials-request=true to cache options for Secrets"
time="2024-12-27T03:48:52Z" level=info msg="The above are ANDed together to determine: lastsyncgeneration is current and lastsynctimestamp < 1h0m0s" NOT cloudCredsSecretUpdated=false NOT hasActiveFailureConditions=false NOT isInfrastructureUpdated=true NOT isStale=true cr.Status.Provisioned=false crSecretExists=false hasRecentlySynced=false infraResourceVersion=524 infraResourceVersionSynced=524 name=openshift-ingress
time="2024-12-27T03:48:55Z" level=info msg="The above are ANDed together to determine: lastsyncgeneration is current and lastsynctimestamp < 1h0m0s" NOT cloudCredsSecretUpdated=false NOT hasActiveFailureConditions=false NOT isInfrastructureUpdated=true NOT isStale=false cr.Status.Provisioned=true crSecretExists=false hasRecentlySynced=true infraResourceVersion=524 infraResourceVersionSynced=524 name=openshift-ingress
grep: write error: Broken pipe

because the Secret lacks the cloudcredential.openshift.io/credentials-request label:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/812/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1872474533076668416/artifacts/e2e-aws-manual-oidc/gather-must-gather/artifacts/must-gather.tar | tar -xOz registry-build09-ci-openshift-org-ci-op-dfvtff1y-stable-sha256-3c1ee39e49aa50c29669b65a9a9b84b0c777bc324477a6e898e5afd0359752b9/namespaces/openshift-ingress-operator/core/secrets.yaml | yaml2json | jq '.items[].metadata | select(.name == "cloud-credentials") | del(.managedFields)'
{
  "creationTimestamp": "2024-12-27T03:43:22Z",
  "name": "cloud-credentials",
  "namespace": "openshift-ingress-operator",
  "resourceVersion": "636",
  "uid": "27c83668-eb5f-49f0-b2d3-7227d149ccbf"
}

But for some reason, the operator thinks a filtered watch is possible; see the "adding label selector cloudcredential.openshift.io/credentials-request=true to cache options for Secrets" line logged above. (Edit: this is because the operator only checks for the presence or absence of the label on Secrets that have a CCO-specific annotation, and these STS Secrets lack both the annotation and the label.) I'm not sure whether we want to skip the filtered watch for manual/STS clusters. Watching all of a cluster's Secrets is expensive, as discussed in #545. But adding labels to a Secret that's provided by external tooling risks a contention hot loop if the external tooling tries to stomp our label back off.
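Why the unlabeled Secret reads as `crSecretExists=false` can be shown with a stdlib-only sketch. The operator configures its Secret informer through controller-runtime cache options with a label selector; the code below does not use those libraries and instead models an equality-based selector with plain maps. `matchesSelector`, `visibleSecrets`, and the `example-ns/labeled-secret` entry are hypothetical; the label key and value come from the logged `adding label selector` line, and the unlabeled `cloud-credentials` Secret mirrors the must-gather output above.

```go
package main

import (
	"fmt"
	"sort"
)

// Label from the operator's "adding label selector
// cloudcredential.openshift.io/credentials-request=true" log line.
const (
	credReqLabelKey   = "cloudcredential.openshift.io/credentials-request"
	credReqLabelValue = "true"
)

// matchesSelector is a simplified equality-based selector: every
// key=value pair in selector must appear in labels.
func matchesSelector(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// visibleSecrets returns the Secrets (namespace/name -> labels) that a
// label-filtered cache would hold, sorted by name.
func visibleSecrets(all map[string]map[string]string, selector map[string]string) []string {
	var names []string
	for name, labels := range all {
		if matchesSelector(selector, labels) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	selector := map[string]string{credReqLabelKey: credReqLabelValue}
	all := map[string]map[string]string{
		// Externally created STS Secret: no labels, as in the must-gather output.
		"openshift-ingress-operator/cloud-credentials": nil,
		// Hypothetical Secret that already carries the label.
		"example-ns/labeled-secret": {credReqLabelKey: credReqLabelValue},
	}
	// Only the labeled Secret is visible; cloud-credentials looks absent.
	fmt.Println(visibleSecrets(all, selector))
}
```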

@wking wking force-pushed the hasRecentlySynced-for-STS branch 2 times, most recently from 6a4d449 to 2aa54ca on January 3, 2025 06:55
@wking (Member Author) commented Jan 3, 2025

/test e2e-aws-manual-oidc

@wking (Member Author) commented Jan 3, 2025

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from b6570f4 to ebfa76e on January 3, 2025 22:17
@wking (Member Author) commented Jan 3, 2025

/test e2e-aws-manual-oidc

wking added 5 commits January 3, 2025 15:38
Because "failed to process" is hard to debug, unless you have the
detailed error message explaining why processing failed.
Instead of the hard-coded "an hour ago".  Catching up with 452bbc4
(add new credentials field for AWS Secrets, 2020-11-03, openshift#264), which
created the syncPeriod variable.  This pivot avoids the risk of the
variable being updated and pushing the logs out of sync with the new
duration value.
…condition

Adding a "NOT" to the logged cloudCredsSecretUpdated field, because
the following 'if' condition is !cloudCredsSecretUpdated.  The lack of
"NOT" seems to have been accidental oversight when the logging fields
were added in 0a0d849 (Changes to address PR comments from Steve
~3d ago, 2023-06-27, openshift#542).

I'm also adding isInfrastructureUpdated logging to catch up with
cea55c6 (Added implementation for AWS Day2 Tag reconcilation
Support, 2024-09-24, openshift#759), when it was added to the 'if' condition
but overlooked in field logging.
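One way to keep the logged fields and the `if` condition from drifting apart again is to derive both from one list. The sketch below is a suggested pattern, not the operator's code: `term` and `evaluate` are hypothetical, the operand values in `main` are made up, and it logs each term's raw value under a `NOT `-prefixed key whenever the condition negates that term.

```go
package main

import (
	"fmt"
	"sort"
)

// term is one operand of a cool-off condition. A negated term contributes
// !Value to the AND, and its log key gets a "NOT " prefix.
type term struct {
	Name    string
	Value   bool
	Negated bool
}

// evaluate ANDs every term and builds the log fields from the same slice,
// so a term cannot be part of the condition yet missing from the logs, and
// the "NOT " prefix always matches the negation actually applied.
func evaluate(terms []term) (bool, map[string]bool) {
	result := true
	fields := make(map[string]bool, len(terms))
	for _, t := range terms {
		key, v := t.Name, t.Value
		if t.Negated {
			key, v = "NOT "+t.Name, !t.Value
		}
		result = result && v
		fields[key] = t.Value // raw value under the (possibly prefixed) key
	}
	return result, fields
}

func main() {
	// Hypothetical values for a few of the logged operands.
	skip, fields := evaluate([]term{
		{Name: "cloudCredsSecretUpdated", Value: false, Negated: true},
		{Name: "hasActiveFailureConditions", Value: false, Negated: true},
		{Name: "crSecretExists", Value: true},
		{Name: "hasRecentlySynced", Value: true},
	})
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%t ", k, fields[k])
	}
	fmt.Println("\nskip sync:", skip)
}
```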
e9f9cc6 (Add & logic - new token CredReq.spec.cred* fields,
2023-06-27, openshift#542) created the STS-specific branch here, and shifted the
pre-existing hasRecentlySynced check to the non-STS branch.  But
that's leading to hot update loops, as the reconciler bangs away
bumping status.lastSyncTimestamp (which we've had since the initial
cloud-cred operator pull request [1]).  For example in this recent CI
run [2]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
  --
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
  --
  time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
  time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
  time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
  time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

With this commit, we'll perform the same back-off in the STS case too,
to avoid flooding the Kube API server with status.lastSyncTimestamp
updates.

[1]: openshift@a6d385a#diff-69794ca0db76a04660e3355ba9b824f34e7af1030d0a8114903d11847201c410R46
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
…rsion for STS

Before this commit, STS updates didn't bump
lastSyncInfrastructureResourceVersion, so isInfrastructureUpdated was
always false and the controller thought it needed to update the
CredentialsRequest status every time.

With this commit, STS CredentialsRequests will have
lastSyncInfrastructureResourceVersion updated (just like non-STS
requests), so we have a chance at 'lastsyncgeneration is current...'
cool-off periods (as long as all the other cool-off conditions are
also met).
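A sketch of why recording the Infrastructure resourceVersion matters, under assumptions: `credReqStatus`, `syncSTS`, and `infraVersionCurrent` are hypothetical, and the helper deliberately does not reproduce the exact polarity of the operator's `isInfrastructureUpdated` variable. The point it illustrates is that a status field that is never written can never compare equal to the current version, so the cool-off can never engage. The value `524` is the `infraResourceVersion` from the CI log above.

```go
package main

import "fmt"

// credReqStatus is a hypothetical stand-in holding only the field this
// commit starts populating for STS requests.
type credReqStatus struct {
	LastSyncInfrastructureResourceVersion string
}

// infraVersionCurrent reports whether status already reflects the current
// Infrastructure resourceVersion (hypothetical helper).
func infraVersionCurrent(st credReqStatus, infraRV string) bool {
	return st.LastSyncInfrastructureResourceVersion == infraRV
}

// syncSTS is a hypothetical STS sync path; recordInfra toggles the
// behavior this commit adds.
func syncSTS(st *credReqStatus, infraRV string, recordInfra bool) {
	// ... status updates for the STS case would happen here ...
	if recordInfra {
		st.LastSyncInfrastructureResourceVersion = infraRV
	}
}

func main() {
	const infraRV = "524" // resourceVersion seen in the CI logs
	var before, after credReqStatus
	syncSTS(&before, infraRV, false)
	syncSTS(&after, infraRV, true)
	fmt.Println("before:", infraVersionCurrent(before, infraRV)) // false: always looks out of date
	fmt.Println("after:", infraVersionCurrent(after, infraRV))   // true: cool-off possible
}
```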
@wking wking force-pushed the hasRecentlySynced-for-STS branch from ebfa76e to 81fd3d0 on January 3, 2025 23:56
@wking (Member Author) commented Jan 3, 2025

/test e2e-aws-manual-oidc

@wking wking force-pushed the hasRecentlySynced-for-STS branch from 81fd3d0 to afc457a on January 4, 2025 01:09
@wking (Member Author) commented Jan 4, 2025

/test e2e-aws-manual-oidc

Even when awsSTSIAMRoleARN is empty, we want the label so that
pkg/cmd/operator's NewOperator's filteredWatchPossible label-selector
can find these Secrets.  Then the controller will notice if they're
deleted (so it can update the CredentialsRequest status to point that
out) or when they haven't been changed (so it can avoid "I can't find
the Secret!" overly-frequent bumping in the hasRecentlySynced
calculation, because it thinks crSecretExists=false).

And we want the annotation, so it's clear why the Secret needs to
exist (because of the annotation-referenced CredentialsRequest).
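Adding the label and annotation only when they are missing keeps each reconcile from issuing a write when nothing changed. The sketch below is a suggested pattern under assumptions, not the commit's code: `ensureMetadata` is hypothetical, and `annotationKey` is a placeholder because the commit message does not name the CCO-specific annotation key. Keys owned by other controllers are left untouched.

```go
package main

import "fmt"

const (
	// labelKey comes from the operator's filtered-watch log line.
	labelKey = "cloudcredential.openshift.io/credentials-request"
	// annotationKey is a hypothetical placeholder, not the real CCO key.
	annotationKey = "example.invalid/credentials-request-ref"
)

// ensureMetadata adds the CCO label and the CredentialsRequest-reference
// annotation when missing or different, leaving all other keys untouched.
// It mutates and returns the maps, and reports whether a Patch is needed.
func ensureMetadata(labels, annotations map[string]string, crRef string) (map[string]string, map[string]string, bool) {
	if labels == nil {
		labels = map[string]string{}
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	changed := false
	if labels[labelKey] != "true" {
		labels[labelKey] = "true"
		changed = true
	}
	if annotations[annotationKey] != crRef {
		annotations[annotationKey] = crRef
		changed = true
	}
	return labels, annotations, changed
}

func main() {
	crRef := "openshift-cloud-credential-operator/openshift-ingress"
	labels, annotations, changed := ensureMetadata(nil, nil, crRef)
	fmt.Println(labels, annotations, changed)
	_, _, changed = ensureMetadata(labels, annotations, crRef)
	fmt.Println("second pass needs patch:", changed) // false
}
```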

The risk here is that we might end up contending over label/annotation
presence with the external controller that is populating the
'credentials' data inside the Secret.  But the alternative of an
unfiltered Secret informer in the client is still too
resource-intensive, as described in the filteredWatchPossible comment
and the a58a09c (*: use a filtered LIST + WATCH on Secrets for AWS
STS, 2023-06-29, openshift#545) commit that added the filteredWatchPossible
logic.  Additional labels and annotations are properties that external
controllers should be able to accept.  For example, [1] has ArgoCD
discussing:

  apiVersion: argoproj.io/v1alpha1
  kind: ApplicationSet
  spec:
    # (...)
    preservedFields:
      annotations: ["my-custom-annotation"]
      labels: ["my-custom-label"]

to ignore annotations and labels injected by external-to-ArgoCD
controllers, which is what the CCO-specific annotation/label I'm
touching now would be.

Moving to 48d6ccc (pkg/operator: correctly fetch CA for AWS minter,
2023-07-19, openshift#575)'s LiveClient avoids confusing CreateOrPatch.  With
the cached .Client, it would have:

1. Failed to retrieve an unlabeled Secret, because the
   externally-created Secret lacked the label that the Client's
   filteredWatchPossible informer is filtered on.
2. Thought that it should Create a new Secret.
3. Had that Create attempt fail on 'secrets "$NAME" already exists'.

With the LiveClient, that becomes:

1. Successfully retrieve an unlabeled Secret, with the uncached
   reader.
2. Think that it should Patch the Secret.
3. Successfully Patch the Secret.
4. Once the Patch sets the label, future attempts to Get the Secret
   through the filtered informer cache will succeed.

[1]: https://argo-cd.readthedocs.io/en/release-2.13/operator-manual/applicationset/Controlling-Resource-Modification/#preserving-changes-made-to-an-applications-annotations-and-labels
@wking wking force-pushed the hasRecentlySynced-for-STS branch from afc457a to 68bd94e on January 4, 2025 03:29
@wking (Member Author) commented Jan 4, 2025

/test e2e-aws-manual-oidc

openshift-ci bot (Contributor) commented Jan 4, 2025

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                       Commit    Required   Rerun command
ci/prow/okd-scos-e2e-aws-ovn    68bd94e   false      /test okd-scos-e2e-aws-ovn


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wking (Member Author) commented Jan 6, 2025

The e2e-aws-manual-oidc run looks good to me:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/812/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1875391744414060544/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-d785b46fc-9tll9_cloud-credential-operator.log | grep secret=openshift-ingress-operator/cloud-credentials
time="2025-01-04T04:21:17Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress-gcp secret=openshift-ingress-operator/cloud-credentials
time="2025-01-04T04:21:17Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress-azure secret=openshift-ingress-operator/cloud-credentials
time="2025-01-04T04:21:19Z" level=info msg="adding finalizer: cloudcredential.openshift.io/deprovision" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2025-01-04T04:21:25Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2025-01-04T04:21:28Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials

So a few bumps early in the cluster's install phase, and then quiet :). Kube API server audit logs are also much quieter now than they were before, without thousands of updates leading the hot list:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/812/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1875391744414060544/artifacts/e2e-aws-manual-oidc/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
$ zgrep -h '"resource":"credentialsrequests"' kube-apiserver/*audit*.log.gz | jq -r '.verb + " " + (.responseStatus.code | tostring) + " " + (.objectRef | .resource + " " + .namespace) + " " + .user.username + " " + .userAgent' | sort | uniq -c | sort -n | tail -n3
     49 get 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:openshift-must-gather-7qfrg:default Go-http-client/2.0
    196 get 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:kube-system:generic-garbage-collector kube-controller-manager/v1.31.3 (linux/amd64) kubernetes/3c62f73/system:serviceaccount:kube-system:generic-garbage-collector
   1225 get 200 credentialsrequests openshift-cloud-credential-operator system:serviceaccount:openshift-cluster-version:default cluster-version-operator/v0.0.0 (linux/amd64) kubernetes/$Format

@wking (Member Author) commented Jan 6, 2025

Happy CI, showing the issue fixed there, so:

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2025
@jstuever jstuever self-assigned this Jan 7, 2025
@jstuever (Contributor) commented Jan 8, 2025

/test e2e-gco-manual-oidc e2e-azure-manual-oidc

@jstuever (Contributor) commented Jan 8, 2025

/test e2e-gcp-manual-oidc

Labels
jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
3 participants