CCO-626: pkg/operator/utils: Log diff on CredentialsRequest status change #811
Force-pushed from 06797ff to e61e7f0.
Codecov Report: All modified and coverable lines are covered by tests ✅
@@           Coverage Diff           @@
##           master     #811   +/-   ##
=======================================
  Coverage   47.01%   47.01%
=======================================
  Files          97       97
  Lines       11873    11874    +1
=======================================
+ Hits         5582     5583    +1
  Misses       5676     5676
  Partials      615      615
/test e2e-aws-manual-oidc
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/811/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1870219565347115008/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-6f9c9d8865-5rnr2_cloud-credential-operator.log | grep 'due to diff' | tail -n1
time="2024-12-20T22:49:10Z" level=info msg="Updating CredentialsRequest openshift-cloud-credential-operator/openshift-cluster-api-aws status due to diff: \u00a0\u00a0v1.CredentialsRequestStatus{\n\u00a0\u00a0\tProvisioned: true,\n-\u00a0\tLastSyncTimestamp: s\"2024-12-20 22:49:07 +0000 UTC\",\n+\u00a0\tLastSyncTimestamp: s\"2024-12-20 22:49:10 +0000 UTC\",\n\u00a0\u00a0\tLastSyncGeneration: 1,\n\u00a0\u00a0\tLastSyncCloudCredsSecretResourceVersion: \"\",\n\u00a0\u00a0\t... // 3 identical fields\n\u00a0\u00a0}\n" controller=credreq cr=openshift-cloud-credential-operator/openshift-cluster-api-aws secret=openshift-cluster-api/capa-manager-bootstrap-credentials

Shuffle around with:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/811/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1870219565347115008/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-6f9c9d8865-5rnr2_cloud-credential-operator.log | grep 'due to diff' | tail -n1 | sed 's/.*msg=//;s/ controller=credreq.*//' | jq -r .
Updating CredentialsRequest openshift-cloud-credential-operator/openshift-cluster-api-aws status due to diff: v1.CredentialsRequestStatus{
Provisioned: true,
- LastSyncTimestamp: s"2024-12-20 22:49:07 +0000 UTC",
+ LastSyncTimestamp: s"2024-12-20 22:49:10 +0000 UTC",
LastSyncGeneration: 1,
LastSyncCloudCredsSecretResourceVersion: "",
... // 3 identical fields
}

So as expected (see discussion in #812), it's just the
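The `sed ... | jq -r .` shuffle above pulls the quoted `msg` value out of a logrus text line and decodes its escapes. A minimal Go sketch of the same extraction, assuming the `msg` value is a Go-quoted string (as the `\n` and `\u00a0` escapes in the log suggest); `extractMsg` is a hypothetical helper for illustration, not part of the operator:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractMsg returns the unquoted value of the msg="..." field in a
// logrus-style text log line. Like the sed step, it locates " msg=";
// like jq -r, it decodes escapes such as \n, \t and \u00a0.
func extractMsg(line string) (string, error) {
	const key = " msg="
	i := strings.Index(line, key)
	if i < 0 {
		return "", fmt.Errorf("no msg field in %q", line)
	}
	rest := line[i+len(key):]
	// QuotedPrefix returns just the complete quoted string at the start of
	// rest, so trailing fields (controller=..., cr=...) are ignored.
	quoted, err := strconv.QuotedPrefix(rest)
	if err != nil {
		return "", err
	}
	return strconv.Unquote(quoted)
}

func main() {
	// Raw string literal: the backslash escapes are literal, as in the log file.
	line := `time="2024-12-20T22:49:10Z" level=info msg="status due to diff:\n-\u00a0\tLastSyncTimestamp: a\n+\u00a0\tLastSyncTimestamp: b\n" controller=credreq`
	msg, err := extractMsg(line)
	if err != nil {
		panic(err)
	}
	fmt.Print(msg)
}
```

Decoding with `strconv.Unquote` rather than a JSON parser also copes with Go-only escapes such as `\x..`, which Go's quoting can emit for invalid UTF-8. Like the `sed` expression, this sketch would be confused by a literal ` msg=` inside an earlier quoted field.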
/assign
Force-pushed from 042bbc9 to 0bfbfe7.
@wking: This pull request references CCO-626 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
Force-pushed from 0bfbfe7 to c0304f0.
/lgtm
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/811/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-ovn/1874902068603392000/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-6d54b8c4d6-f5szz_cloud-credential-operator.log | grep -A1 'Updating CredentialsRequest .* status' | head -n2
time="2025-01-02T21:24:30Z" level=info msg="Updating CredentialsRequest openshift-cloud-credential-operator/cloud-credential-operator-iam-ro status" controller=credreq cr=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro secret=openshift-cloud-credential-operator/cloud-credential-operator-iam-ro-creds
time="2025-01-02T21:24:30Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-cloud-network-config-controller-aws

So that's got the
'status has changed, updating' shows that *something* is changing. But without a diff, it's hard to figure out what. For example, in this recent CI run [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848/artifacts/e2e-aws-manual-oidc/gather-extra/artifacts/pods/openshift-cloud-credential-operator_cloud-credential-operator-64bcbb54cc-vzs5s_cloud-credential-operator.log | grep -1 ingress | tail -n13
--
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:49Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:49Z" level=info msg="reconciling clusteroperator status"
--
time="2024-12-05T21:44:51Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress
time="2024-12-05T21:44:52Z" level=info msg="reconciling clusteroperator status"
time="2024-12-05T21:44:52Z" level=info msg="status has changed, updating" controller=credreq cr=openshift-cloud-credential-operator/openshift-ingress secret=openshift-ingress-operator/cloud-credentials
time="2024-12-05T21:44:52Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws

Is that a hot loop? Is something really changing? Hard to debug.

With this commit, we'll log the change we make. It will increase the log volume when there's a hot loop, but it will also make it easier for us to identify and fix hot loops, so overall log verbosity should go down (more text per update * orders of magnitude fewer updates).

We don't need to explicitly include the CredentialsRequest namespace and name here, because earlier in the stack, ReconcileCredentialsRequest.Reconcile sets that context with:

    logger := log.WithFields(log.Fields{
        "controller": controllerName,
        "cr":         fmt.Sprintf("%s/%s", request.NamespacedName.Namespace, request.NamespacedName.Name),
    })

as you can see in the cr=openshift-cloud-credential-operator/openshift-ingress rendering in the "status has changed" lines I quoted above.

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/789/pull-ci-openshift-cloud-credential-operator-master-e2e-aws-manual-oidc/1864768460005838848
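The core idea of the commit, only logging (and writing) a status when it differs and showing *what* differs, can be sketched with the standard library alone. This is not the operator's code: the `v1.CredentialsRequestStatus{...}` output quoted earlier looks like go-cmp's `cmp.Diff` format, whereas the sketch below compares top-level fields with `reflect.DeepEqual`. `CredentialsRequestStatus` here is a hypothetical, trimmed-down struct and `statusDiff` a hypothetical helper:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// CredentialsRequestStatus is a hypothetical stand-in for the real
// v1.CredentialsRequestStatus; only a few fields are modeled.
type CredentialsRequestStatus struct {
	Provisioned        bool
	LastSyncTimestamp  string
	LastSyncGeneration int64
}

// statusDiff returns a "-old/+new" listing of the exported top-level fields
// that differ between two values of the same struct type, or "" if nothing
// differs. Non-struct or mismatched values are printed whole.
func statusDiff(oldStatus, newStatus any) string {
	ov, nv := reflect.ValueOf(oldStatus), reflect.ValueOf(newStatus)
	if ov.Kind() != reflect.Struct || nv.Kind() != reflect.Struct || ov.Type() != nv.Type() {
		if reflect.DeepEqual(oldStatus, newStatus) {
			return ""
		}
		return fmt.Sprintf("- %#v\n+ %#v\n", oldStatus, newStatus)
	}
	var b strings.Builder
	for i := 0; i < ov.NumField(); i++ {
		field := ov.Type().Field(i)
		if !field.IsExported() {
			continue // Interface() would panic on unexported fields
		}
		of, nf := ov.Field(i).Interface(), nv.Field(i).Interface()
		if !reflect.DeepEqual(of, nf) {
			fmt.Fprintf(&b, "- %s: %#v\n+ %s: %#v\n", field.Name, of, field.Name, nf)
		}
	}
	return b.String()
}

func main() {
	before := CredentialsRequestStatus{Provisioned: true, LastSyncTimestamp: "2024-12-20 22:49:07 +0000 UTC", LastSyncGeneration: 1}
	after := before
	after.LastSyncTimestamp = "2024-12-20 22:49:10 +0000 UTC"

	// Only log (and, in a real controller, update) when something changed.
	if diff := statusDiff(before, after); diff != "" {
		fmt.Printf("Updating CredentialsRequest status due to diff:\n%s", diff)
	}
}
```

With the timestamps from the CI log above, only the `LastSyncTimestamp` pair is printed, which is exactly the kind of signal that makes a timestamp-driven hot loop easy to spot.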
Force-pushed from c0304f0 to 5a86535.
I think it's worth addressing hot-looping issues, and that they'll be rare enough that we don't have to worry too much about the log level of the diff. But Jeremiah is concerned about log volume, and we do have a hot loop in [1] today that hasn't been fixed yet. This commit pushes the diff rendering down to the debug level. Users can set cloudcredentials.operator.openshift.io spec.logLevel to Debug or higher to see the diff.

[1]: https://issues.redhat.com/browse/OCPBUGS-47505
Force-pushed from 5a86535 to 056020e.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jstuever, wking
@wking: all tests passed!
[ART PR BUILD NOTIFIER] Distgit: ose-cloud-credential-operator