
Revision stays in ContainerMissing condition forever after a temporary failure of digest resolution #15466

Open
maschmid opened this issue Aug 13, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@maschmid
Contributor

/area reconciler

What version of Knative?

1.14

Expected Behavior

After a temporary digest-resolution error sets the ContainerHealthy condition to False with reason ContainerMissing, ContainerHealthy should become True once digest resolution eventually succeeds.

Actual Behavior

After a temporary error in digest resolution, even once digest resolution eventually succeeds, the Revision stays in this inconsistent, broken state:

status:
  actualReplicas: 1
  conditions:
  - lastTransitionTime: "2024-08-12T22:30:16Z"
    severity: Info
    status: "True"
    type: Active
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: ContainerHealthy
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-08-12T22:30:12Z"
    status: "True"
    type: ResourcesAvailable
  containerStatuses:
  - imageDigest: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp@sha256:e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836
    name: user-container
  desiredReplicas: 1
  observedGeneration: 1

Note that containerStatuses shows the resolved image digest and the deployments are ready (ResourcesAvailable is True), yet ContainerHealthy is still False with the original digest-resolution error.
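For triage, the inconsistency can be expressed as a simple predicate over the status fields. The following is a minimal Go sketch under stated assumptions: `Condition`, `ContainerStatus`, `RevisionStatus`, and `hasStaleContainerMissing` are hypothetical, simplified stand-ins that model only the fields visible in the status above; they are not Knative's actual API types. The predicate flags a Revision whose containers all report a resolved `imageDigest` while `ContainerHealthy` is still `False` with reason `ContainerMissing`.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for a Revision status
// condition; only the fields needed for this check are modeled.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// ContainerStatus mirrors the containerStatuses entries in the status dump.
type ContainerStatus struct {
	Name        string
	ImageDigest string
}

// RevisionStatus is a hypothetical, trimmed-down view of a Revision's status.
type RevisionStatus struct {
	Conditions        []Condition
	ContainerStatuses []ContainerStatus
}

// hasStaleContainerMissing reports whether every container already has a
// resolved digest while ContainerHealthy is still False with reason
// ContainerMissing, i.e. the inconsistent state described in this issue.
func hasStaleContainerMissing(s RevisionStatus) bool {
	if len(s.ContainerStatuses) == 0 {
		return false
	}
	// Any container without a resolved digest means ContainerMissing may be legitimate.
	for _, cs := range s.ContainerStatuses {
		if cs.ImageDigest == "" {
			return false
		}
	}
	for _, c := range s.Conditions {
		if c.Type == "ContainerHealthy" && c.Status == "False" && c.Reason == "ContainerMissing" {
			return true
		}
	}
	return false
}

func main() {
	// Status values copied from the report above.
	observed := RevisionStatus{
		Conditions: []Condition{
			{Type: "ContainerHealthy", Status: "False", Reason: "ContainerMissing"},
			{Type: "ResourcesAvailable", Status: "True"},
		},
		ContainerStatuses: []ContainerStatus{{
			Name:        "user-container",
			ImageDigest: "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp@sha256:e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836",
		}},
	}
	fmt.Println(hasStaleContainerMissing(observed)) // prints "true"
}
```

Applied to the reported status, the predicate holds; under the expected behavior, a later successful reconcile would flip ContainerHealthy to True and the predicate would no longer match.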

Steps to Reproduce the Problem

There is currently no reproducer; the problem was noticed during a long-running test.

@maschmid maschmid added the kind/bug Categorizes issue or PR as related to a bug. label Aug 13, 2024

knative-prow bot commented Aug 13, 2024

@maschmid: The label(s) area/reconciler cannot be applied, because the repository doesn't have them.


@ReToCode
Member

cc @dprotaso @skonto

@skonto
Contributor

skonto commented Sep 5, 2024

@dprotaso Gentle ping: I tried to reproduce this locally but had no luck.

@maschmid
Contributor Author

maschmid commented Sep 6, 2024

#15487 could be a similar issue.

@skonto
Contributor

skonto commented Oct 7, 2024

Does #15503 fix this one too, @maschmid?


github-actions bot commented Jan 6, 2025

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2025