Merge pull request #1483 from jhernand/dont_require_registry_during_b…
…oot_and_upgrade

Don't require registry during reboot and upgrade
openshift-ci[bot] authored Oct 20, 2023
2 parents 7393145 + 49b0079 commit 955c042
Showing 1 changed file with 198 additions and 0 deletions.
198 changes: 198 additions & 0 deletions enhancements/update/dont-require-registry-during-reboot-and-upgrade.md
@@ -0,0 +1,198 @@
---
title: dont-require-registry-during-reboot-and-upgrade
authors:
- "@jhernand"
reviewers:
- "@avishayt" # To ensure that this will be usable with the appliance.
- "@danielerez" # To ensure that this will be usable with the appliance.
- "@mrunalp" # To ensure that this can be implemented with CRI-O and MCO.
- "@nmagnezi" # To ensure that this will be usable with the appliance.
- "@oourfali" # To ensure that this will be usable with the appliance.
approvers:
- "@sdodson"
- "@zaneb"
- "@LalatenduMohanty"
api-approvers:
- "@sdodson"
- "@zaneb"
- "@deads2k"
- "@JoelSpeed"
creation-date: 2023-09-21
last-updated: 2023-09-21
tracking-link:
- https://issues.redhat.com/browse/RFE-4482
see-also:
- https://issues.redhat.com/browse/OCPBUGS-13219
- https://github.com/openshift/enhancements/pull/1481
- https://github.com/openshift/cluster-network-operator/pull/1803
replaces: []
superseded-by: []
---

# Don't require registry during reboot and upgrade

## Summary

Ensure that clusters don't require a registry server to reboot existing nodes or
upgrade when all the required images have already been pulled.

## Motivation

Currently, during reboots of existing nodes and during upgrades, clusters may
need to contact the image registry servers, even if the images have already
been pulled. This complicates things for clusters that are completely
disconnected or that have a slow or unreliable connection to the image registry
servers.

### User Stories

#### Reboot without registry

As the administrator of a cluster that has all the required images already
pulled in all the nodes, I want to be able to reboot nodes without requiring
access to a registry server.

#### Upgrade without registry

As the administrator of a cluster that has all the required images already
pulled in all the nodes, I want to be able to upgrade it without requiring
access to a registry server.

### Goals

Ensure that clusters don't require a registry server to reboot nodes or upgrade
when all the required images have already been pulled.

### Non-Goals

It is not a goal of this enhancement to add a mechanism to ensure that
required images are available. That is the subject of the [pin and pre-load
images](https://github.com/openshift/enhancements/pull/1481) enhancement.

It is not a goal of this enhancement to eliminate the requirement of a registry
server for other workloads. We are just eliminating the requirement that it be
available during reboots of existing nodes and upgrades.

## Proposal

### Workflow Description

1. The administrator of a cluster reboots a node or performs an upgrade.

1. If the required images have already been pulled, none of the components of
the cluster try to contact the registry server.

### API Extensions

None.

### Implementation Details/Notes/Constraints

#### Don't use the `Always` pull policy

Some OCP components currently use the `Always` image pull policy during
upgrades. As a result, the kubelet and CRI-O will try to contact the registry
server, even if the image is already available in the local storage of the
cluster. This blocks upgrades and should be avoided.

Most of these OCP components have already been changed to avoid the `Always`
pull policy. Recently the OVN pre-puller was changed as well (see this
[bug](https://issues.redhat.com/browse/OCPBUGS-13219) for details). To prevent
similar bugs in the future, and to make node reboots and upgrades less fragile,
we should have a test that gates the OpenShift release and verifies that node
reboots and upgrades can be performed without a registry server. One way to
ensure this is to run an admission hook in CI that rejects or warns about any
spec that uses the `Always` pull policy.
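
As an illustration only, the decision logic of such an admission hook could
look roughly like the sketch below. It is not an existing OpenShift component:
the function names (`find_pod_spec`, `containers_using_always`, `review`) and
the `reject` switch are hypothetical. It assumes the hook receives an
`admission.k8s.io/v1` `AdmissionReview` as a dictionary, and it only looks at
pods and at workloads that embed a pod template in `spec.template.spec`, so
objects such as cron jobs would need extra handling.

```python
"""Sketch of a validating admission check that flags the `Always` pull policy."""

# Pod spec fields that hold container definitions.
POD_CONTAINER_FIELDS = ("containers", "initContainers", "ephemeralContainers")


def find_pod_spec(obj):
    """Return the pod spec of a Pod, or of a workload embedding a pod template."""
    spec = obj.get("spec") or {}
    if obj.get("kind") == "Pod":
        return spec
    # Assumption: other kinds (Deployment, DaemonSet, ...) use spec.template.spec.
    return (spec.get("template") or {}).get("spec") or {}


def containers_using_always(obj):
    """List the containers whose imagePullPolicy is explicitly set to Always."""
    pod_spec = find_pod_spec(obj)
    offenders = []
    for field in POD_CONTAINER_FIELDS:
        for container in pod_spec.get(field) or []:
            if container.get("imagePullPolicy") == "Always":
                offenders.append(f"{field}/{container.get('name', '<unnamed>')}")
    return offenders


def review(admission_review, reject=False):
    """Build the AdmissionReview response: warn by default, reject on request."""
    request = admission_review["request"]
    offenders = containers_using_always(request.get("object") or {})
    response = {"uid": request["uid"], "allowed": True}
    if offenders:
        message = "imagePullPolicy Always used by: " + ", ".join(offenders)
        if reject:
            response["allowed"] = False
            response["status"] = {"code": 403, "message": message}
        else:
            response["warnings"] = [message]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Running the hook in warning mode first would show which components still use
the policy without breaking CI jobs, and switching `reject` on later would turn
the same check into a gate.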

#### Don't try to contact the image registry server explicitly

Some OCP components explicitly try to contact the registry server without a
fallback alternative. These need to be changed so that they don't do it or so
that they have a fallback mechanism when the registry server isn't available.

For example, in OpenShift 4.1.13 the machine config operator runs the
equivalent of `skopeo inspect` in order to decide what kind of upgrade is in
progress. That fails if there is no registry server, even if the release image
has already been pulled. That needs to be changed so that contacting the
registry server is not required. A possible way to do that is to use the
equivalent of `crictl inspect` instead.
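
The fallback described above can be sketched as a local-first lookup. This is
not the machine config operator's code: `inspect_image`, `inspect_local`,
`inspect_remote` and `ImageNotFoundLocally` are hypothetical names. The sketch
assumes `inspect_local` wraps a query to the container runtime on the node and
`inspect_remote` wraps the existing registry lookup, both passed in as plain
callables.

```python
"""Sketch of a local-first image inspection with an optional registry fallback."""


class ImageNotFoundLocally(Exception):
    """Raised by the local inspector when the image is not in node storage."""


def inspect_image(reference, inspect_local, inspect_remote=None):
    """Return image metadata, contacting the registry only as a last resort.

    `inspect_local` and `inspect_remote` are hypothetical callables that take
    an image reference and return its metadata.
    """
    try:
        # Preferred path: the image is already pulled, so no network is needed.
        return inspect_local(reference)
    except ImageNotFoundLocally:
        if inspect_remote is None:
            # Fully disconnected mode: surface the missing image to the caller.
            raise
        # The image is genuinely missing, so the registry is the only option.
        return inspect_remote(reference)
```

With this ordering, a node that already holds the release image never reaches
the registry, while connected clusters keep their current behaviour when the
image is missing.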

### Risks and Mitigations

None.

### Drawbacks

None.

## Design Details

### Open Questions

Access to a registry server is also needed by the image stream controller to
import tags. Those imports will fail if there is no registry available. Does
that block upgrades?

### Test Plan

We should have a set of CI tests that verify that reboots of nodes and upgrades
can be performed in a fully disconnected environment without a registry server,
both for a single node cluster and a cluster with multiple nodes. These tests
should gate the OCP release.

### Graduation Criteria

The feature will ideally be introduced as `Dev Preview` in OpenShift 4.X,
moved to `Tech Preview` in 4.X+1 and declared `GA` in 4.X+2.

#### Dev Preview -> Tech Preview

- Ability to reboot nodes in disconnected environments without a registry
server.

- Ability to upgrade clusters in disconnected environments without a registry
server.

- Availability of the tests that verify that node reboots and upgrades work
without a registry server.

- Availability of the tests that verify that no OCP component uses the `Always`
pull policy.

- Obtain positive feedback from at least one customer.

#### Tech Preview -> GA

- User-facing documentation created in
[openshift-docs](https://github.com/openshift/openshift-docs).

#### Removing a deprecated feature

Not applicable.

### Upgrade / Downgrade Strategy

Not applicable.

### Version Skew Strategy

Not applicable.

### Operational Aspects of API Extensions

Not applicable.

#### Failure Modes

#### Support Procedures

## Implementation History

None.

## Alternatives

None.

## Infrastructure Needed

Infrastructure will be needed to run the tests described in the test plan above.
