---
title: Synchronized Upgrades Between Clusters
authors:
  - "@danwinship"
reviewers:
  - TBD
approvers:
  - TBD
api-approvers: # in case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers)
  - TBD
creation-date: 2021-01-11
last-updated: 2021-01-11
tracking-link:
  - https://issues.redhat.com/browse/SDN-2603
see-also:
  - "/enhancements/network/dpu/overview.md"
---

# Synchronized Upgrades Between Clusters

## Summary

In a [cluster with DPUs](../network/dpu/overview.md) (e.g.,
BlueField-2 NICs), the x86 hosts form one OCP cluster, and the DPU ARM
systems form a second OCP cluster. This makes upgrades to new OCP
releases complicated: there is currently no way to synchronize
upgrades between the two clusters, but rebooting the BF-2 systems as
part of the MCO upgrade causes a network outage on the x86 systems.
For upgrades to work smoothly, we need to synchronize the reboots
between the two clusters, so that the BF-2 systems are only rebooted
when their corresponding x86 hosts have been cordoned and drained.

Please refer to the "Glossary" section of the [DPU Overview
Enhancement](../network/dpu/overview.md).

## Motivation

### Goals

- Make upgrades work smoothly in clusters running with DPU support, by
  synchronizing the reboots of nodes between the infra cluster and the
  tenant cluster.

### Non-Goals

- Supporting synchronized upgrades of more than 2 clusters at once.

## Proposal

### User Stories

As the administrator of a cluster using DPUs, I want to be able to do
z-stream upgrades without causing unnecessary network outages.

### API Extensions

TBD

### Implementation Details/Notes/Constraints [optional]

TBD

### Risks and Mitigations

TBD

## Design Details

### Open Questions

Basically everything...

The general idea is:

- We can set some things up at install time (e.g., creating credentials
  to allow certain operators in the two clusters to talk to each
  other).

- As part of the DPU security model, the tenant cluster cannot have
  any power over the infra cluster. (In particular, it must not be
  possible for an administrator in the tenant cluster to force the
  infra cluster to upgrade or downgrade to any particular version.)
  Thus, the upgrade must be initiated on the infra cluster side, and
  the infra side will tell the tenant cluster to upgrade as well.
  (Alternatively, the upgrade must be initiated more or less
  simultaneously in both clusters, if we don't want the infra cluster
  to have to hold a credential that lets it initiate an upgrade in the
  tenant cluster.)

- An upgrade should not be able to start unless both clusters are able
  to upgrade. In particular:

  - There can be no `Upgradeable: False` operators in either cluster.

  - The version to upgrade to must be available to both clusters
    (i.e., it must be available for both x86 and ARM).

  - This could be implemented via some sort of "dpu-cluster-upgrade"
    operator running in both clusters, where the two operators
    communicate with each other and set their `Upgradeable` state to
    reflect the state of the other cluster. If the
    "dpu-cluster-upgrade" operator were placed before every other
    operator in upgrade priority, then it could also block disallowed
    upgrades by failing its own upgrade, e.g., if an admin tries to
    upgrade one cluster without the other, or tries to upgrade the two
    clusters to different versions. (A rough sketch of this gating
    logic appears after this list.)

  - (Or should it be possible to do z-stream upgrades of the tenant
    cluster without bothering to upgrade the infra cluster too?)

- The two clusters upgrade all of the operators up to the MCO in
  parallel.

- Whichever cluster reaches the MCO upgrade first needs to wait for
  the other cluster to get there before proceeding. The two MCOs then
  need to coordinate to complete the upgrade: first, they have to
  agree on the order in which the physical hosts will be upgraded;
  second, for each physical host, they have to properly synchronize
  the upgrades of its infra node and its tenant node.

  - More specifically, for each physical host, in some order:

    - The Infra MCO will cordon and drain that host's infra node, and
      the Tenant MCO will cordon and drain that host's tenant node.
      (This can happen in parallel.)

    - The Infra MCO will then upgrade the infra node (causing it to
      reboot and temporarily break network connectivity to the tenant
      node).

    - Once the infra node upgrade completes, the Tenant MCO will
      reboot and upgrade the tenant node.

  - (This seems like it will absolutely require MCO changes.)

  - One way to do this would be to have a CRD with an array of hosts,
    indicating the ordering and the current status of each host, which
    the two MCOs could update and watch in order to monitor each
    other's progress. (A sketch of what such a CRD might look like
    appears under "Operational Aspects of API Extensions" below.)
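
To make the gating idea above concrete, here is a minimal sketch, in
Go, of the decision the hypothetical "dpu-cluster-upgrade" operator
might make; in a real implementation the result would presumably be
reported as the `Upgradeable` condition of a `ClusterOperator`. All of
the names here (`PeerStatus`, `upgradeableCondition`, the version
strings) are invented for illustration and are not part of any
existing API:

```go
package main

import "fmt"

// PeerStatus is a hypothetical summary of what this cluster's
// "dpu-cluster-upgrade" operator last heard from its counterpart in
// the other cluster.
type PeerStatus struct {
	Reachable     bool   // did we hear from the peer recently?
	Upgradeable   bool   // does the peer report Upgradeable: True?
	DesiredUpdate string // version the peer has been asked to move to
}

// upgradeableCondition decides what Upgradeable state this cluster's
// operator should report, given the locally-requested version and the
// peer's last-reported state. Both operators run the same logic, so
// neither cluster can start an upgrade the other cannot follow.
func upgradeableCondition(localDesired string, peer PeerStatus) (bool, string) {
	switch {
	case !peer.Reachable:
		return false, "peer cluster is unreachable; refusing to allow an upgrade"
	case !peer.Upgradeable:
		return false, "peer cluster reports Upgradeable: False"
	case peer.DesiredUpdate != "" && peer.DesiredUpdate != localDesired:
		// Block attempts to move the two clusters to different versions.
		return false, fmt.Sprintf(
			"peer cluster is targeting %s, but this cluster is targeting %s",
			peer.DesiredUpdate, localDesired)
	default:
		return true, "peer cluster is ready to upgrade"
	}
}

func main() {
	peer := PeerStatus{Reachable: true, Upgradeable: true, DesiredUpdate: "4.10.3"}
	ok, reason := upgradeableCondition("4.10.4", peer)
	fmt.Printf("Upgradeable=%v: %s\n", ok, reason) // version mismatch: false
}
```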

### Test Plan

TBD

### Graduation Criteria

TBD

#### Dev Preview -> Tech Preview

#### Tech Preview -> GA

### Upgrade / Downgrade Strategy

This is a modification to the upgrade process, not something that can
be upgraded or downgraded on its own.

TBD, as the details depend on the eventual design.

### Version Skew Strategy

TBD, as the details depend on the eventual design.

We will need to deal with skew both within a single cluster and
between the infra and tenant clusters.

### Operational Aspects of API Extensions

TBD

The only currently-proposed CRD is for Infra MCO to Tenant MCO
communication, and would not be used by any other components.
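
To make that concrete, here is a minimal sketch, in Go, of the shape
such a CRD's status might take, covering both the host ordering and
the per-host synchronization sequence described under "Open
Questions". Every type, field, and phase name here is invented for
illustration; the actual API is TBD:

```go
package main

import "fmt"

// HostPhase tracks how far a single physical host has progressed
// through the synchronized upgrade. The phases mirror the per-host
// sequence described above: both nodes are drained, then the infra
// node (DPU) is rebooted, then the tenant node is rebooted.
type HostPhase string

const (
	HostPending        HostPhase = "Pending"        // not yet started
	HostDraining       HostPhase = "Draining"       // both nodes being cordoned and drained
	HostInfraUpdating  HostPhase = "InfraUpdating"  // DPU rebooting; tenant networking down
	HostTenantUpdating HostPhase = "TenantUpdating" // x86 host rebooting
	HostDone           HostPhase = "Done"
)

// HostStatus is the per-host entry that the two MCOs would update and
// watch to monitor each other's progress.
type HostStatus struct {
	Name  string    // identifies the physical host
	Phase HostPhase // advanced alternately by the Infra and Tenant MCOs
}

// SynchronizedUpgradeStatus is the status of the coordination object.
// The order of the Hosts array defines the order in which the
// physical hosts are upgraded.
type SynchronizedUpgradeStatus struct {
	Hosts []HostStatus
}

// nextHost returns the first host that still needs work, or nil if
// the upgrade is complete; hosts are handled strictly in array order.
func (s *SynchronizedUpgradeStatus) nextHost() *HostStatus {
	for i := range s.Hosts {
		if s.Hosts[i].Phase != HostDone {
			return &s.Hosts[i]
		}
	}
	return nil
}

func main() {
	status := SynchronizedUpgradeStatus{Hosts: []HostStatus{
		{Name: "worker-0", Phase: HostDone},
		{Name: "worker-1", Phase: HostInfraUpdating},
		{Name: "worker-2", Phase: HostPending},
	}}
	fmt.Println("next host needing work:", status.nextHost().Name) // worker-1
}
```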

#### Failure Modes

- The system might get confused and spuriously block upgrades that
  should be allowed.

- Communication failures might lead to upgrades failing without the
  tenant cluster being able to figure out why they failed.

- TBD

#### Support Procedures

TBD

## Implementation History

- Initial proposal: 2021-01-11

## Drawbacks

This makes the upgrade process more complicated, which risks rendering
clusters un-upgradeable without manual intervention.

However, without some form of synchronization, it is impossible to
have non-disruptive tenant cluster upgrades.

## Alternatives

The fundamental problem is that rebooting the DPU causes a network
outage on the tenant.

### Never Reboot the DPUs

This implies never upgrading OCP on the DPUs. I don't see how this
could work.

### Don't Have an Infra Cluster

If the DPUs were not all part of a single OCP cluster (for example, if
they were just "bare" RHCOS hosts, or were each running Single-Node
OpenShift), then it might be simpler to synchronize the DPU upgrades
with the tenant upgrades, because each tenant could then coordinate
the actions of its own DPU by itself.

The big problem with this is that, for security reasons, we don't want
the tenants to have any control over their DPUs. (For some future use
cases, the DPUs will be used to enforce security policies on their
tenants.)