---
title: SDN Live Migration
authors:
  - "@pliurh"
reviewers:
  - "@danwinship"
  - "@trozet"
  - "@dcbw"
approvers:
  -
creation-date: 2022-03-04
last-updated: 2022-03-04
status: implementable
---

# SDN Live Migration

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
## Summary

Migrate the CNI network provider of a running cluster from OpenShift SDN to
OVN Kubernetes without service interruption. During the migration, we will
partition the cluster into two sets of nodes controlled by different network
plugins. We will utilize the hybrid overlay feature of OVN Kubernetes to
connect the networks of the two CNI plugins, so that pods on each side can
still talk to the pods on the other side.

## Motivation

Some OpenShift users have very high requirements on service availability. The
current SDN migration solution, which causes a service interruption, is not
acceptable to them.

### Goals

- Migrate the cluster network provider from OpenShift SDN to OVN Kubernetes
  for an existing cluster.
- This is an in-place migration that does not require extra nodes.
- The impact of the migration on workloads shall be similar to that of an OCP
  upgrade.
- The solution shall work at scale, e.g. in a large cluster with hundreds of
  nodes.
- The migration operation shall be able to be rolled back if needed.

### Non-Goals

- Support for migration to other network providers
- The necessary GUI change in OpenShift Cluster Manager
- The live migration of the egress IP, router, and firewall configuration
- Migration from SDN Multitenant mode

## Pre-requisites

- This solution relies on the ovnkube hybrid overlay feature. The bug
  https://bugzilla.redhat.com/show_bug.cgi?id=2040779 needs to be fixed first.

## Proposal

The key problem of a live SDN migration is that we need to maintain the
connectivity of the cluster network during the migration, while pods are
attached to different networks. We propose to utilize the OVN Kubernetes
hybrid overlay feature to connect the networks owned by OpenShift SDN and OVN
Kubernetes.

- We will run different plugins on different nodes, but both plugins will know
  how to reach pods owned by the other plugin, so all pods/services/etc.
  remain connected.
- During migration, CNO will take original-plugin nodes one by one and convert
  them to destination-plugin nodes, rebooting them in the process.
- The cluster network CIDR will not change. In fact, none of the node host
  subnets will change.
- NetworkPolicy will work correctly throughout the migration.

### Limitations

- Multicast may not work during the migration, while the CNI plugins are
  running in parallel, but it shall work after the migration is complete.
- The following feature, which is supported by OpenShift SDN but not by OVN
  Kubernetes, will stop working as soon as the migration is started:
  - Multitenant isolation
- The following features are supported by both OpenShift SDN and OVN
  Kubernetes, but with different designs and implementations. Even though they
  share the same names, they have different APIs and configuration logic, and
  the supported platforms and modes also differ between the two network
  providers. Users therefore need to evaluate them before conducting the live
  migration, and have to migrate the configuration manually after the
  migration is complete:
  - Egress IP
  - Egress Firewall
  - Egress Router

### User Stories

The service delivery (SD) team (managed OpenShift services ARO, OSD, ROSA) has
a unique set of requirements around downtime, node reboots, and degree of
automation. Specifically, SD needs a way to migrate its managed fleet that is
no more impactful to the customer's workloads than an OCP upgrade, and that
can be done at scale in a safe, automated way that can be made self-service
and does not require SD to negotiate maintenance windows with customers. The
current migration solution needs to be revisited to support these (relatively)
more stringent requirements.

### Risks and Mitigations

## Design Details

The existing ovn-kubernetes hybrid overlay feature was developed for hybrid
Windows/Linux clusters. Each ovn-kubernetes node manages an external-to-OVN
OVS bridge, named br-ext, which acts as the VXLAN source and endpoint for
packets moving between pods on the node and their cluster-external
destinations. The br-ext bridge acts as a transparent gateway and routes
traffic towards Windows nodes.

In the SDN live migration use case, we can enhance this feature to connect the
nodes managed by the two different CNI plugins. To minimize the implementation
effort and keep the code maintainable, we will reuse the hybrid overlay as a
whole and only make the necessary changes to both CNI plugins.

On the OVN Kubernetes side, all the cross-CNI traffic shall follow the same
path as the current hybrid overlay implementation. For OVN Kubernetes, we need
the following enhancements:

1. We need to modify OVN-K to allow overlap between the cluster network and
   the hybrid overlay CIDR, so that we can reuse the cluster network in the
   migration (a configuration sketch follows this list).
2. We need to modify OVN-K to allow modifying the hybrid overlay CIDR on the
   fly.
3. We need to allow `hybrid-overlay-node` to run on Linux nodes; currently it
   is designed to only run on Windows nodes.
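
As a rough illustration of items 1 and 2, the hybrid overlay CIDR could then
be pointed at the cluster network itself through the existing
`hybridOverlayConfig` API; the CIDR values below are assumptions, and this
overlap is exactly what OVN-K rejects today:

```bash
# Illustrative sketch: reuse the cluster network CIDR (10.128.0.0/14 is just
# the common default, used here as an example) as the hybrid overlay CIDR.
oc patch Network.operator.openshift.io cluster --type='merge' --patch \
  '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"hybridOverlayConfig":{"hybridClusterNetwork":[{"cidr":"10.128.0.0/14","hostPrefix":23}]}}}}}'
```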

On the OpenShift SDN side, when a node is converted to OVN Kubernetes, the
change shall be almost transparent to the control plane. But we need to
introduce a 'migration mode' for OpenShift SDN by:

1. Changing ingress NetworkPolicy processing to be based entirely on pod IPs
   rather than on namespace VNIDs, since packets from OVN nodes will have
   VNID 0 set.
2. To be compatible with the Windows node VXLAN implementation, the OVN-K
   hybrid overlay uses the host interface MAC as the VXLAN inner MAC. When
   packets arrive at br0 on an SDN node, they cannot be forwarded to the pod
   interface, due to the MAC mismatch. We need to add a flow for each pod that
   swaps the destination MAC to the pod interface MAC (a flow sketch follows
   this list).
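
A minimal sketch of the per-pod flow described in item 2; the table numbers,
pod IP, and MACs are placeholders, not the actual openshift-sdn pipeline:

```bash
# Hypothetical per-pod flow on br0: rewrite the destination MAC of traffic
# that arrived over the hybrid overlay VXLAN tunnel to the pod interface MAC,
# then continue through the normal pipeline. All values are placeholders.
ovs-ofctl -O OpenFlow13 add-flow br0 \
  "table=10,priority=100,ip,nw_dst=10.128.2.5,actions=set_field:0a:58:0a:80:02:05->eth_dst,goto_table:20"
```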

### The Traffic Path

#### Packets going from OpenShift SDN to OVN Kubernetes

On the SDN side, a node doesn't need to know whether the peer node is an SDN
node or an OVN node. We reuse the existing VXLAN tunnel rules on the SDN side
(an example flow is shown after this list):

- Egress NetworkPolicy rules and service proxying happen as normal.
- When the packet reaches table 90, it hits a "send via vxlan" rule that was
  generated based on a HostSubnet object.
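
For reference, such a table 90 flow looks roughly like the following; the
subnet, tunnel destination, and register usage are illustrative values in the
style of openshift-sdn's flows, not output copied from a real cluster:

```bash
# Dump the VXLAN egress table on an SDN node's br0; a per-HostSubnet rule
# copies the VNID register into the tunnel ID and sends the packet to the
# peer node's VXLAN endpoint.
ovs-ofctl -O OpenFlow13 dump-flows br0 table=90
# table=90, priority=100, ip, nw_dst=10.129.0.0/23,
#   actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:192.168.111.21->tun_dst,output:vxlan0
```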

On the OVN side:

- OVN accepts the packet via the VXLAN tunnel, ignores the VNID set by SDN,
  and then just routes it normally.
- Ingress NetworkPolicy processing happens when the packet reaches the
  destination pod's switch port, just like normal.
- Our NetworkPolicy rules are all based on IP addresses, not "logical input
  port", etc., so it doesn't matter that the packets came from outside OVN and
  have no useful OVN metadata.

#### Packets going from OVN Kubernetes to OpenShift SDN

On the OVN side:

- The packet follows the same path as in the hybrid overlay: it just has to
  get routed out the VXLAN tunnel with VNID 0.

On the SDN side:

- We have to change ingress NetworkPolicy processing to be based entirely on
  pod IPs rather than on namespace VNIDs, since packets from OVN nodes won't
  have the VNID set. There is already code to generate the rules that way,
  though, because egress NetworkPolicy already works that way.

### The Migration Process

#### Migration Setup

1. The admin kicks off the migration process by updating the custom resource
   `network.operator`:

   ```bash
   $ oc patch Network.operator.openshift.io cluster --type='merge' --patch '{"spec":{"migration":{"networkType":"OVNKubernetes","type":"live"}}}'
   ```

   CNO will check whether any unsupported feature (see the
   [Limitations](#limitations) section) is enabled in the cluster. If so, it
   will set an error message under the `status.migration` field of the custom
   resource `network.operator` and will not proceed until users disable those
   features (illustrative status and annotation checks follow this list).

2. CNO will redeploy the openshift-sdn DaemonSets with __migration mode__
   enabled, and add logic to the wrapper script of the sdn DaemonSet that
   checks whether the bridge `br-ex` exists on the node (a sketch of this
   check follows the list). If `br-ex` exists, the node has already been
   updated by MCO, so the script just does "sleep infinity" rather than
   actually launching the `openshift-sdn-node` process.

3. CNO will also deploy the ovnkube-master and ovnkube-node DaemonSets to all
   the nodes, with hybrid overlay mode enabled in their config. The
   ovnkube-node DaemonSet wrapper script likewise checks whether the bridge
   `br-ex` exists on the node. If `br-ex` doesn't exist, the node has not yet
   been updated by MCO, so the script runs the `hybrid-overlay-node` process,
   which annotates the node with the necessary hybrid overlay information,
   e.g.:

   ```yaml
   k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-c2-f5-92-28-ad
   k8s.ovn.org/hybrid-overlay-node-subnet: 192.168.111.20/24
   ```

   Otherwise, it launches the `ovnkube-node` process.
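
A minimal sketch of the `br-ex` check performed by both wrapper scripts in
steps 2 and 3; the command invocations are assumptions, not the actual
implementation:

```bash
# Hypothetical wrapper logic for the openshift-sdn DaemonSet in migration
# mode. The ovnkube-node wrapper inverts the check: it runs
# hybrid-overlay-node until br-ex appears, and ovnkube-node afterwards.
if ip link show br-ex &>/dev/null; then
  # MCO has already converted this node to OVN Kubernetes; stay idle.
  exec sleep infinity
else
  exec openshift-sdn-node "$@"
fi
```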
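
During setup, progress can be spot-checked from the CLI. The commands below
are illustrative, assuming the `status.migration` field and the node
annotations described above; `<node-name>` is a placeholder:

```bash
# Check whether CNO reported an error for an unsupported feature.
oc get network.operator cluster -o jsonpath='{.status.migration}'

# Inspect the hybrid overlay subnet annotation set on a node (dots in the
# annotation key are escaped for jsonpath).
oc get node <node-name> \
  -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/hybrid-overlay-node-subnet}'
```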

#### Migration

We can kick off the migration from CNO by running:

```bash
$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"spec":{"networkType":"OVNKubernetes"}}'
```

CNO will update the `status.networkType` field of the `network.config` CR.
This will trigger MCO to apply a new MachineConfig to each node:

1. MCO will:
   - rerender the MachineConfig for each node
   - try to cordon and drain the nodes based on
     `MachineConfigPools.spec.maxUnavailable`
2. CNO will detect that a node is being drained by MCO by watching the MCO
   node annotations. It will clear the `k8s.ovn.org/hybrid-overlay-xxx` node
   annotations, then set the `k8s.ovn.org/node-subnets` annotation according
   to the node's HostSubnet object, to bypass the ovnkube node subnet
   allocation.
3. CNO will also update the hybrid overlay configuration, and the
   ovnkube-master pods will pick up the updated hybrid overlay configuration
   and update the OVN database.
4. MCO will drain and reboot the node.
5. After the node boots up, br-ex will be created and ovnkube-node will run in
   hybrid overlay mode on the node.
6. The wrapper script of the ovnkube-node DaemonSet will see br-ex, then
   launch the `ovnkube-node` process.
7. MCO will uncordon the node. Pods will be created on the node using
   ovn-kubernetes as the default CNI plugin.

The above process is repeated for each node until the new MachineConfig has
been applied to all the nodes and they have been converted to OVN Kubernetes.
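
Per-node progress can be followed with ordinary MCO and node inspection; a
sketch, assuming the annotations described in the setup steps:

```bash
# Watch MCO roll the new MachineConfig across the pools.
oc get machineconfigpools -w

# List which nodes still carry the hybrid overlay subnet annotation, i.e.
# which nodes are still running OpenShift SDN.
oc get nodes -o custom-columns='NAME:.metadata.name,SDN-SUBNET:.metadata.annotations.k8s\.ovn\.org/hybrid-overlay-node-subnet'
```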

#### Migration Cleanup

Once the migration is complete, CNO will:

- delete the openshift-sdn DaemonSets
- redeploy ovn-kubernetes in "normal" mode (no migration mode config, no node
  affinity)
- remove the migration-related labels from the nodes

### API

To start the migration, users need to update the `network.operator` CR by
adding:

```json
{
  "spec": {
    "migration": {
      "networkType": "OVNKubernetes",
      "type": "live"
    }
  }
}
```

On removal of the `spec.migration` field, CNO will start the migration
cleanup, as sketched below.
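
For example, the field can be dropped with a merge patch that sets it to
`null` (a sketch; JSON merge patch semantics remove a field set to null):

```bash
# Removing spec.migration triggers the migration cleanup described above.
oc patch Network.operator.openshift.io cluster --type='merge' \
  --patch '{"spec":{"migration":null}}'
```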

CNO will also report the migration state under the `status` field of the
`network.operator` CR:

```json
{
  "status": {
    "migration": {
      // The state can be 'Setup', 'Working', 'Done' or 'Error'.
      "state": "Working",
      // The reason needs to be filled in when the state is 'Error'.
      "reason": ""
    }
  }
}
```
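
Automation, such as SD tooling, could poll this state until the migration
reaches a terminal value; a minimal sketch assuming the states listed above:

```bash
# Poll until the migration reaches 'Done', failing fast on 'Error'.
while true; do
  state=$(oc get network.operator cluster -o jsonpath='{.status.migration.state}')
  [ "$state" = "Done" ] && break
  if [ "$state" = "Error" ]; then
    oc get network.operator cluster -o jsonpath='{.status.migration.reason}'
    exit 1
  fi
  sleep 30
done
```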

### Rollback

Users shall be able to roll back to openshift-sdn after the migration is
complete. The migration is bidirectional, so users can follow a procedure
similar to the one described above to conduct the rollback, as sketched below.
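
Concretely, a rollback would start with the mirror image of the patches used
for the forward migration; the exact values are an assumption based on the
symmetry described above:

```bash
# Start a live migration back to OpenShift SDN, then switch the desired
# network type in the cluster network config.
oc patch Network.operator.openshift.io cluster --type='merge' \
  --patch '{"spec":{"migration":{"networkType":"OpenShiftSDN","type":"live"}}}'
oc patch Network.config.openshift.io cluster --type='merge' \
  --patch '{"spec":{"networkType":"OpenShiftSDN"}}'
```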

### Lifecycle Management

This is a one-time operation for a cluster, so no lifecycle management is
needed.

### Test Plan

TBD

### Graduation Criteria

The graduation criteria are as follows:

#### Dev Preview -> Tech Preview

- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers

#### Tech Preview -> GA

- More testing (upgrade, scale)
- Add gating CI jobs on the relevant GitHub repos
- Sufficient time for feedback

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

This is a one-time operation for a cluster, so no upgrade / downgrade strategy
is needed.

### Version Skew Strategy

N/A

### API Extensions

N/A

### Operational Aspects of API Extensions

N/A

#### Failure Modes

N/A

#### Support Procedures

N/A

## Implementation History

N/A

## Drawbacks

N/A

## Alternatives

Instead of switching the network provider of an existing cluster, we can spin
up a new cluster and then move the workload to it.

## Infrastructure Needed