Currently, moving the control plane of a shoot cluster can only be done manually and requires deep knowledge of how exactly to transfer the resources and state from one seed to another. This makes the process slow and error-prone.
Automatic migration can be very useful in a couple of scenarios:
- Seed goes down and can't be repaired (fast enough or at all) and its control planes need to be brought to another seed.
- Seed needs to be changed, but this operation requires the recreation of the seed (e.g. turn a single-AZ seed into a multi-AZ seed).
- Seeds need to be rebalanced.
- New seeds become available in a region closer to/in the region of the workers and the control plane should be moved there to improve latency.
- Gardener ring, which is a self-supporting setup/underlay for a highly available (usually cross-region) Gardener deployment.
- Provide a mechanism to migrate the control plane of a shoot cluster from one seed to another.
- The mechanism should support migration from a seed which is no longer reachable (Disaster Recovery).
- The shoot cluster nodes are preserved and continue to run the workload, but will talk to the new control plane after the migration completes.
- Extension controllers implement a mechanism which allows them to store their state or to be restored from an already existing state on a different seed cluster.
- The already existing shoot reconciliation flow is reused for migration with minimum changes.
Source Seed is the seed which currently hosts the control plane of a Shoot Cluster.
Destination Seed is the seed to which the control plane is being migrated.
Note: The following lists are just FYI and are meant to show the current resources which need to be moved to the Destination Seed.
Gardener has preconfigured lists of needed secrets which are generated when a shoot is created and deployed in the seed. The following is a minimum set of secrets which must be migrated to the Destination Seed. Other secrets can be regenerated from them.
- ca
- ca-front-proxy
- static-token
- ca-kubelet
- ca-metrics-server
- etcd-encryption-secret
- kube-aggregator
- kube-apiserver-basic-auth
- kube-apiserver
- service-account-key
- ssh-keypair
The gardenlet deploys custom resources in the Source Seed cluster during shoot reconciliation which are reconciled by extension controllers. The state of these controllers and any additional resources they create is independent of the gardenlet and must also be migrated to the Destination Seed. The following is a list of custom resources and the state which is generated by them that has to be migrated.
- BackupBucket: nothing relevant for migration
- BackupEntry: nothing relevant for migration
- ControlPlane: nothing relevant for migration
- DNSProvider/DNSEntry: nothing relevant for migration
- Extensions: migration of state needs to be handled individually
- Infrastructure: terraform state
- Network: nothing relevant for migration
- OperatingSystemConfig: nothing relevant for migration
- Worker: Machine-Controller-Manager related objects: machineclasses, machinedeployments, machinesets, machines
This list depends on the currently installed extensions and can change in the future.
The Garden cluster has a new Custom Resource called ShootState, which is stored in the project namespace of the Shoot. It contains all the required data described above so that the control plane can be recreated on the Destination Seed.
This data is separated into two sections. The first is generated by the gardenlet and then either used to generate new resources (e.g. secrets), or is directly deployed to the Shoot's control plane on the Destination Seed.
The second is generated by the extension controllers in the seed.
apiVersion: core.gardener.cloud/v1alpha1
kind: ShootState
metadata:
  name: my-shoot
  namespace: garden-core
  ownerReference:
    apiVersion: core.gardener.cloud/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Shoot
    name: my-shoot
    uid: ...
  finalizers:
  - gardener
gardenlet:
  secrets:
  - name: ca
    data:
      ca.crt: ...
      ca.key: ...
  - name: ssh-keypair
    data:
      id_rsa: ...
  - name:
    ...
extensions:
- kind: Infrastructure
  state: ... (Terraform state)
- kind: ControlPlane
  purpose: normal
  state: ... (Certificates generated by the extension)
- kind: Worker
  state: ... (Machine objects)
The state data is saved as a runtime.RawExtension type, which can be encoded/decoded by the corresponding extension controller.
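The opaque-state idea can be illustrated with a minimal Python sketch. It is not Gardener code: ExtensionState, encode_state and decode_state are hypothetical names, and JSON is assumed as the encoding purely for illustration. The point is that the ShootState only carries bytes, while the owning extension controller is the one that knows how to interpret them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtensionState:
    """Hypothetical model of one entry in the extensions list of a ShootState."""
    kind: str
    state: bytes  # opaque payload, playing the role of runtime.RawExtension
    purpose: Optional[str] = None


def encode_state(obj: dict) -> bytes:
    """Serialize extension-specific state (JSON is an assumption of this sketch)."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def decode_state(raw: bytes) -> dict:
    """Inverse of encode_state, run by the same extension on the Destination Seed."""
    return json.loads(raw.decode("utf-8"))


# The Infrastructure extension stores its Terraform state as opaque bytes ...
terraform_state = {"version": 3, "serial": 2}
entry = ExtensionState(kind="Infrastructure", state=encode_state(terraform_state))

# ... and later decodes it again without the ShootState knowing the schema.
restored = decode_state(entry.state)
```

Keeping the payload opaque means new extensions can persist arbitrary state without any change to the ShootState schema.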
The ShootState can contain sensitive data which has to be hidden from the end-users. Hence, it is recommended to provide an etcd encryption configuration to the Gardener API server in order to encrypt the ShootState resource.
There are limits on the size of the request bodies sent to the Kubernetes API server when creating or updating resources:
- by default, etcd can only accept request bodies which do not exceed 1.5 MiB (this can be configured with the --max-request-bytes flag).
- the Kubernetes API Server has a request body limit of 3 MiB, which cannot be set from the outside (with a command line flag).
- the gRPC configuration used by the API server to talk to etcd has a limit of 2 MiB per request body which cannot be configured from the outside.
- watch requests have a 16 MiB limit on the buffer used to stream resources.
This means that if ShootState is bigger than 1.5 MiB, the etcd max request bytes will have to be increased. However, there is still an upper limit of 2 MiB imposed by the gRPC configuration.
If ShootState exceeds this size limitation, it must make use of configmap/secret references to store the state of extension controllers. This is an implementation detail of Gardener and can be done at a later time if necessary, as extensions will not be affected.
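The thresholds above can be turned into a small decision helper. The following Python sketch is illustrative only: classify_size and the returned labels are hypothetical, and the size is assumed to be the byte length of the serialized object, which in a real cluster depends on the storage encoding.

```python
MIB = 1024 * 1024

# Limits quoted in the text above.
ETCD_DEFAULT_LIMIT = int(1.5 * MIB)  # default etcd request size limit
GRPC_LIMIT = 2 * MIB                 # API server to etcd gRPC limit, not configurable


def classify_size(size_bytes: int) -> str:
    """Map a serialized ShootState size to the action suggested by the limits."""
    if size_bytes <= ETCD_DEFAULT_LIMIT:
        return "fits"              # works with the default etcd configuration
    if size_bytes <= GRPC_LIMIT:
        return "raise-etcd-limit"  # etcd --max-request-bytes has to be increased
    return "use-references"        # state must move to configmap/secret references
```

For example, a 1.8 MiB ShootState lands in the middle band: it can only be stored after the etcd limit is raised, and it is already close to the hard 2 MiB ceiling.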
Splitting the ShootState into multiple resources could also benefit performance, as the Gardener API Server and Gardener Controller Manager would handle multiple small resources instead of one big resource.
All extension controllers which require state migration must save their state in a new status.state field and act on an annotation gardener.cloud/operation=restore in the respective Custom Resources which should trigger a restoration operation instead of reconciliation. A restoration operation means that the extension has to restore its state in the Shoot's namespace on the Destination Seed from the status.state field.
As an example: the Infrastructure resource must save the terraform state.
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    resourceGroup:
      name: mygroup
    networks:
      vnet: # specify either 'name' or 'cidr'
      # name: my-vnet
        cidr: 10.250.0.0/16
      workers: 10.250.0.0/19
status:
  state: |
      {
        "version": 3,
        "terraform_version": "0.11.14",
        "serial": 2,
        "lineage": "3a1e2faa-e7b6-f5f0-5043-368dd8ea6c10",
        "modules": [
          {
          }
        ]
        ...
      }
Extensions which do not require state migration should set status.state=nil in their Custom Resources and trigger a normal reconciliation operation if the CR contains the gardener.cloud/operation=restore annotation.
Similar to the contract for the reconcile operation, the extension controller has to remove the restore annotation after the restoration operation has finished.
An additional annotation gardener.cloud/operation=migrate is added to the Custom Resources. It is used to tell the extension controllers in the Source Seed that they must stop reconciling resources (in case they are requeued due to errors) and should perform cleanup activities in the Shoot's control plane. These cleanup activities involve removing the finalizers on Custom Resources and deleting them without actually deleting any infrastructure resources.
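The annotation contract described in this section could be sketched as follows. This is a simplified Python illustration, not the extension library: decide_action and finish_operation are hypothetical helpers, resources are plain dictionaries, and the annotation key gardener.cloud/operation is taken from the text.

```python
OPERATION_ANNOTATION = "gardener.cloud/operation"


def decide_action(resource: dict) -> str:
    """Pick the operation an extension controller should run for a resource."""
    annotations = (resource.get("metadata") or {}).get("annotations") or {}
    operation = annotations.get(OPERATION_ANNOTATION)
    state = (resource.get("status") or {}).get("state")

    if operation == "migrate":
        # Source Seed: stop reconciling, drop finalizers, keep the infrastructure.
        return "migrate"
    if operation == "restore" and state is not None:
        # Destination Seed: rebuild local state from status.state.
        return "restore"
    # No annotation, or restore without any saved state: normal reconciliation.
    return "reconcile"


def finish_operation(resource: dict) -> dict:
    """Remove the operation annotation once the operation has completed."""
    annotations = (resource.get("metadata") or {}).get("annotations") or {}
    annotations.pop(OPERATION_ANNOTATION, None)
    return resource
```

An extension without state to migrate falls through to "reconcile" even when the restore annotation is present, which matches the rule for extensions that leave status.state unset.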
Note: The same size limitations from the previous section are relevant here as well.
The only data which must be stored in the ShootState by the gardenlet is secrets (e.g. the ca for the API server). Therefore, the botanist.DeploySecrets step is changed. It is split into two functions which take a list of secrets that have to be generated:
- botanist.GenerateSecretState generates certificate authorities and other secrets which have to be persisted in the ShootState and must not be regenerated on the Destination Seed.
- botanist.DeploySecrets takes secret data from the ShootState, generates new ones (e.g. client TLS certificates from the saved certificate authorities), and deploys everything in the Shoot's control plane on the Destination Seed.
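The split between persisted and derived secrets can be sketched in Python. This is a toy model under loud assumptions: generate_secret_state and deploy_secrets only mirror the roles of the two botanist functions, random hex strings stand in for real key material, and the "example-client" secret name is invented for the illustration.

```python
import secrets


def generate_secret_state(names, existing=None):
    """Create persisted secrets once; entries already in the state are never regenerated."""
    state = dict(existing or {})
    for name in names:
        if name not in state:
            state[name] = secrets.token_hex(16)  # stand-in for real CA/key generation
    return state


def deploy_secrets(state, derived):
    """Return everything to deploy: persisted secrets plus freshly derived ones.

    `derived` maps a derived secret name to the name of the persisted secret
    (for example a certificate authority) it is generated from.
    """
    deployed = dict(state)
    for name, ca_name in derived.items():
        ca_material = state[ca_name]
        # Stand-in for signing a client certificate with the saved CA.
        deployed[name] = f"{ca_name}:{ca_material[:8]}:{secrets.token_hex(8)}"
    return deployed
```

Calling generate_secret_state a second time with the state read back from the ShootState leaves the certificate authority untouched, which is the property that lets the Destination Seed reuse it.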
The ShootState synchronization controller will become part of the gardenlet. It syncs the state of extension custom resources from the shoot namespace to the garden cluster and updates the corresponding spec.extension.state field in the ShootState resource. The controller can watch Custom Resources used by the extensions and update the ShootState only when changes occur.
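One way to update the ShootState only when changes occur is to compare a fingerprint of the observed state with the last one written. The Python sketch below assumes this approach; ShootStateSyncer, fingerprint and the in-memory dictionaries are hypothetical stand-ins for the real controller and the garden cluster API.

```python
import hashlib
import json


def fingerprint(state) -> str:
    """Stable digest of an extension state, used to detect real changes."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class ShootStateSyncer:
    """Toy model of the sync controller: writes happen only on changed state."""

    def __init__(self):
        self.extensions = {}  # (kind, name) -> last synced state
        self._seen = {}       # (kind, name) -> fingerprint of that state
        self.writes = 0       # number of simulated ShootState updates

    def on_event(self, kind: str, name: str, state) -> bool:
        """Handle a watch event; return True if the ShootState was updated."""
        key = (kind, name)
        digest = fingerprint(state)
        if self._seen.get(key) == digest:
            return False  # nothing changed, skip the write to the garden cluster
        self._seen[key] = digest
        self.extensions[key] = state
        self.writes += 1
        return True
```

Skipping unchanged events keeps the number of writes to the garden cluster proportional to actual state changes rather than to the number of watch events.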
- Starting migration:
  - Migration can only be started after a Shoot cluster has been successfully created so that the status.seed field in the Shoot resource has been set.
  - The Shoot resource's field spec.seedName="new-seed" is edited to hold the name of the Destination Seed and reconciliation is automatically triggered.
  - The Gardener Controller Manager checks the equality between spec.seedName and status.seed, detects that they are different, and triggers migration.
- The Gardener Controller Manager waits for the Destination Seed to be ready.
- The Shoot's API server is stopped.
- The Shoot's etcd is backed up.
- Extension resources in the Source Seed are annotated with gardener.cloud/operation=migrate.
- The Shoot's control plane in the Source Seed is scaled down.
- The gardenlet in the Destination Seed fetches the state of extension resources from the ShootState resource in the garden cluster.
- Normal reconciliation flow is resumed in the Destination Seed. Extension resources are annotated with gardener.cloud/operation=restore to instruct the extension controllers to reconstruct their state.
- The Shoot's namespace in the Source Seed is deleted.
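The trigger condition from the first step of this workflow can be expressed as a small predicate. The Python sketch below is illustrative: needs_migration is a hypothetical helper and the Shoot is modelled as a plain dictionary with the spec.seedName and status.seed fields named in the text.

```python
def needs_migration(shoot: dict) -> bool:
    """Return True when the Shoot has been moved to a different seed in its spec."""
    spec_seed = (shoot.get("spec") or {}).get("seedName")
    status_seed = (shoot.get("status") or {}).get("seed")
    if not status_seed:
        # The Shoot was never successfully created, so there is nothing to migrate.
        return False
    if not spec_seed:
        return False
    return spec_seed != status_seed
```

A Shoot whose status.seed is still empty is treated as a regular creation, not a migration, which mirrors the precondition stated in the first bullet.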