(oidc): add considerations for impacted kube-apiserver admission plugins #1726
To keep users informed of impacts, the `Authentication` resource, which users must update to enable the OIDC authentication mode on the cluster, will be extended with a new status field. If existing `RoleBindingRestriction` resources on the cluster specify user/group restrictions, this status field will be populated with a message stating the potential impact.

```go
type AuthenticationStatus struct {
```
An idea to be discussed: if we decide to update the API, we could come up with a more generic status field; we could use that as well to embed information about OIDC rollout in the KAS pods instead of manually checking. This could follow the general Condition pattern.
I've updated this section in f5c20de to use the condition pattern, but it is currently focused on this particular use case. I figure we can discuss and iterate more after further discussion.
##### Changes to the kube-apiserver

To account for the impacts to the `authorization.openshift.io/RestrictSubjectBindings` admission plugin, the OpenShift-specific patch to the kube-apiserver that adds this admission plugin will be updated such that:

- Informers for the `Group` API are not started if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- `RoleBinding`s will be rejected if there exists a `RoleBindingRestriction` that specifies user and/or group restrictions
- It is considered a failure if we are unable to determine the authentication type for the cluster, leading to rejection of the `RoleBinding`

##### Changes to openshift-apiserver

To help keep users informed of the expected behavior of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin when using the OIDC cluster authentication mode, it is proposed that a new admission plugin be added to the openshift-apiserver to reject creation of `RoleBindingRestriction` resources containing user/group restrictions.

**Alternatives**

- Do not reject admission, but issue a warning about the impact that creating a `RoleBindingRestriction` may have when OIDC is the cluster authentication method.
- Use a `ValidatingAdmissionPolicy` + `ValidatingAdmissionPolicyBinding` instead of an admission plugin.
Instead of checking whether auth type is OIDC, would it be more precise/suitable to check whether the required API groups for the required plugin functionality exist, and decouple this from OIDC?
The thinking behind coupling it with OIDC is that it tells us specifically whether the absence of the API groups is intentional.
I think checking only for their existence doesn't give us a clear enough picture of whether there is a larger issue at hand or the absence is intentional.
Maybe a middle ground is to check if there is any evidence of the oauth-apiserver workload being present on the cluster? If it is and the APIs are unavailable, something might be wrong. If it is not and the APIs are unavailable, it is likely intentional.
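To make the middle-ground idea concrete, here is a minimal sketch of the two-signal check. The names (`Diagnosis`, `diagnoseOAuthAPIs`) are hypothetical, and how each signal would actually be detected on a cluster is left open.

```go
package main

import "fmt"

// Diagnosis is a hypothetical classification of the cluster state.
type Diagnosis string

const (
	DiagnosisAPIsAvailable     Diagnosis = "APIsAvailable"
	DiagnosisPossibleProblem   Diagnosis = "PossibleProblem"
	DiagnosisLikelyIntentional Diagnosis = "LikelyIntentional"
)

// diagnoseOAuthAPIs combines two signals: whether the OAuth-backed APIs are
// served, and whether there is evidence of the oauth-apiserver workload.
func diagnoseOAuthAPIs(apisAvailable, oauthAPIServerPresent bool) Diagnosis {
	switch {
	case apisAvailable:
		return DiagnosisAPIsAvailable
	case oauthAPIServerPresent:
		// Workload exists but its APIs are unavailable: something might be wrong.
		return DiagnosisPossibleProblem
	default:
		// No workload and no APIs: the absence is likely intentional.
		return DiagnosisLikelyIntentional
	}
}

func main() {
	fmt.Println(diagnoseOAuthAPIs(false, true))
	fmt.Println(diagnoseOAuthAPIs(false, false))
}
```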
GitHub says this comment is outdated now that I've changed some stuff around, but I think the question posed is still relevant and worth further discussion.
- Informers for the `Group` API are not started if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- `RoleBinding`s will be rejected if there exists a `RoleBindingRestriction` that specifies user and/or group restrictions in the namespace the `RoleBinding` is being created
Let's discuss this further -- the RBRs have been built with OAuth in mind, but AFAIU RoleBindings should still generally work with OIDC users/groups.
// +listType=map
// +listMapKey=type
// +openshift:enable:FeatureGate=ExternalOIDC
OIDCConditions []metav1.Condition `json:"oidcConditions"`
Here's an idea of how we could structure this to make it usable for more generic tracking of the OIDC config/rollout progress:
status:
  oidcConfig:
    conditions:
    - message: ""
      reason: ""
      status: "True" # use this to indicate full KAS rollout on OIDC and start cleanup
      type: "Available"
    - message: ""
      reason: ""
      status: "False" # use this to indicate rollout in progress (new & updated config)
      type: "Progressing"
    - message: ""
      reason: ""
      status: "True" # use this to indicate CAO/KAS-o configuration issues
      type: "Degraded"
    # just an example structure about RBRs -- a separate condition type would allow to split meaning from the standard Available/Progressing/Degraded
    - message: "existing RoleBindingRestrictions on users/group not supported in external OIDC: ns1/rbr1, ns2/rbr2"
      reason: "UnsupportedRoleBindingRestrictions"
      status: "True"
      type: "UnsupportedResourceDetected"
It would be useful to separate this into `status.oidcConfig.conditions` because we might want to track other fields under `status.oidcConfig` (similar to `status.oidcClients`).
Finally, if we decide to block enablement if there are RBRs on Users/Groups, instead of a dedicated condition type we would use `Degraded`.
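As a rough illustration of how a controller might populate the dedicated condition from the example above, here is a self-contained sketch. `Condition` is a local stand-in for `metav1.Condition` so the snippet runs without Kubernetes dependencies; `unsupportedRBRCondition`, `setCondition`, and the `NoUnsupportedResources` reason are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a local stand-in for metav1.Condition (subset of fields).
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// unsupportedRBRCondition builds the dedicated condition from a list of
// "namespace/name" references to RoleBindingRestrictions on users/groups.
func unsupportedRBRCondition(rbrs []string) Condition {
	c := Condition{
		Type:   "UnsupportedResourceDetected",
		Status: "False",
		Reason: "NoUnsupportedResources", // hypothetical reason for the empty case
	}
	if len(rbrs) > 0 {
		c.Status = "True"
		c.Reason = "UnsupportedRoleBindingRestrictions"
		c.Message = "existing RoleBindingRestrictions on users/group not supported in external OIDC: " +
			strings.Join(rbrs, ", ")
	}
	return c
}

// setCondition upserts by Type, matching the listType=map / listMapKey=type
// semantics of the proposed field.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	conds := setCondition(nil, unsupportedRBRCondition([]string{"ns1/rbr1", "ns2/rbr2"}))
	fmt.Printf("%+v\n", conds)
}
```

In real operator code, `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta` would likely replace the hand-rolled upsert.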
- Informers for the `Group` API are only configured and started as part of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin. As a result, the informers will not be configured or attempt to start when the admission plugin is disabled.
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.spec.type` is set to `OIDC`. This will prevent logs in the kube-apiserver associated with not being able to connect to the oauth-apiserver, which we know is not running when OIDC is enabled.

**Open Question**: Does disabling an admission plugin through `--disable-admission-plugins` mean that the plugin will not be initialized?
Doing some research on this
From what I can tell, disabling the admission plugin does not mean that the plugin will not be initialized.
This means that for the `authorization.openshift.io/RestrictSubjectBindings` admission plugin, we will likely need to defer the setup of the informers beyond initialization of the plugin.
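One way to defer informer setup past plugin initialization is to create and start the informer lazily on the first validation call, guarded by `sync.Once`. The sketch below uses hypothetical stand-in types (`groupInformer`, `restrictSubjectBindings`) rather than the real plugin or client-go informers, so it only illustrates the shape of the idea.

```go
package main

import (
	"fmt"
	"sync"
)

// groupInformer is a hypothetical stand-in for the Group API informer.
type groupInformer struct{ started bool }

func (i *groupInformer) Start() { i.started = true }

// restrictSubjectBindings is a hypothetical stand-in for the admission plugin.
type restrictSubjectBindings struct {
	once        sync.Once
	newInformer func() *groupInformer // constructor, injected at initialization
	informer    *groupInformer        // nil until the first Validate call
}

// Validate sets up the informer on first use. A disabled plugin is never
// asked to validate, so its informer is never configured or started.
func (p *restrictSubjectBindings) Validate(roleBinding string) error {
	p.once.Do(func() {
		p.informer = p.newInformer()
		p.informer.Start()
	})
	// Actual restriction evaluation is elided in this sketch.
	return nil
}

func main() {
	created := 0
	p := &restrictSubjectBindings{newInformer: func() *groupInformer {
		created++
		return &groupInformer{}
	}}
	fmt.Println("informers created before first validation:", created)
	_ = p.Validate("rb-a")
	_ = p.Validate("rb-b")
	fmt.Println("informers created after two validations:", created)
}
```

`sync.Once` also keeps concurrent admission requests from racing to start the informer more than once.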
Signed-off-by: Bryce Palmer <[email protected]>
ba8e816 to 8246eb8
##### Changes to the kube-apiserver

The OpenShift-specific patch to the kube-apiserver that adds this admission plugin is found here: https://github.com/openshift/kubernetes/blob/master/openshift-kube-apiserver/openshiftkubeapiserver/patch.go
Add a link to a specific ref instead of `master`; linking to `master` might result in a broken link if the file ever gets moved.
- Disable the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins
- Remove the `rolebindingrestrictions.authorization.openshift.io` CustomResourceDefinition
- The `Authentication` api to communicate when OIDC can't be enabled due to existing `RoleBindingRestriction` resources through a new conditions field
> when OIDC can't be enabled due to existing `RoleBindingRestriction` resources

This bit should stand out more; also

> through a new conditions field

as per another comment below, the new API field should be introduced in a separate section as it is more generic.

Maybe something along the lines of:

> OIDC won't be enabled while `RoleBindingRestriction` resources exist; this will be communicated in the new `Authentication` API OIDC status field.
In order to prevent misleading logs about informers that failed to start or failure to connect to the oauth-apiserver, the following changes to this patch are to be made:

- Informer for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
Nit:
- Informer for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
- Informers for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
This will be done through updates to the appropriate config observers to update the `KubeAPIServerConfig.apiServerArguments` map to:

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument
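The two bullets above amount to a small merge over the arguments map. The following is a sketch under assumptions: it models `apiServerArguments` as a `map[string][]string` keyed by flag name without leading dashes, and `disableAdmissionPlugins` is a hypothetical helper, not an existing config observer function.

```go
package main

import "fmt"

const (
	enableKey  = "enable-admission-plugins"  // assumed key format
	disableKey = "disable-admission-plugins" // assumed key format
)

// disableAdmissionPlugins returns a copy of args in which the given plugins
// are removed from the enabled list and added (once) to the disabled list.
func disableAdmissionPlugins(args map[string][]string, plugins ...string) map[string][]string {
	toDisable := make(map[string]bool, len(plugins))
	for _, p := range plugins {
		toDisable[p] = true
	}

	// Copy the input so the caller's map and slices are left untouched.
	out := make(map[string][]string, len(args)+1)
	for k, v := range args {
		out[k] = append([]string(nil), v...)
	}

	if cur, ok := out[enableKey]; ok {
		kept := make([]string, 0, len(cur))
		for _, p := range cur {
			if !toDisable[p] {
				kept = append(kept, p)
			}
		}
		out[enableKey] = kept
	}

	seen := make(map[string]bool)
	for _, p := range out[disableKey] {
		seen[p] = true
	}
	for _, p := range plugins {
		if !seen[p] {
			out[disableKey] = append(out[disableKey], p)
			seen[p] = true
		}
	}
	return out
}

func main() {
	args := map[string][]string{
		enableKey: {"NodeRestriction", "authorization.openshift.io/RestrictSubjectBindings"},
	}
	fmt.Println(disableAdmissionPlugins(args, "authorization.openshift.io/RestrictSubjectBindings"))
}
```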
For clarity: AFAIU normally it would be enough to remove the plugins from the `--enable-admission-plugins` arg, as they are not default plugins that need explicit disabling. However, the config observer doesn't have access to the final config object, and therefore to the `--enable-admission-plugins` field, so we'll use `--disable-admission-plugins` to indicate what needs disabling. We'll also need a special merge so that the plugins get removed from enabled and added to disabled.
IIRC from my experimenting, overriding `--enable-admission-plugins` in the config to no longer include these admission plugins did not sufficiently disable them, which is why I specifically call out adding them to the `--disable-admission-plugins` flag.
I'm not sure we need to go into the exact semantics of how this is achieved, but if we do, I'm happy to do a bit more digging and figure out what changes may need to be made to the config logic.
Agreed, no need to go into more detail here; I just added this note as a result of some digging I did, as a note to ourselves.
- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument

##### Changes to the `Authentication` API
I would suggest we move this to a section independent of the RoleBindingRestrictions issue at hand. This API will be used to communicate status/progress of the OIDC configuration, which includes any issue with RBRs.
This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC.
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC.
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been rolled out.
If we remove the CRD the moment the auth type becomes `OIDC`, we won't give admins time to react in case any RBRs exist, as the CRD (and therefore any existing resources) will be removed immediately. I believe we'll want this in two steps: CAO complains if RBRs exist, and doesn't proceed with OIDC rollout. Once they are deleted, OIDC rollout proceeds. Once it is completed and OIDC is available (we'll use the new API field for that), OAuth cleanup starts, which includes deleting the CRD.
For the moment, this is the condition used to determine when OIDC has been enabled: https://github.com/openshift/cluster-authentication-operator/pull/740/files#diff-51c6cd196c758006bbe84eed012e6baac4713a856a96b7dfd10adc8ad7986e48R20
Once we have the new API, though, we'll use that to determine that it's available (i.e. `Available=True`). The KAS-o config observer will make sure to update the status accordingly when it detects that the KAS pods have been rolled out with OIDC.
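The staged flow described in this comment can be sketched as a pure decision function that a controller could call on each sync. All names here (`Action`, `nextAction`, and the action constants) are hypothetical, and the sketch deliberately does not model RBRs created while a rollout is already in flight.

```go
package main

import "fmt"

// Action is a hypothetical description of what the controller should do next.
type Action string

const (
	ActionEnsureCRDPresent Action = "EnsureCRDPresent"
	ActionBlockOIDCRollout Action = "BlockOIDCRollout"
	ActionWaitForRollout   Action = "WaitForOIDCRollout"
	ActionRemoveCRD        Action = "RemoveCRD"
)

// nextAction decides the next step from three observed inputs: the configured
// authentication type, the number of existing RoleBindingRestrictions, and
// whether OIDC is reported as available (e.g. Available=True).
func nextAction(authType string, rbrCount int, oidcAvailable bool) Action {
	switch {
	case authType != "OIDC":
		// Not using OIDC: keep the CRD present and matching the manifest.
		return ActionEnsureCRDPresent
	case oidcAvailable:
		// Rollout finished: OAuth cleanup may start, including CRD removal.
		return ActionRemoveCRD
	case rbrCount > 0:
		// Give admins time to react instead of deleting their resources.
		return ActionBlockOIDCRollout
	default:
		// No blockers: let the rollout proceed and wait for availability.
		return ActionWaitForRollout
	}
}

func main() {
	fmt.Println(nextAction("OIDC", 2, false))
	fmt.Println(nextAction("OIDC", 0, false))
	fmt.Println(nextAction("OIDC", 0, true))
}
```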
As the cluster-authentication-operator will now be responsible for the `rolebindingrestrictions.authorization.openshift.io` CRD, it should no longer be added to the openshift/api payload manifests that are included in a payload image and get managed by CVO.

This will likely mean removing the associated files from the `hack/update-payload-crds.sh` script here: https://github.com/openshift/api/blob/dd0f68969241c0548906ec98c12bb208512cbbb4/hack/update-payload-crds.sh#L6
Add some information about how the rollout of this change will become effective in an upgrade (handover will be smooth, but we should write this down).
The OIDC authentication mode on the cluster will not be allowed to be enabled if any `RoleBindingRestriction` resources exist.

To communicate the reason for the enablement of the OIDC functionality being blocked, the `Authentication` API will be extended with a new status field to communicate the condition of the OIDC feature.
Let's discuss further how we'll communicate this; for example, we can set `Available=False`/`Degraded=True` when RBRs exist. We'll need to also take care of some corner cases, e.g. what if someone creates RBRs after the CAO has started the rollout, but before the KAS pods have restarted?
+1 to discussing further how we communicate this. I'll go into a bit more detail on this and then we can refine it from there.
For the corner case where a RBR is created after the CAO has already started the rollout process but before the KAS pods have restarted, my expectation is that we remove the CRD, which in turn deletes the CRs (in this case the newly created RBRs). We can discuss this a bit further if we think that this is an unacceptable user experience, but I think this would be OK for now. We could add warnings in the OpenShift documentation for enabling OIDC that any RBRs created during the rollout of the OIDC functionality will be automatically removed.
> my expectation is that we remove the CRD, which in turn deletes the CRs

I also think this sounds good enough for now 👍
OIDCConfig *OIDCConfig `json:"oidcConfig,omitempty"`
}

type OIDCConfig struct {
Let's make it explicit that this type is used to communicate status info.
type OIDCConfig struct {
type OIDCConfigStatus struct {