(oidc): add considerations for impacted kube-apiserver admission plugins #1726

Open
wants to merge 3 commits into
base: master

@everettraven everettraven commented Dec 9, 2024

No description provided.

Contributor

openshift-ci bot commented Dec 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 9, 2024
Continuing with the notion of keeping users informed of impacts, the `Authentication` resource that users must update to enable the OIDC authentication mode on the cluster will be extended with a new status field to inform users of any potential impacts. In the event there are existing `RoleBindingRestriction` resources on the cluster that specify user/group restrictions, this new status field will be populated with a message stating the potential impact.

```go
type AuthenticationStatus struct {
```

Member

An idea to be discussed: if we decide to update the API, we could come up with a more generic status field; we could use that as well to embed information about OIDC rollout in the KAS pods instead of manually checking. This could follow the general Condition pattern.

Author

I've updated this section in f5c20de to use the condition pattern, but it is currently focused on this particular use case. I figure we can iterate on it after further discussion.

Comment on lines 225 to 241
##### Changes to the kube-apiserver

To account for the impacts to the `authorization.openshift.io/RestrictSubjectBindings` admission plugin, the OpenShift-specific patch to the kube-apiserver that adds this admission plugin will be updated such that:

- Informers for the `Group` API are not started if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- `RoleBinding`s will be rejected if there exists a `RoleBindingRestriction` that specifies user and/or group restrictions
- It is considered a failure if we are unable to determine the authentication type for the cluster, leading to rejection of the `RoleBinding`

##### Changes to openshift-apiserver

To help keep users informed of the expected behavior of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin when using the OIDC cluster authentication mode, it is proposed that a new admission plugin is added to the openshift-apiserver to reject creation of `RoleBindingRestriction` resources containing user/group restrictions.

**Alternatives**

- Do not reject admission, but issue a warning of the impacts creating a `RoleBindingRestriction` may have when using OIDC as the cluster authentication method.
- Use a `ValidatingAdmissionPolicy` + `ValidatingAdmissionPolicyBinding` instead of an admission plugin.

Member

Instead of checking whether auth type is OIDC, would it be more precise/suitable to check whether the required API groups for the required plugin functionality exist, and decouple this from OIDC?

Author

The thinking behind coupling it with OIDC is that it tells us whether the absence of the API groups is intentional.

Checking only for their existence doesn't give us a clear enough picture of whether the absence is intentional or a sign of a larger issue.

Maybe a middle ground is to check if there is any evidence of the oauth-apiserver workload being present on the cluster? If it is and the APIs are unavailable, something might be wrong. If it is not and the APIs are unavailable, it is likely intentional.
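
The middle ground described above could be sketched roughly as follows. This is only an illustration of the decision table, not the plugin's actual logic: the function name, the `APIState` type, and its values are hypothetical, and how the oauth-apiserver workload would be detected is left open in this thread.

```go
package main

import "fmt"

// APIState is a hypothetical classification of the OAuth API availability.
type APIState string

const (
	StateAvailable         APIState = "Available"
	StateLikelyIntentional APIState = "UnavailableLikelyIntentional"
	StatePossibleProblem   APIState = "UnavailablePossibleProblem"
)

// classifyOAuthAPIs applies the heuristic from the comment above:
// missing APIs with the oauth-apiserver workload present suggest a problem,
// missing APIs without the workload are likely intentional.
func classifyOAuthAPIs(apisAvailable, oauthWorkloadPresent bool) APIState {
	switch {
	case apisAvailable:
		return StateAvailable
	case oauthWorkloadPresent:
		return StatePossibleProblem
	default:
		return StateLikelyIntentional
	}
}

func main() {
	fmt.Println(classifyOAuthAPIs(false, true))
	fmt.Println(classifyOAuthAPIs(false, false))
}
```

The appeal of this shape is that it keeps the check decoupled from the authentication type while still distinguishing an intentional absence from a failure.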

Author

GitHub says this comment is outdated now that I've changed some stuff around, but I think the question posed is still relevant and worth further discussion.


- Informers for the `Group` API are not started if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.Spec.Type` is set to `OIDC`
- `RoleBinding`s will be rejected if there exists a `RoleBindingRestriction` that specifies user and/or group restrictions in the namespace the `RoleBinding` is being created

Member

@liouk liouk Dec 13, 2024

Let's discuss this further -- the RBRs have been built with OAuth in mind, but AFAIU RoleBindings should still generally work with OIDC users/groups.

// +listType=map
// +listMapKey=type
// +openshift:enable:FeatureGate=ExternalOIDC
OIDCConditions []metav1.Condition `json:"oidcConditions"`

Member

Here's an idea of how we could structure this to make it usable for more generic tracking of the OIDC config/rollout progress:

status:
  oidcConfig:
    conditions:
    - message: ""
      reason: ""
      status: "True" # use this to indicate full KAS rollout on OIDC and start cleanup
      type: "Available"
    - message: ""
      reason: ""
      status: "False" # use this to indicate rollout in progress (new & updated config)
      type: "Progressing"
    - message: ""
      reason: ""
      status: "True" # use this to indicate CAO/KAS-o configuration issues
      type: "Degraded"
    # just an example structure about RBRs -- a separate condition type would allow to split meaning from the standard Available/Progressing/Degraded
    - message: "existing RoleBindingRestrictions on users/group not supported in external OIDC: ns1/rbr1, ns2/rbr2"
      reason: "UnsupportedRoleBindingRestrictions"
      status: "True"
      type: "UnsupportedResourceDetected"

It would be useful to separate this into status.oidcConfig.conditions because we might want to track other fields under status.oidcConfig (similar to status.oidcClients).

Finally, if we decide to block enablement if there are RBRs on Users/Groups, instead of a dedicated condition type we would use Degraded.
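
As a rough illustration of the RBR condition in the example structure above, the condition could be assembled from the list of offending resources. This is a sketch under assumptions: `Condition` is a local stand-in that mirrors a few fields of `metav1.Condition` so the example needs only the standard library, and the function name and the `NoUnsupportedResources` reason are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Condition is a local stand-in for a subset of metav1.Condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// unsupportedRBRCondition builds the "UnsupportedResourceDetected" condition
// from a list of "namespace/name" keys of RoleBindingRestrictions.
func unsupportedRBRCondition(rbrs []string) Condition {
	if len(rbrs) == 0 {
		return Condition{
			Type:   "UnsupportedResourceDetected",
			Status: "False",
			Reason: "NoUnsupportedResources", // hypothetical reason
		}
	}
	// Sort a copy so the message is stable across reconciliations.
	sorted := append([]string(nil), rbrs...)
	sort.Strings(sorted)
	return Condition{
		Type:   "UnsupportedResourceDetected",
		Status: "True",
		Reason: "UnsupportedRoleBindingRestrictions",
		Message: "existing RoleBindingRestrictions on users/group not supported in external OIDC: " +
			strings.Join(sorted, ", "),
	}
}

func main() {
	c := unsupportedRBRCondition([]string{"ns2/rbr2", "ns1/rbr1"})
	fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
}
```

Sorting the keys keeps the message deterministic, which avoids spurious status updates when the same set of resources is listed in a different order.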

Contributor

openshift-ci bot commented Dec 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joepvd for approval. For more information see the Code Review Process.


- Informer for the `Group` API are only configured and started as part of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.spec.type` is set to `OIDC`. This will prevent logs in the kube-apiserver associated with not being able to connect to the oauth-apiserver, which we know is not running when OIDC is enabled.

**Open Question**: Does the disabling of an admission plugin through `--disable-admission-plugins` mean that the plugin will not be initialized?

Author

Doing some research on this

Author

From what I can tell, disabling the admission plugin does not mean that the plugin will not be initialized.

This means that for the authorization.openshift.io/RestrictSubjectBindings admission plugin, we will likely need to defer the setup of the informers beyond initialization of the plugin.
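
One way to defer the informer setup beyond plugin initialization is to guard it with `sync.Once` and trigger it on the first validation call. The sketch below only illustrates that pattern; `lazyPlugin`, `startInformer`, and `Validate` are hypothetical names and do not reflect the real plugin's types.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyPlugin is a hypothetical stand-in for the admission plugin.
type lazyPlugin struct {
	once sync.Once
	// startInformer is a hypothetical hook that would configure and
	// start the Group informer.
	startInformer func()
}

// Validate runs for each admission request. The informer is started only on
// the first call, so a plugin that is initialized but never invoked (because
// it is disabled) never starts it.
func (p *lazyPlugin) Validate() {
	p.once.Do(p.startInformer)
	// The actual admission checks would follow here.
}

func main() {
	starts := 0
	p := &lazyPlugin{startInformer: func() { starts++ }}
	fmt.Println("starts after init:", starts)
	p.Validate()
	p.Validate()
	fmt.Println("starts after two validations:", starts)
}
```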

@everettraven everettraven force-pushed the update/external-oidc-apiserver-impact branch from ba8e816 to 8246eb8 Compare December 18, 2024 21:01
@everettraven everettraven changed the title wip: add considerations for kube-apiserver admission plugins when ext… (oidc): add considerations for impacted kube-apiserver admission plugins Dec 18, 2024
@everettraven everettraven marked this pull request as ready for review December 18, 2024 21:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2024

##### Changes to the kube-apiserver

The OpenShift-specific patch to the kube-apiserver that adds this admission plugin is found here: https://github.com/openshift/kubernetes/blob/master/openshift-kube-apiserver/openshiftkubeapiserver/patch.go

Member

Link to a specific ref instead of master; a link to master might break if the file ever gets moved.


- Disable the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins
- Remove the `rolebindingrestrictions.authorization.openshift.io` CustomResourceDefinition
- The `Authentication` api to communicate when OIDC can't be enabled due to existing `RoleBindingRestriction` resources through a new conditions field

Member

when OIDC can't be enabled due to existing RoleBindingRestriction resources

This bit should stand out more; also

through a new conditions field

as per another comment below, the new API field should be introduced in a separate section as it is more generic

Maybe something along the lines of:

OIDC won't be enabled while RoleBindingRestriction resources exist; this will be communicated in the new Authentication API OIDC status field.


In order to prevent misleading logs about informers that failed to start or failure to connect to the oauth-apiserver, the following changes to this patch are to be made:

- Informer for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.

Member

Nit:

Suggested change
- Informer for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
- Informers for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.

This will be done through updates to the appropriate config observers to update the `KubeAPIServerConfig.apiServerArguments` map to:

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument

Member

For clarity: AFAIU it would normally be enough to remove the plugins from the --enable-admission-plugins arg, as they are not default plugins that need explicit disabling. However, the config observer doesn't have access to the final config object, and therefore to the --enable-admission-plugins field, so we'll use --disable-admission-plugins to indicate what needs disabling. We'll also need a special merge so that the plugins get removed from the enabled list and added to the disabled list.
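
The "special merge" mentioned above could look roughly like the following. This is a minimal sketch of the list manipulation only; the function name is hypothetical, the starting lists are made up for illustration, and the real change would live in the config merge logic rather than in a standalone helper.

```go
package main

import "fmt"

// disablePlugins removes toDisable from the enabled list and appends them to
// the disabled list, preserving order and avoiding duplicates.
func disablePlugins(enabled, disabled, toDisable []string) (newEnabled, newDisabled []string) {
	drop := make(map[string]bool, len(toDisable))
	for _, p := range toDisable {
		drop[p] = true
	}
	for _, p := range enabled {
		if !drop[p] {
			newEnabled = append(newEnabled, p)
		}
	}
	seen := make(map[string]bool, len(disabled)+len(toDisable))
	for _, p := range append(append([]string{}, disabled...), toDisable...) {
		if !seen[p] {
			seen[p] = true
			newDisabled = append(newDisabled, p)
		}
	}
	return newEnabled, newDisabled
}

func main() {
	// Hypothetical starting values; a real cluster's lists will differ.
	enabled := []string{
		"NodeRestriction",
		"authorization.openshift.io/RestrictSubjectBindings",
		"authorization.openshift.io/ValidateRoleBindingRestriction",
	}
	toDisable := []string{
		"authorization.openshift.io/RestrictSubjectBindings",
		"authorization.openshift.io/ValidateRoleBindingRestriction",
	}
	e, d := disablePlugins(enabled, nil, toDisable)
	fmt.Println("--enable-admission-plugins:", e)
	fmt.Println("--disable-admission-plugins:", d)
}
```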

Author

IIRC from my experimenting, overriding --enable-admission-plugins in the config to no longer include these admission plugins did not sufficiently disable them, which is why I specifically call out adding them to the --disable-admission-plugins flag.

I'm not sure we need to go into the exact semantics of how this is achieved, but if we do I'm happy to do a bit more digging and figure out what changes may need to be made to the config logic.

Member

Agreed, no need to go into more detail here; I just added this note as a result of some digging I did, as a note to ourselves.

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument

##### Changes to the `Authentication` API

Member

I would suggest we move this to a section independent of the RoleBindingRestrictions issue at hand. This API will be used to communicate status/progress of the OIDC configuration, which includes any issue with RBRs.


This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present present on the cluster when the authentication type _is_ OIDC.

Member

Suggested change
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present present on the cluster when the authentication type _is_ OIDC.
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been rolled out.

If we remove the CRD the moment the auth type becomes OIDC, we won't give time to the admins to react in case any RBRs exist, as the CRD will be removed immediately (and therefore any existing resources). I believe we'll want this in two steps: CAO complains if RBRs exist, and doesn't proceed with OIDC rollout. Once they are deleted, OIDC rollout proceeds. Once it is completed and OIDC is available (we'll use the new API field for that), OAuth cleanup starts, which includes deleting the CRD.

For the moment, this is the condition used to determine when OIDC has been enabled: https://github.com/openshift/cluster-authentication-operator/pull/740/files#diff-51c6cd196c758006bbe84eed012e6baac4713a856a96b7dfd10adc8ad7986e48R20

Once we have the new API, though, we'll use it to determine that OIDC is available (i.e. Available=True). The KAS-o config observer will make sure to update the status accordingly when it detects that the KAS pods have been rolled out with OIDC.
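
The two-step flow described in this comment can be summarized as a small decision function. This is a sketch of the proposed ordering only; the `Action` type, its values, and the function name are hypothetical, and the inputs stand in for whatever the cluster-authentication-operator would actually observe.

```go
package main

import "fmt"

// Action is a hypothetical name for what the controller would do next.
type Action string

const (
	EnsureCRDPresent   Action = "EnsureCRDPresent"
	BlockRollout       Action = "BlockRollout"
	ProceedWithRollout Action = "ProceedWithRollout"
	RemoveCRD          Action = "RemoveCRD"
)

// nextAction encodes the ordering proposed above:
//   - not OIDC: keep the CRD present and matching the desired manifest
//   - OIDC already rolled out (Available=True): clean up, including the CRD
//   - OIDC requested but RBRs exist: do not proceed with the rollout
//   - OIDC requested and no RBRs: proceed with the rollout
func nextAction(authType string, rbrCount int, oidcAvailable bool) Action {
	switch {
	case authType != "OIDC":
		return EnsureCRDPresent
	case oidcAvailable:
		return RemoveCRD
	case rbrCount > 0:
		return BlockRollout
	default:
		return ProceedWithRollout
	}
}

func main() {
	fmt.Println(nextAction("IntegratedOAuth", 2, false))
	fmt.Println(nextAction("OIDC", 2, false))
	fmt.Println(nextAction("OIDC", 0, false))
	fmt.Println(nextAction("OIDC", 0, true))
}
```

Checking availability before the RBR count means that RBRs created after the rollout has completed are removed together with the CRD rather than blocking cleanup.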


As the cluster-authentication-operator will now be responsible for the `rolebindingrestrictions.authorization.openshift.io` CRD, it should no longer be added to the openshift/api payload manifests that are included in a payload image and get managed by CVO.

This will likely mean removing the associated files from the hack/update-payload-crds.sh script here: https://github.com/openshift/api/blob/dd0f68969241c0548906ec98c12bb208512cbbb4/hack/update-payload-crds.sh#L6

Member

Add some information about how the rollout of this change will become effective in an upgrade (handover will be smooth, but we should write this down).


The OIDC authentication mode on the cluster will not be allowed to be enabled if any `RoleBindingRestriction` resources exist.

To communicate the reason for the enablement of the OIDC functionality being blocked, the `Authentication` API will be extended with a new status field to communicate the condition of the OIDC feature.

Member

Let's discuss further how we'll communicate this; for example, we can set Available=False/Degraded=True when RBRs exist. We'll need to also take care of some corner cases, e.g. what if someone creates RBRs after the CAO has started the rollout, but before the KAS pods have restarted?

Author

+1 to discussing further how we communicate this. I'll go into a bit more detail on this and then we can refine it from there.

For the corner case where an RBR is created after the CAO has already started the rollout process but before the KAS pods have restarted, my expectation is that we remove the CRD, which in turn deletes the CRs (in this case the newly created RBRs). We can discuss this a bit further if we think that this is an unacceptable user experience, but I think this would be OK for now. We could add warnings in the OpenShift documentation for enabling OIDC that any RBRs created during the rollout of the OIDC functionality will be automatically removed.

Member

my expectation is that we remove the CRD, which in turn deletes the CRs

I also think this sounds good enough for now 👍

OIDCConfig *OIDCConfig `json:"oidcConfig,omitempty"`
}

type OIDCConfig struct {

Member

Let's make it explicit that this type is used to communicate status info.

Suggested change
type OIDCConfig struct {
type OIDCConfigStatus struct {

Contributor

openshift-ci bot commented Jan 7, 2025

@everettraven: all tests passed!


2 participants