Kubeflow PRs for Group Support: Enabling RBAC Based on User Groups #2910

Open

axel7083 opened this issue Apr 7, 2023 · 14 comments
Comments

@axel7083
Contributor

axel7083 commented Apr 7, 2023

Description

I am opening a dedicated issue to follow up on the PRs opened related to group support.

The goal is to add support for groups in Kubeflow. Currently, RBAC is supported only based on user-id. The oidc-authservice sends the "kubeflow-userid" and "kubeflow-groups" headers, but the "kubeflow-groups" header is currently ignored.

Why

Supporting groups would facilitate the integration of Kubeflow into existing systems, without the need to create, maintain, or delete role bindings for each user.

Current Status

kubeflow/kubeflow

The central dashboard would also require some modifications to comply with the modified KFAM.

kubeflow/pipeline

kubeflow/manifest

kubeflow/katib

kserve/models-web-app

This does not require any modification: in its Dockerfile it clones the kubeflow/kubeflow repository and uses components/crud-web-apps/common/backend.

Related issues

kubeflow/dashboard#42
kubeflow/kubeflow#5071
kubeflow/kubeflow#4998

/kind feature
/area multiuser

@kimwnasptd
Member

@axel7083 it's really great to see how you are approaching this e2e and touching all KF components. Really great work!

I think we'll also need to start including in the mix how we'll end up handling headers and support in VirtualServices as well. The main problem is how to parse list values (since a user could belong to multiple groups).

In any case, would you be able to attend the Notebooks WG meeting on June 8 and give an overview of your current work? This would really help ensure we are all on the same page, coordinate the last pieces, and also produce a proposal that summarizes this architecture.

@thesuperzapper
Member

@kimwnasptd In a related discussion, it's always been strange that we pass the user-id header around as plain text, rather than using a JWT and extracting the user-id from it.

Our current user-id header means that we have to be extremely careful about restricting which traffic can reach the pods, because "impersonating" a user is as simple as passing their user-id in the header.

If we swapped to JWT Authorization headers, which can be verified as the traffic hits the application (either by the app itself, or the istio sidecar), there would be less concern about restricting traffic because we could verify the authenticity of who the traffic is claiming to be (by checking the JWT was issued by a trusted issuer).

Also, JWT has a claim for "groups", which we can use instead of adding a group-ids header, with similar trust benefits.

@kimwnasptd
Member

@thesuperzapper exactly! This is what I'm also getting at.

@thesuperzapper
Member

@kimwnasptd Also, using JWT only removes the need to add a specific "groups" header; it does not negate the need to implement group-checking in each of the apps (because right now, everything is keyed only on the user-id header).


Regarding the specific implementation (for group-checking) proposed by @axel7083, I am hesitant to use the Groups entity in Kubernetes RBAC, because I believe it's not really related to the higher-level concept of a "group of Kubeflow users".

That is to say, the KFAM API currently creates role bindings like user-user1-example-com-clusterrole-edit in the profile namespaces. These role bindings are only really used by Kubeflow Pipelines right now, and are used to perform SubjectAccessReviews with the userid-header it receives. This can be extended to include groups, but this might have unexpected security implications for the cluster itself.

This is because our role bindings ALSO affect the cluster itself. If the user-id happens to correspond to the User which the cluster-level auth (kubectl) provides, then anyone with that username will also have that level of access in the namespace (this behavior is desired in many cases, but not all). However, if we also create role bindings for Groups, we are much more likely to hit a group that exists at the cluster level and give unintended access (for example, a Kubeflow group that happened to be named system:authenticated would grant the bound role to every authenticated user in the cluster).

Alternatively, we can keep using User bindings only, but add a prefix to our bindings to distinguish the entity kind and ensure they never impact kubectl access. For example, we could create role bindings like:

## for "users" (like we currently do, but with a prefix)
subjects:                                                                                                                             
  - apiGroup: rbac.authorization.k8s.io                                                                                                 
    kind: User                                                                                                                          
    name: "user.kubeflow.org:{USER_ID}"

## for "user groups"
subjects:                                                                                                                             
  - apiGroup: rbac.authorization.k8s.io                                                                                                 
    kind: User                                                                                                                          
    name: "groups.kubeflow.org:{GROUP_ID}"

## for "kubeflow service accounts" (just spitballing here)
subjects:                                                                                                                             
  - apiGroup: rbac.authorization.k8s.io                                                                                                 
    kind: User                                                                                                                          
    name: "sa.kubeflow.org:{GROUP_ID}"

Obviously, this would break for people who are currently relying on these role bindings to give kubectl access to their users, but we could probably extend the Profile CRD to let users explicitly specify if (and how) they want the kubectl bindings created, as sketched below.
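Purely to illustrate that last point, a hypothetical opt-in field on the Profile CRD might look like the following (this field does not exist today; it is only a sketch of the idea):

apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: user1-example-com
spec:
  owner:
    apiGroup: rbac.authorization.k8s.io
    kind: User
    name: user1@example.com
  # hypothetical field: explicitly opt in to un-prefixed bindings,
  # so the owner also gets this access through cluster-level auth (kubectl)
  createKubectlBindings: true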

@axel7083
Contributor Author

> @kimwnasptd In a related discussion, it's always been strange that we pass the user-id header around as plain text, rather than using a JWT and extracting the user-id from it. [...]
>
> Also, JWT has a claim for "groups", which we can use instead of adding a group-ids header, with similar trust benefits.

The oidc-authservice has built-in support for JWT tokens, but the default behavior does not use them.

While investigating possible solutions for improving the authorization system in Kubeflow to integrate group support, I used oauth2-proxy as an alternative. I opened a PR in the manifests repository, since it does not require any modification of the components' code and can replace the oidc-authservice: Adding oauth2-proxy as optional alternative to oidc-authservice. (I know that @kimwnasptd has already taken a look at it, since it has been added to the Manifests WG Roadmap for KF 1.8.)

A notable advantage of using JWT tokens is that Istio supports them in AuthorizationPolicy. Here is an example provided by their documentation:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["testing@secure.istio.io/testing@secure.istio.io"]
    when:
    - key: request.auth.claims[groups]
      values: ["group1"]

In my current implementation across all the components, I had to make a tricky AuthorizationPolicy implementation for the Notebooks.

For example, if the required group is kubeflow:group1, the accepted values for the kubeflow-groups header are:

- key: request.headers[kubeflow-groups]
  values:
    - kubeflow:group1
    - '*,kubeflow:group1'
    - '*,kubeflow:group1,*'
    - kubeflow:group1,*

This is far from nice, but receiving kubeflow-groups as a comma-separated list of strings, as the oidc-authservice provides it, forces us to use a workaround.

@axel7083
Contributor Author

> @kimwnasptd Also, using JWT only removes the need to add a specific "groups" header; it does not negate the need to implement group-checking in each of the apps [...]
>
> Alternatively, we can keep using User bindings only, but add a prefix to our bindings to distinguish the entity kind and ensure they never impact kubectl access. [...]
>
> Obviously, this would break for people who are currently relying on these role bindings to give kubectl access to their users, but we could probably extend the Profile CRD to let users explicitly specify if (and how) they want the kubectl bindings created.

Thanks for your comment and the details on the possible implications for cluster security itself! I have never worked with such high-level concepts in Kubernetes, so I might lack knowledge on these matters. Really appreciated!

First, regarding the idea of using a given prefix to define a group while still using the User object: I do not understand how, given for example a JWT token with a user-id and groups, the subject review would be done to ensure authorization. Do you want to loop over all the user's groups and make a SubjectAccessReview for each of them, treating each one as a User?

The advantage of using Groups in Kubernetes is that it allows an easier subject review, since it can be done in a single request. If the concern is about naming, and the risk of colliding with an existing cluster-level group name, then the prefix idea you propose might be the solution?

If the idea for users is to propagate something like "user.kubeflow.org:{USER_ID}", that implies some transformation at login time; the same idea could be transposed to groups: all of a user's groups, fetched from LDAP, Keycloak, or anything else, could be converted to the safe "groups.kubeflow.org:{GROUP_ID}".

@thesuperzapper
Member

@axel7083 you are right that we would benefit from using the Groups for bindings, as SubjectAccessReviewSpec.groups is a list (preventing the need to loop over multiple groups if they were stored as Users).

I guess the same idea of adding a prefix (to prevent collisions with real groups) works for group names too, for example:

## for "user groups"
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: "groups.kubeflow.org:{GROUP_ID}"

Also, I agree that replacing oidc-authservice with oauth2-proxy is a good idea, and I have actually done this in my new (currently unreleased, but close to ready) distribution of Kubeflow (and more) called deployKF.

But using oauth2-proxy does not change the fact that the apps themselves (Kubeflow Pipelines, Notebooks, Katib, etc.) all expect a kubeflow-userid header with a plain-text string of the requester's email. We should change them all to look for an Authorization header containing a JWT, from which they can extract the email and groups claims (and also verify that the JWT is valid against the issuer's signature).

However, you are correct that the istio AuthorizationPolicies that grant users access to Notebooks running in profile namespaces could be easily changed to check the user and group claims from a JWT header.

@axel7083
Contributor Author

axel7083 commented May 30, 2023

@thesuperzapper I am a bit skeptical about making each Kubeflow component responsible for parsing JWT tokens; it adds a lot of complexity. But the question is very interesting.

Possible issues with JWT tokens

  • How do we ensure they all allow the same issuers? Add a bunch of ENV variables to all components?
  • What if we need to use custom claims, for example "user-groups" instead of "groups"? Should all the Kubeflow components implement their own logic to handle that? Should we create a shared component like kubeflow/components/crud-web-apps/common? And add yet more ENV variables?
  • Does it affect performance? The Envoy filter will parse the JWT once to ensure its validity, and then every Kubeflow component will parse it a second time on each request. This makes it very redundant.
  • What if we need to update it again? We would have to change all components again; if we keep adding components in the future, this will create a lot of sludge.

The idea of having the user-id and user-groups in plain text bothers me too.

Improving the authorization and resource access system

Maybe we should take another approach. Some ideas:

I. Remove the subject review logic from Kubeflow components

Instead of having each component perform the subject review for resources, we could create a component responsible for that. It could have a simple /authorize endpoint taking ResourceAttributes as the body and the JWT token as the authorization header.

Each Kubeflow component would then simply call the authorize endpoint, forwarding the authorization header and adding the ResourceAttributes it needs authorization for, as sketched below.
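A minimal sketch of such a request, assuming a hypothetical /authorize endpoint (neither the path nor the payload shape exists today):

# POST /authorize
# Authorization: Bearer <JWT>
resourceAttributes:
  namespace: profile-1
  verb: list
  group: kubeflow.org
  resource: notebooks

The authorization component would verify the JWT, map its claims to a user and groups, perform the SubjectAccessReview, and return allow/deny.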

This can be a pretty nice solution:

  • This removes from each component the responsibility of calling the Kubernetes API
  • It provides a centralized place for authorization; this makes it easy to implement caching for parsed JWT tokens and could avoid useless calls to the Kubernetes API.
  • It makes future changes much easier. With the authorization responsibility moved out of the Kubeflow components, adding new ones will require less work to understand how the internal logic works.

Maybe do not propagate the JWT token

Instead of propagating the JWT token and asking Kubeflow components to forward it back to the authorization component, we could propagate an opaque token (a UUID or random string) that the Kubeflow components would forward to the authorization endpoint.

This would add some complexity, since it would require a component ahead of the oauth2-proxy to add the token to the upstream request.

A benefit would be that if a user makes a request to a Kubeflow resource with a long-lived token, that token would not be propagated inside the cluster, reducing the risk of leaking; the internal tokens could be given a short lifespan.

II. Component sidecar

Following the same idea of removing JWT parsing from the internal component logic, we could have a sidecar container which parses the JWT, sends the user-id and user-groups to the component, and can deny the request if the JWT token is not valid, making the Istio filter unnecessary and removing the redundant parsing. Having the user-id and user-groups in plain text between containers in the same pod is maybe less of an issue than having them between components.

From a security point of view, an attacker could otherwise call Kubeflow resources from an internal component and set whatever headers they want to impersonate anyone.

@thesuperzapper
Member

While I agree we should discuss how best to dynamically get groups from the user's JWT, I also wanted to share that deployKF has a "virtual group" concept, where you can define groups of users and assign them to specific profiles and access levels. They are "virtual" in the sense that while you define groups, they are rendered down to specific user/email access bindings.

See the deploykf_core.deploykf_profiles_generator.* config values.

@AndersBennedsgaard
Contributor

Just pitching in here with my two cents.

I am also not a huge fan of the user and group headers as authentication, since impersonation is incredibly easy, intercepting valid users' requests is similarly easy, it doesn't work with non-Kubeflow services, etc. Instead, I would love to use simple JWT tokens with regular OIDC.
While this does increase complexity in Kubeflow services (since the current situation can't get much simpler), the benefits far outweigh the drawbacks.

Regarding the possible issues with using JWTs that @axel7083 mentions:

> • How do we ensure they all allow the same issuers? Add a bunch of ENV variables to all components?
> • What if we need to use custom claims, for example "user-groups" instead of "groups"? Should all the Kubeflow components implement their own logic to handle that? Should we create a shared component like kubeflow/components/crud-web-apps/common? And add yet more ENV variables?
>
> [...]
>
> • What if we need to update it again? We would have to change all components again; if we keep adding components in the future, this will create a lot of sludge.

It would make sense to create a library which handles shared functionality and configuration; this would reduce the need for extensive code rewriting during updates and make creating new components which integrate with Kubeflow much easier. Since most (all?) Kubeflow components live in the same namespace, you could just create a single ConfigMap/Secret which contains the auth configuration and is shared among services, as sketched below.
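For example, a single shared ConfigMap could hold the issuer and claim configuration for every component. A hedged sketch (the name, keys, and values here are assumptions, not an existing convention):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeflow-auth-config  # hypothetical name
  namespace: kubeflow
data:
  # trusted OIDC issuer and audience used to validate JWTs
  issuer: "https://dex.example.com/dex"
  audience: "kubeflow"
  # claim names configured once, instead of per-component ENV variables
  userid-claim: "email"
  groups-claim: "groups"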

> Does it affect performance? The Envoy filter will parse the JWT once to ensure its validity, and then every Kubeflow component will parse it a second time on each request. This makes it very redundant.

I think validation of the JWT should be configurable in each component.
Users that are satisfied with Istio's validation of JWT tokens can just turn it off for all components and enjoy slightly faster components, while high-security platforms can turn it on for all components for increased security. This might also make the Istio validation obsolete (which would make it easier to switch out Istio for another service mesh in the future).

If we don't want each component to be responsible for validating JWTs, I think the first suggestion of a separate authorization service would make the most sense. Perhaps it could be built into KFAM, since it seems somewhat related to that component, and we wouldn't have to create a new component.

I am not a huge fan of the suggestion to pass some randomly generated string between Kubeflow components for auth. This would have many of the same issues as the current situation with user/group headers, such as making it extremely annoying to integrate non-standard Kubeflow components into the mix.

The suggestion of an authorization sidecar seems like a bad choice as a long-term solution. The Kubernetes community increasingly regards sidecars as a bad practice, since they consume a lot of unnecessary resources (many are surprised by how many resources Istio sidecars reserve, and therefore cost), they don't really increase security, and day-2 operations get very annoying.
However, using one as a short-term solution for authentication and for translating JWTs to user/group headers might make sense.


P.S. Would it make sense to create a new issue about using JWTs for authentication instead of user/group headers, and close this issue as "not planned", since the discussion has diverged quite a bit from the title?

@axel7083
Contributor Author

axel7083 commented May 9, 2024

@AndersBennedsgaard thank you very much for the details and explanations.

@kimwnasptd
Member

For anyone following this issue, I've prepared a first proposal for how to work with JWTs in Kubeflow in #2748

After a proposal on this topic is approved, we can keep pushing the discussion on groups. I'm preparing a WIP proposal for groups, which will be just a first step, but let's focus initially on handling JWTs; then we can handle groups.

@thesuperzapper
Member

thesuperzapper commented Oct 22, 2024

Since 1.9.1, the default manifests use oauth2-proxy and send a kubeflow-groups header with every request (similar to how we send kubeflow-userid).

The kubeflow-groups header contains a comma-separated list of groups extracted from the Dex JWT; see this RequestAuthentication for where this extraction happens.
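For context, the extraction relies on Istio's outputClaimToHeaders feature in RequestAuthentication; here is a simplified sketch of the mechanism (the issuer and JWKS values below are placeholders, not the actual manifests):

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: dex-jwt
  namespace: istio-system
spec:
  jwtRules:
    - issuer: "https://dex.example.com/dex"        # placeholder issuer
      jwksUri: "https://dex.example.com/dex/keys"  # placeholder JWKS endpoint
      # copy verified JWT claims into plain headers for downstream apps
      outputClaimToHeaders:
        - header: kubeflow-userid
          claim: email
        - header: kubeflow-groups
          claim: groups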

This means that we can now start the process of reading this header when:

  1. determining which profiles to show the user (in the central dashboard)
  2. doing SubjectAccessReviews (in Pipelines and Notebooks).

First, we must allow users to be assigned to a profile based on the content of their kubeflow-groups header, rather than just their kubeflow-userid header.

We need to update the KFAM API to additionally support a groups parameter on at least these endpoints:


We also need to introduce a format for a group RoleBinding that can be read by KFAM and used by SubjectAccessReviews.

I think the new group RoleBinding should look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: group-<HASH_OF_GROUP_NAME>-clusterrole-<GROUP_ROLE>
  namespace: <PROFILE_NAME>
  annotations:
    role: <GROUP_ROLE> # "edit" or "view"
    group: <RAW_GROUP_NAME>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-<GROUP_ROLE>
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: <RAW_GROUP_NAME>

For context, here is what the user RoleBindings look like:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: user-<SAFE_USER_EMAIL>-clusterrole-<USER_ROLE>
  namespace: <PROFILE_NAME>
  annotations:
    ## NOTE: KFAM only reads these annotations when checking the user's level of access
    ## https://github.com/kubeflow/kubeflow/blob/v1.9.1/components/access-management/kfam/bindings.go#L195-L238
    role: <USER_ROLE>
    user: <RAW_USER_EMAIL>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-<USER_ROLE>
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: <RAW_USER_EMAIL>

After that, we can update the central dashboard to send the group information when it calls KFAM, and interpret the responses correctly.

Here is the code for the central-dashboard's KFAM proxy:

Then we would need to update the main page, namespace selector, and manage contributors pages:

We probably also want to allow each "link" on the sidebar of the dashboard to mark itself as supporting groups or not:

Here are some tips for people wanting to work on updating Central Dashboard:

  • Search for places which reference USERID_HEADER, kubeflow-userid, or X-Goog-Authenticated-User-Email. We will need to also support a group version everywhere these are used.

@andreyvelich
Member

/transfer manifests

google-oss-prow bot transferred this issue from kubeflow/kubeflow Nov 11, 2024