Better handling of expired auth cookie in kubeflow notebook #2904

Open
ZxMYS opened this issue Oct 18, 2021 · 14 comments

Comments

@ZxMYS

ZxMYS commented Oct 18, 2021

Hi!

We are using Kubeflow 1.3 to run notebooks. It seems that after the authservice_session cookie expires, all requests to a running Kubeflow notebook are redirected (HTTP 302) to the Kubeflow login page. This behavior is fine and natural when a user tries to open a page, but less so for a user who already has a Jupyter notebook page open and is working in it: on most actions (saving a notebook, creating a notebook, opening a terminal, etc.) the user sees this error dialog with a confusing error message:

[Screenshot: Screen Shot 2021-10-18 at 3 59 10 PM]

This error message appears because the Jupyter frontend code expects a JSON response to those requests. Since Kubeflow redirects the requests to the login page, which is HTML, the notebook frontend cannot parse the response properly.
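
The parse failure described above can be illustrated in a few lines. The Jupyter frontend is JavaScript, so this Python sketch is only an analogy, and the `login_page_html` string and the `parse_api_response` helper are made up for illustration:

```python
import json

# Hypothetical stand-in for the HTML login page that the redirect lands on.
login_page_html = "<!DOCTYPE html><html><body>Log in</body></html>"


def parse_api_response(body: str):
    """Parse a response body as JSON and return (data, error_message)."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # A client that expects JSON ends up here when it receives HTML.
        return None, f"Invalid JSON response: {exc.msg}"


# An HTML body fails to parse, while a JSON body parses as usual.
data, error = parse_api_response(login_page_html)
ok_data, ok_error = parse_api_response('{"name": "Untitled.ipynb"}')
```

Here `data` is `None` and `error` carries the parser's message, which is roughly the situation the error dialog reports.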

Given that the authservice_session cookie seems to be valid for a day only, it's not uncommon for a notebook user who works on notebooks continuously to hit this issue.

I wonder if kubeflow can provide a better user experience here - e.g. instead of blindly redirecting all requests to the login page, only redirect the index page of a notebook server and return 403 for other requests. The Jupyter frontend can properly handle the 403 and display a proper error message, which is much less confusing.
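
A minimal sketch of the policy proposed above, assuming the auth layer can tell which paths are a notebook server's index page. The function name, the `index_paths` parameter, and the example paths are hypothetical and are not part of Kubeflow or Istio:

```python
from http import HTTPStatus


def status_for_unauthenticated(path: str, index_paths: set) -> HTTPStatus:
    """Choose a status for a request that arrives without a valid session.

    Index pages keep the redirect to the login page (302); every other
    request gets 403 so the Jupyter frontend can show its own error.
    """
    normalized = path.rstrip("/") or "/"
    if normalized in index_paths:
        return HTTPStatus.FOUND
    return HTTPStatus.FORBIDDEN


# Hypothetical index path of one notebook server, stored without a trailing slash.
INDEX_PATHS = {"/notebook/team-a/my-notebook"}

page_status = status_for_unauthenticated("/notebook/team-a/my-notebook/", INDEX_PATHS)
api_status = status_for_unauthenticated(
    "/notebook/team-a/my-notebook/api/contents/Untitled.ipynb", INDEX_PATHS
)
```

With these inputs the page load is still redirected, while the background API call receives a 403.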

To reproduce the error message in the screenshot:

  • Log in to Kubeflow
  • Start a notebook server, with an image based on jupyter notebook (the kf default ones are ok)
  • Connect to the notebook server (open its web ui) and keep the page open
  • Remove kubeflow's 'authservice_session' cookie with the browser's developer tools to simulate its expiration
  • Now when you do most actions in the notebook you see the confusing error message
@kwlzn

kwlzn commented Oct 19, 2021

+1. I directly encountered this problem as well and found it to be incredibly confusing and something very specific to Kubeflow notebooks. It appeared like my notebook was broken, but after a full refresh it redirected me to the dex page where it became apparent this was an auth cookie problem.

@jbottum

jbottum commented Oct 28, 2021

@kubeflow/wg-notebooks-leads /area notebooks /priority p2

@kimwnasptd
Member

Thank you for raising this issue @ZxMYS!

I think a good first step here would be to extend the docs in kubeflow.org to mention this error. It could be a page where we document errors like these, that could be very confusing when encountered in the wild.

cc @shannonbradshaw

@thesuperzapper
Member

@kimwnasptd probably the best bet is to set the default session timeout to 12+ hours, to reduce the likelihood of people encountering it.

I literally cannot think of a way to fix this, because any HTTP call with an expired session will be redirected to the auth provider (usually dex) by Istio, and because Jupyter is making requests in the background, those requests WILL be redirected, leading to this error.

@kwlzn

kwlzn commented Dec 9, 2021

increasing the default session timeout to 12+ hours seems like a reasonable initial mitigation to reduce the frequency at which this occurs.

I wonder if a JupyterLab plugin specific to Kubeflow could make sense as a pattern to solve this (at least for JupyterLab-based runtimes)? this minimal plugin could run in the front-end application layer (in the browser) and periodically poll the backend server to detect expired auth via a sudden switch from HTTP 200 -> HTTP 302 status codes - then if expired, inform the user via a modal prompt w/ link and/or redirect to the auth provider to re-auth.
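
The detection step of such a plugin could look roughly like the sketch below. A real JupyterLab plugin would be written in TypeScript; Python is used here only to show the logic, and the `SessionWatcher` class is hypothetical. How the probe obtains the status code is left out, and since browsers follow redirects by default, a real plugin may not see the 302 directly.

```python
from typing import Optional


class SessionWatcher:
    """Tracks probe results and flags a switch from HTTP 200 to HTTP 302."""

    def __init__(self) -> None:
        self._last_status: Optional[int] = None

    def observe(self, status: int) -> bool:
        """Record one probe status; return True only on a 200 -> 302 switch."""
        expired = self._last_status == 200 and status == 302
        self._last_status = status
        return expired


# Simulated sequence of periodic probe results.
watcher = SessionWatcher()
results = [watcher.observe(status) for status in [200, 200, 302, 302]]
```

Only the third probe reports an expiry, so the user would be prompted once rather than on every poll.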

@thesuperzapper
Member

> I wonder if a JupyterLab plugin specific to Kubeflow could make sense as a pattern to solve this (at least for JupyterLab-based runtimes)? this minimal plugin could run in the front-end application layer (in the browser) and periodically poll the backend server to detect expired auth via a sudden switch from HTTP 200 -> HTTP 302 status codes - then if expired, inform the user via a modal prompt w/ link and/or redirect to the auth provider to re-auth.

@kwlzn that is an interesting idea, would you have experience to create such a plugin?

@kwlzn

kwlzn commented Jan 20, 2022

@thesuperzapper yeah, my group at Twitter has built jupyterlab plugins that can refresh the UI like this for things like dynamically changing the ContextManager at runtime etc. so, I think it should be possible - and the UI layer seems like the right place to do this checking.

I'll see if I can motivate someone to pick this up as a contrib.

@stale

stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@vinayan3

> I wonder if a JupyterLab plugin specific to Kubeflow could make sense as a pattern to solve this (at least for JupyterLab-based runtimes)? this minimal plugin could run in the front-end application layer (in the browser) and periodically poll the backend server to detect expired auth via a sudden switch from HTTP 200 -> HTTP 302 status codes - then if expired, inform the user via a modal prompt w/ link and/or redirect to the auth provider to re-auth.
>
> @kwlzn that is an interesting idea, would you have experience to create such a plugin?

I'm building a plugin for JupyterLab to do exactly this. The errors that users in my org see are slightly different: the kernel becomes disconnected, you cannot save, and the developer console shows CORS errors related to the 302 status. The identity provider we use does not allow AJAX requests to authenticate; it requires the user to open a page themselves to authenticate. The solution is a button that opens a pop-up in which the user authenticates. JupyterLab these days is quite good at reconnecting once the 3xx response codes and CORS errors stop.

If others are interested I can make this public.

@kwlzn

kwlzn commented Aug 24, 2022

> If others are interested I can make this public.

@vinayan3 that would be awesome!

@ZxMYS
Author

ZxMYS commented Aug 26, 2022

@vinayan3 +1 definitely interested!

@simonjcarr

simonjcarr commented May 23, 2023

I don't know what underlying restrictions there might be, but would it not be possible to regenerate the auth token using a refresh token? (This is the normal way of dealing with this problem.) This would also make the system more secure, as the auth token could then have a much shorter lifespan, perhaps 5 minutes. When the token has 1 minute of life left, a new token would be requested using the refresh token. This way the user would never be logged out unless they specifically chose to log out, and if an auth token were accidentally exposed, it would only be valid for a maximum of 5 minutes.
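
The timing rule in this suggestion can be sketched as follows, using the 5-minute lifetime and 1-minute threshold from the comment. The constant and function names are hypothetical, and the actual exchange of the refresh token with the auth provider is not shown:

```python
TOKEN_LIFETIME_SECONDS = 5 * 60    # suggested auth token lifespan
REFRESH_THRESHOLD_SECONDS = 60     # refresh when 1 minute of life is left


def token_expiry(issued_at: float) -> float:
    """Return the time at which a token issued at `issued_at` expires."""
    return issued_at + TOKEN_LIFETIME_SECONDS


def needs_refresh(expires_at: float, now: float) -> bool:
    """True once the remaining lifetime is at or below the threshold."""
    return expires_at - now <= REFRESH_THRESHOLD_SECONDS


# A token issued at t=0 expires at t=300; times are in seconds.
expires_at = token_expiry(0.0)
early_check = needs_refresh(expires_at, now=200.0)   # 100 seconds left
late_check = needs_refresh(expires_at, now=240.0)    # 60 seconds left
```

The early check leaves the token alone, while the late check triggers a refresh, so an active user would get a new token before the old one expires.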

@stale stale bot removed the lifecycle/stale label May 23, 2023
@juliusvonkohout
Member

/transfer manifests

@google-oss-prow google-oss-prow bot transferred this issue from kubeflow/kubeflow Nov 1, 2024
@juliusvonkohout
Member

With the refresh cookie in oauth2-proxy/dex this is already mitigated a lot. I tested it on 1.9.1 with oauth2-proxy only, but someone needs to provide the dex refresh settings as well here.

Is anyone willing to create a PR?

/lifecycle stale
