Define Policy/Process for Hub Storage Quota #4447

Open
4 tasks
balajialg opened this issue Apr 7, 2023 · 1 comment

balajialg commented Apr 7, 2023

Summary

@felder, @ryanlovett, and I had a good discussion about issue #4414, where I observed that a few users in the biology hub stored large amounts of data, forcing us to provision a higher file store tier for the hub. Our tentative estimate showed that around 10 out of 800+ users stored around 3 TB of data in the biology hub (overall storage consumed in the biology file store tier was around 4.1 TB). Investigating the types of files stored by some of these users revealed large .fastq files, which are data files used in biological sequencing.

We are paying close to $1,700 per month for the biology file store tier, which amounts to around $20,400 per year.

We can definitely save on the order of a few thousand dollars by having an effective policy + process to handle exceptions. This is prompting us to decide on a policy and process to handle such exceptions now and to avoid this scenario in the future.

Some of the options floated were:

  • Follow the process outlined in "Plan for hubs that are not used this semester!" (#4313): inform the instructors using the biology hub that student files will be deleted, give students a 2-week time frame to delete their files, and then delete the files after that interval. Extend this approach to other hubs where we see a few users taking up a large amount of file storage.
  • Enforce a storage quota for hubs. For example, enforce a storage quota of 5 GB for all students using the biology hub. Modify kubespawner to enforce this quota when students log in to generic datahubs.
  • Develop a system where instructors inform the Datahub team about their plans for the semester and their storage requirements, so that the Datahub team can track outliers.
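
Whichever option we pick, we first need a way to list the outliers. The following is a minimal sketch of such a report, not an existing Datahub tool. It assumes that each user's home directory is a direct subdirectory of one root path (a hypothetical layout), and it uses the 5 GB figure from the quota example above as the threshold. The function names are made up for illustration. It sums apparent file sizes, which can differ from the space the file store actually bills for.

```python
import os

QUOTA_BYTES = 5 * 1024**3  # 5 GB, the example quota mentioned in this issue


def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under path, skipping symlinks."""
    total = 0
    # os.walk does not follow directory symlinks by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                continue
    return total


def find_quota_outliers(home_root, quota_bytes=QUOTA_BYTES):
    """Return (user, bytes_used) pairs above quota_bytes, largest first.

    Assumes every direct subdirectory of home_root is one user's home
    directory (hypothetical layout, adjust to the real file store).
    """
    outliers = []
    with os.scandir(home_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                used = dir_size_bytes(entry.path)
                if used > quota_bytes:
                    outliers.append((entry.name, used))
    return sorted(outliers, key=lambda pair: pair[1], reverse=True)
```

Calling `find_quota_outliers` with the path where the hub's home directories are mounted would give the list of users to bring up with instructors. Walking a multi-terabyte file store this way is slow, so a report like this is better suited to an occasional audit than to a check at every login.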

User Stories

  • As an infra admin, I have a well-defined policy + process to handle file store storage exceptions

Important information

Tasks to complete

  • Define policy for handling storage exceptions
  • Define process for handling storage exceptions
  • Communicate the policy + process to the concerned instructors

balajialg self-assigned this Apr 7, 2023

felder commented Aug 17, 2023
