Retain binderhub analytics events #4423

Closed
Tracked by #4109
yuvipanda opened this issue Jul 12, 2024 · 6 comments
@yuvipanda
Member

yuvipanda commented Jul 12, 2024

As part of #4365, binderhub events now flow into a log in a separate google cloud project made just for this (https://console.cloud.google.com/logs/query;query=logName%3D%22projects%2Fbinderhub-event-logs%2Flogs%2Fbinderhub-event-logs%22;cursorTimestamp=2024-07-12T19:57:06.488977Z;duration=PT1H?referrer=search&project=binderhub-event-logs). However, this is retained only for 30 days.

What we want to do with this data will be determined eventually, in collaboration with partnerships. In the meantime, we should make sure we don't lose it after 30 days.

Google Cloud Logging supports routing logs to a specific destination where they can survive long term. Since we want the most cloud-agnostic option, that means routing them to a Cloud Storage bucket.

This task is to create configuration so that only the logs of these launches (identified by logName="projects/binderhub-event-logs/logs/binderhub-event-logs") get archived into a new storage bucket (let's name it binder-2i2c-events). This can be done in the cloud console, no terraform needed at this moment.
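For readers who would rather script this than click through the console, here is a minimal sketch of what the sink configuration might look like. It only builds the request body in the shape of a Cloud Logging `LogSink` resource (`name`, `destination`, `filter`) and does not call any API. The sink name `binderhub-events-to-gcs` is hypothetical and not from this issue; the bucket name and log filter are the ones given above.

```python
import json


def build_sink_config(sink_name: str, bucket: str, log_name: str) -> dict:
    """Return a LogSink-shaped body that routes one log to a Cloud Storage bucket."""
    return {
        "name": sink_name,
        # Cloud Storage destinations use the form storage.googleapis.com/BUCKET.
        "destination": f"storage.googleapis.com/{bucket}",
        # Only entries from this one log are routed; everything else is ignored.
        "filter": f'logName="{log_name}"',
    }


config = build_sink_config(
    sink_name="binderhub-events-to-gcs",  # hypothetical name, an assumption
    bucket="binder-2i2c-events",
    log_name="projects/binderhub-event-logs/logs/binderhub-event-logs",
)
print(json.dumps(config, indent=2))
```

After a sink like this is created, its writer identity still needs permission to create objects in the bucket; the console normally offers to set that up when the sink is created there.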

@consideRatio
Member

I think I may have already done this, but I haven't confirmed that it works, because I was low on time and it said the transfer happens every hour.

I also changed the default log storage bucket to retain things for two years in this project in case it fails to transfer to the dedicated bucket for binderhub event logs.

@consideRatio
Member

Screenshot_20240712-224631

The Log Router and Log Storage were the things I tried using for this.

@GeorgianaElena GeorgianaElena self-assigned this Jul 16, 2024
@GeorgianaElena
Member

@consideRatio, I believe your setup was indeed working 🚀 However, it was storing the logs in a Cloud Logging bucket, and @yuvipanda's request was for a Cloud Storage bucket.

I just created a new storage bucket named binder-2i2c-events, with the following configuration:
https://console.cloud.google.com/storage/browser/binder-2i2c-events;tab=configuration?project=binderhub-event-logs&supportedpurview=project
Screenshot 2024-07-16 at 13 34 39

And then re-configured the log router that @consideRatio created to route the logs into the cloud storage bucket rather than the log bucket.

Screenshot 2024-07-16 at 13 36 51

@GeorgianaElena
Member

@yuvipanda, before considering this done, can you please confirm that:

  • the bucket's config looks good to you, as I had to make some decisions, like making it multi-region
  • it is OK to delete the logs bucket that Erik created (it currently has a few days' worth of data)?

@yuvipanda
Member Author

@GeorgianaElena looks good to me! You can delete the old log bucket (which I assume no longer receives events), and close this ticket once you confirm new events have come to the cloud bucket!

@GeorgianaElena
Member

Data is in the new bucket, so deleting the old one. Thank you @yuvipanda!

Screenshot 2024-07-17 at 11 38 14
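The "confirm new events have arrived" check could also be scripted. The sketch below assumes that exported objects follow the hourly naming pattern Cloud Logging documents for Cloud Storage sinks (`LOG_ID/YYYY/MM/DD/HH:MM:SS_HH:MM:SS_S0.json`) and that those timestamps are in UTC. The example object name and the three-hour threshold are illustrative, not taken from the actual bucket. It works on a plain list of object names, so fetching that list (for example with `gsutil ls`) is left out.

```python
from datetime import datetime, timedelta, timezone


def batch_start(object_name: str) -> datetime:
    """Parse the start time of an export batch from its object name.

    Assumes names shaped like
    'binderhub-event-logs/2024/07/17/08:00:00_08:59:59_S0.json'.
    """
    parts = object_name.split("/")
    year, month, day = (int(p) for p in parts[-4:-1])
    start = parts[-1].split("_")[0]  # e.g. '08:00:00'
    hour, minute, second = (int(p) for p in start.split(":"))
    # Assumption: timestamps in object names are UTC.
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


def has_recent_batch(object_names, now, max_age=timedelta(hours=3)):
    """Return True if at least one batch started within max_age of now."""
    return any(now - batch_start(name) <= max_age for name in object_names)


# Illustrative object name, not copied from the real bucket.
names = ["binderhub-event-logs/2024/07/17/08:00:00_08:59:59_S0.json"]
now = datetime(2024, 7, 17, 10, 0, 0, tzinfo=timezone.utc)
print(has_recent_batch(names, now))
```

Because exports to Cloud Storage are batched roughly hourly, a threshold of a few hours leaves some slack before treating the sink as stalled.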
