Figure out how to publish the results #25
Comments
BigQuery would be great since the PyPI dataset is in there. JSON available via CDN would be awesome too. Are there any rules or licensing around use of the data (e.g. how would you like to be attributed)? And more generally, thanks for working on this! I was talking to Jordan and Ashish from GATech towards the end of last year and did some similar work on the Mozilla Dependency Observatory. So it's great to see you all really running with this, since it's long overdue.
Let me check on the data licensing! I think we're planning on one of these two: https://cdla.dev/ I'd probably lean toward the permissive one. Would that work for you?
I think so, I'll confirm with legal internally. Thank you!
To follow up: yes, either CDLA license will work for our initial internal use cases. I'll need to check back with legal if we make the data public or start modifying it, since that carries additional considerations.
Right now things are in individual GCS objects, formatted as JSON. This is easy to look at and browse, but probably not the best for querying.
We could load these into BigQuery or some other database, or publish SQLite dumps or something similar. Whatever is useful to people!
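To make the SQLite option a bit more concrete, here is a minimal sketch of packing JSON objects into a single database file. Everything named here is an assumption for illustration: the `results` table, its `object_name` and `body` columns, the `load_json_objects` helper, and the sample object are all hypothetical, and the real result schema may call for something different.

```python
import json
import sqlite3


def load_json_objects(conn, objects):
    """Store (object_name, json_text) pairs as rows in a hypothetical `results` table.

    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "object_name TEXT PRIMARY KEY, "
        "body TEXT NOT NULL)"
    )
    rows = []
    for name, text in objects:
        parsed = json.loads(text)  # fail early on malformed JSON
        # Re-serialize with sorted keys so identical objects produce identical rows.
        rows.append((name, json.dumps(parsed, sort_keys=True)))
    conn.executemany(
        "INSERT OR REPLACE INTO results (object_name, body) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # In-memory database and a made-up object, purely for demonstration.
    conn = sqlite3.connect(":memory:")
    sample = [("example-pkg.json", '{"package": "example-pkg", "checks": []}')]
    print(load_json_objects(conn, sample), "object(s) loaded")
```

Keeping the raw JSON in a text column keeps the dump faithful to the existing objects; columns for commonly queried fields could be added later once it is clear what people want to query.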
Chime in here if you have ideas for things to do with this data, or can think of publishing formats that would be useful for you.