Figure out how to publish the results #25

Open
dlorenc opened this issue Feb 3, 2021 · 4 comments

Comments

@dlorenc
Contributor

dlorenc commented Feb 3, 2021

Right now the results are stored as individual GCS objects, formatted as JSON. This is easy to browse, but probably not the best for querying.

We could load these into BigQuery or some other database, or publish SQLite dumps. Whatever is useful to people!

Chime in here if you have ideas for things to do with this data, or can think of formats we could publish in that would be useful for you.
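
To make the SQLite option concrete, here is a minimal sketch of how per-object JSON results could be collected into one queryable database file. It is not the project's actual pipeline: the object names, the `package` and `files_written` fields, and the `results` table layout are all hypothetical, and the sample records stand in for objects that would really be downloaded from GCS.

```python
import json
import sqlite3


def load_results(conn, objects):
    """Insert (object_name, json_text) pairs into a 'results' table.

    The table layout is an assumption for illustration: one indexed
    column for a hypothetical 'package' field, plus the full JSON body.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "object_name TEXT PRIMARY KEY, package TEXT, body TEXT)"
    )
    rows = []
    for name, text in objects:
        record = json.loads(text)  # raises ValueError on malformed JSON
        rows.append((name, record.get("package"), json.dumps(record, sort_keys=True)))
    # INSERT OR REPLACE keeps reloads idempotent for the same object name.
    conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for the real GCS objects.
SAMPLE_OBJECTS = [
    ("results/a.json", '{"package": "example-a", "files_written": 3}'),
    ("results/b.json", '{"package": "example-b", "files_written": 0}'),
]

conn = sqlite3.connect(":memory:")  # use a file path to produce a publishable dump
load_results(conn, SAMPLE_OBJECTS)
packages = [row[0] for row in conn.execute("SELECT package FROM results ORDER BY package")]
```

Keeping the full JSON body next to a few extracted columns means consumers can query the common fields with plain SQL and still recover anything the extracted columns leave out.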

@g-k
Contributor

g-k commented Feb 12, 2021

BigQuery would be great since the PyPI dataset is in there. JSON available via CDN would be awesome too. Are there any rules or licensing around use of the data (e.g. how would you like to be attributed)?

And more generally, thanks for working on this! I was talking to Jordan and Ashish from GATech towards the end of last year and did some similar work on the Mozilla Dependency Observatory. So it's great to see you all really running with this, since it's long overdue.

@dlorenc
Contributor Author

dlorenc commented Feb 12, 2021

Let me check on the data licensing! I think we're planning on one of these two: https://cdla.dev/

I'd probably lean toward the permissive one. Would that work for you?

@g-k
Contributor

g-k commented Feb 16, 2021

> Let me check on the data licensing! I think we're planning on one of these two: https://cdla.dev/
>
> I'd probably lean toward the permissive one. Would that work for you?

I think so, I'll confirm with legal internally. Thank you!

@g-k
Contributor

g-k commented Feb 19, 2021

To follow up: yes, either CDLA license will work for our initial internal use cases. I'm to check back with legal if we make the data public or start modifying it since that carries additional considerations.

2 participants