get build data in BigQuery #596

Open
willkg opened this issue Jul 23, 2019 · 4 comments

Comments

@willkg
Contributor

willkg commented Jul 23, 2019

Buildhub2 has a Django app that serves the API. The daemon stores all the data in Postgres tables managed by Django and then indexes it in Elasticsearch.

However, we could probably do all of this in BigQuery instead. Further, if the data were in BigQuery, it would be easier to access from Telemetry tools. Buildhub2 is used by several things in Telemetry (Mission Control, probe-scraper, other things?), and that number should continue to grow as they add more dashboards around crash pings and other things.

This issue covers getting the data into BigQuery, either by ditching Postgres for BigQuery or by ditching Elasticsearch for BigQuery.

@wlach

wlach commented Jul 23, 2019

Could we start by adding the data to BigQuery on ingestion? That would let us keep the existing infrastructure and gradually move over to BigQuery-only solutions over time.

@willkg
Contributor Author

willkg commented Jul 25, 2019

I'm not sure that's easier than ditching Postgres for BigQuery, but it seems feasible to me.

@peterbe
Contributor

peterbe commented Sep 11, 2019

Ingestion calls Build.insert(build_data), and at the bottom of that method, if it inserted into Postgres, it calls send_to_elasticsearch(cls, build). You could add a line immediately after that call that looks like send_to_bigquery(cls, build), or replace the Elasticsearch call with it.
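A minimal sketch of what such a send_to_bigquery hook might look like, assuming the google-cloud-bigquery client library and its streaming insert method, Client.insert_rows_json(table, json_rows), which returns a list of row-level errors (empty on success). The table ID, the two-column row layout (an id plus the raw build JSON as a string), and passing the client in explicitly are all hypothetical choices for illustration, not Buildhub2's actual code.

```python
import json
from typing import Any, Dict, List, Protocol


class RowInserter(Protocol):
    """Structural type matching google.cloud.bigquery.Client.insert_rows_json."""

    def insert_rows_json(
        self, table: str, json_rows: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]: ...


# Hypothetical fully qualified table ID; not taken from Buildhub2.
BIGQUERY_TABLE = "my-project.buildhub2.builds"


def send_to_bigquery(
    client: RowInserter,
    build_id: int,
    build_data: Dict[str, Any],
    table: str = BIGQUERY_TABLE,
) -> None:
    """Stream one build into BigQuery (hypothetical hook next to send_to_elasticsearch).

    The build document is stored as a JSON string so the table schema does not
    have to mirror every field of the build metadata.
    """
    row = {"id": build_id, "build": json.dumps(build_data, sort_keys=True)}
    errors = client.insert_rows_json(table, [row])
    if errors:
        # Row-level failures come back in the return value rather than as an exception.
        raise RuntimeError(f"BigQuery insert into {table} failed: {errors}")
```

In production the client argument would be a google.cloud.bigquery.Client(); taking it as a parameter lets tests pass a fake. Raising on errors is one option; an ingestion daemon might instead log and continue so that a BigQuery problem does not block the Postgres insert.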

I'm pretty oblivious about how BigQuery works in terms of inserts and bulk inserts, but I get the feeling you could just copy https://github.com/mozilla-services/buildhub2/blob/master/buildhub/main/management/commands/reindex-elasticsearch.py to reindex-bigquery.py and edit it accordingly.
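If reindex-bigquery.py were adapted along those lines, its core could be a loop that walks every stored build and streams rows to BigQuery in batches rather than one request per row. A sketch under the same assumptions as above (google-cloud-bigquery's insert_rows_json and a hypothetical row layout); the batch size of 500 is an arbitrary illustrative value, not a documented limit.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunked(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def reindex_bigquery(
    client: Any,
    rows: Iterable[Dict[str, Any]],
    table: str,
    batch_size: int = 500,
) -> int:
    """Stream all rows to `table` in batches; return the number of rows sent.

    `client` is assumed to expose insert_rows_json(table, json_rows) like
    google.cloud.bigquery.Client. In a Django management command, `rows` might
    be produced lazily from something like Build.objects.all().iterator()
    (hypothetical mapping from model instance to row).
    """
    sent = 0
    for batch in chunked(rows, batch_size):
        errors = client.insert_rows_json(table, batch)
        if errors:
            raise RuntimeError(f"batch starting at row {sent} failed: {errors}")
        sent += len(batch)
    return sent
```

Because rows are consumed lazily, the full set of builds never has to be held in memory at once. For a one-off backfill, a batch load job (Client.load_table_from_json) avoids streaming-insert charges and may be a better fit than this loop.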

That way you'd be able to try it out very gently.

By the way, the frontend that Buildhub2 has is pretty neat, because it's able to provide a pretty decent interface without really knowing it has anything to do with software builds. I never really loved Searchkit (because I never really understood it), but the progress was certainly cheap.

@wlach

wlach commented Jan 6, 2020

I think it probably makes more sense to do this in Airflow as a separate process, rather than in Buildhub2. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1607229 about that.
