April 24, 2023
Our Django application (still named `DDP_backend`...) now offers
- username + password authentication
- sign up + login, create an organization
- three user roles and related endpoint guards
- setting up an Airbyte workspace: sources, destinations, connections
- setting up a dbt environment (see below)
- creating Prefect commands to sync an Airbyte connection and run a dbt workflow; these are tracked against the organization
- tracking the destination warehouse against the organization and storing its connection credentials in AWS Secrets Manager
Only a few things are left to support a complete onboarding and configuration flow:
- set up a scheduled data-flow (what Prefect calls a "deployment")
- retrieve the Airbyte and dbt logs from Prefect
An organization's dbt workspace is a folder on disk located at `${CLIENTDBT_ROOT}/<org>/`. Inside this folder we create a Python venv and install dbt and its database-specific libraries like `dbt-postgres` or `dbt-bigquery`. The organization's dbt repo (on GitHub) is cloned into `${CLIENTDBT_ROOT}/<org>/dbtrepo/`.
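The workspace provisioning described above (venv, dbt install, repo clone) can be sketched as follows. This is an illustrative outline, not the project's code; the function name, the adapter default, and the use of `subprocess` are assumptions.

```python
import subprocess
from pathlib import Path

def plan_dbt_workspace(clientdbt_root: str, org: str, repo_url: str,
                       adapter: str = "dbt-postgres") -> list[list[str]]:
    """Build the commands to provision ${CLIENTDBT_ROOT}/<org>/:
    create a venv, install dbt plus the warehouse adapter, clone the repo."""
    org_dir = Path(clientdbt_root) / org
    venv_dir = org_dir / "venv"
    repo_dir = org_dir / "dbtrepo"
    return [
        ["python", "-m", "venv", str(venv_dir)],
        # POSIX layout assumed; on Windows the pip path would be Scripts/pip.
        [str(venv_dir / "bin" / "pip"), "install", "dbt-core", adapter],
        ["git", "clone", repo_url, str(repo_dir)],
    ]

def setup_dbt_workspace(clientdbt_root: str, org: str, repo_url: str) -> None:
    """Execute the provisioning plan, failing fast on any error."""
    for cmd in plan_dbt_workspace(clientdbt_root, org, repo_url):
        subprocess.run(cmd, check=True)
```

Keeping the plan-building pure makes it easy to test without touching the filesystem.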
For private repositories we can accept a Personal Access Token from the user, which we store in AWS Secrets Manager. These PATs have a limited lifetime, and we need to figure out a good workflow for prompting the user to provide a refreshed token when the old one expires.
We hit a hurdle communicating with Prefect via its HTTP API, since we could not set up flows and deployments that way. Prefect's Python API is very simple but uses Python's async capabilities, which are confusing to use from within Django. To get around this we created a separate FastAPI application to hold that functionality, which the Django application talks to via HTTP. Conceptually these two should be thought of as one application.
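The friction is the sync/async boundary: Django view code is synchronous, while Prefect's client is async. The sketch below illustrates the mismatch with a stand-in coroutine (not Prefect's real API) and the naive per-call bridge that motivated moving this logic into an async-native FastAPI service.

```python
import asyncio

async def create_deployment(flow_name: str) -> dict:
    """Stand-in for an async Prefect client call -- illustrative only."""
    await asyncio.sleep(0)  # placeholder for real async I/O
    return {"flow": flow_name, "status": "created"}

def create_deployment_sync(flow_name: str) -> dict:
    """Naive bridge: spin up an event loop for every call.

    This works in simple cases but gets awkward inside Django, which is
    why a separate FastAPI service owns the async Prefect calls and
    Django just makes a plain HTTP request to it.
    """
    return asyncio.run(create_deployment(flow_name))
```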
We also set up a testing framework using PyTest and intend to configure a code-coverage tool to tell us how much of our code has associated tests. We would also like to run unit tests when we push to GitHub, but need to figure out how to do that, since we don't want to let GitHub connect to our Airbyte or Prefect instances. Something will need to be mocked.
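One common way to keep CI from touching live services is to inject the HTTP client and mock it in tests. This is a sketch of the pattern, not the project's test suite; the function name and endpoint path are illustrative.

```python
from unittest import mock

def trigger_airbyte_sync(http_client, connection_id: str) -> dict:
    """Trigger a sync via an injected HTTP client (endpoint path illustrative)."""
    return http_client.post(f"/api/v1/connections/{connection_id}/sync")

def test_trigger_airbyte_sync():
    # In CI the client is a mock, so GitHub never connects to real Airbyte.
    client = mock.Mock()
    client.post.return_value = {"job": {"status": "running"}}
    result = trigger_airbyte_sync(client, "conn-123")
    client.post.assert_called_once_with("/api/v1/connections/conn-123/sync")
    assert result["job"]["status"] == "running"
```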
The frontend is being built using Next.js, a framework for React. So far we have a login page, a PR for a signup page, and some skeletons for Airbyte setup.
In parallel, we are going to
- complete the frontend application
- set up one NGO's pipeline on the platform
"On the platform" means:
- On a production machine
- The machine is monitored for uptime and for CPU, memory, and disk usage
- The machine runs Airbyte, Prefect, `DDP_backend` and `prefect-proxy`
- Configuration data for these applications is stored in RDS rather than in a local db
- Airbyte's Jobs database will be local, since Airbyte keeps a large number of connections open
- Airbyte's secrets are stored in our AWS Secrets Manager
- Prefect's logs will go to RDS... we may also send them to AWS CloudWatch
Finally, a Superset application will be set up to read from the destination warehouse. We will attempt to run this in a Docker container on the same host and monitor the combined load.