
April 24, 2023

Rohit K. Chatterjee, Apr 24, 2023

Progress Update

Our Django application (still named DDP_backend...) now offers:

  • username + password authentication
  • sign up + login, create an organization
  • three user roles and related endpoint guards
  • setting up an Airbyte workspace - sources, destinations, connections
  • setting up a dbt environment - see below
  • creating Prefect commands to sync an Airbyte connection and run a dbt workflow; these are tracked against the organization
  • tracking the destination warehouse against the organization and storing its connection credentials in AWS Secrets Manager
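
The per-role endpoint guards can be sketched as a decorator. This is an illustration only: the role names, the `require_role` helper, and the dict-based request stand-in are all hypothetical, not the actual DDP_backend code.

```python
from functools import wraps

# Hypothetical role names for illustration; the real application defines its own three roles.
ORG_ADMIN = "org-admin"
PIPELINE_MANAGER = "pipeline-manager"
REPORT_VIEWER = "report-viewer"

def require_role(*allowed_roles):
    """Guard a view: reject the request unless the caller holds one of the allowed roles."""
    def decorator(view):
        @wraps(view)
        def wrapped(request, *args, **kwargs):
            # `request` is a plain dict stand-in here; the real guard would read
            # the authenticated org-user's role from the Django request.
            if request.get("role") not in allowed_roles:
                return {"status": 403, "error": "forbidden"}
            return view(request, *args, **kwargs)
        return wrapped
    return decorator

@require_role(ORG_ADMIN, PIPELINE_MANAGER)
def delete_connection(request):
    """Example guarded endpoint: only admins and pipeline managers get through."""
    return {"status": 200}
```

A viewer-role request then receives a 403 before the view body runs, which is the behavior the endpoint guards enforce.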

Only a few things are left to support a complete onboarding and configuration flow:

  • set up a scheduled data-flow (what Prefect calls a "deployment")
  • retrieve the Airbyte and dbt logs from Prefect

dbt setup

An organization's dbt workspace is a folder on disk located at ${CLIENTDBT_ROOT}/<org>/

Inside this folder we create a Python venv and install dbt and its database-specific libraries like dbt-postgres or dbt-bigquery. The organization's dbt repo (on GitHub) is cloned into ${CLIENTDBT_ROOT}/<org>/dbtrepo/
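
The provisioning steps above can be sketched as a small helper that builds the shell commands for one organization. The function name, the `venv` directory name, and the default adapter are assumptions for illustration; the real code may differ.

```python
from pathlib import Path

def dbt_setup_commands(clientdbt_root: str, org: str, repo_url: str,
                       adapter: str = "dbt-postgres"):
    """Return the shell steps to provision an org's dbt workspace (a sketch, not the real code).

    The workspace lives at <clientdbt_root>/<org>/ and contains a venv plus
    the cloned dbt repo, mirroring the layout described above.
    """
    workspace = Path(clientdbt_root) / org
    venv = workspace / "venv"  # hypothetical venv directory name
    return [
        ["python3", "-m", "venv", str(venv)],
        [str(venv / "bin" / "pip"), "install", "dbt-core", adapter],
        ["git", "clone", repo_url, str(workspace / "dbtrepo")],
    ]
```

Each command list could then be run with `subprocess.run(cmd, check=True)`; swapping `adapter` to `dbt-bigquery` covers the BigQuery case.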

For private repositories we can accept a Personal Access Token from the user, which we store in AWS Secrets Manager. These PATs have a limited lifetime, and we need to figure out a good workflow to prompt the user for a fresh token when the old one expires.
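
Storing the PAT might look like the sketch below. The secret-naming scheme is hypothetical, and the boto3 client is injected as a parameter so the logic stays testable; only `create_secret(Name=..., SecretString=...)` is the real Secrets Manager call.

```python
import json

def github_pat_secret_name(org_slug: str) -> str:
    # Hypothetical naming scheme for illustration.
    return f"gitrepoAccessToken-{org_slug}"

def store_github_pat(secretsmanager_client, org_slug: str, pat: str):
    """Store a GitHub Personal Access Token in AWS Secrets Manager.

    `secretsmanager_client` would be `boto3.client("secretsmanager")` in
    production; passing it in keeps this sketch free of AWS dependencies.
    """
    return secretsmanager_client.create_secret(
        Name=github_pat_secret_name(org_slug),
        SecretString=json.dumps({"token": pat}),
    )
```

When the token expires, a follow-up `put_secret_value` call on the same secret name would hold the refreshed PAT, which fits the refresh workflow we still need to design.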

prefect-proxy

We hit a hurdle communicating with Prefect via its HTTP API, since we could not set up flows and deployments that way. Prefect's Python API is very simple but relies on Python's async capabilities, which are awkward to use from within Django. To get around this we created a separate FastAPI application to hold that functionality, which the Django application talks to via HTTP. Conceptually these two should be thought of as one application.
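
The sync/async mismatch can be illustrated with plain asyncio. The `create_deployment` coroutine below is a hypothetical stand-in for one of Prefect's async client calls, not the real Prefect API; the point is that an `async def` FastAPI route can simply `await` it, whereas sync Django code would have to manage an event loop itself.

```python
import asyncio

async def create_deployment(flow_name: str) -> dict:
    """Hypothetical stand-in for an async Prefect client call."""
    await asyncio.sleep(0)  # placeholder for real async I/O against Prefect
    return {"flow": flow_name, "status": "created"}

# Inside the FastAPI proxy, a route is itself async, so the call is just:
#     result = await create_deployment(flow_name)
# A sync caller (like a Django view) would instead need to spin up a loop:
def create_deployment_sync(flow_name: str) -> dict:
    return asyncio.run(create_deployment(flow_name))
```

`asyncio.run` cannot be nested inside an already-running loop, which is one reason isolating the async Prefect calls behind an async-native FastAPI service is cleaner than wedging them into Django.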

PyTest

We also set up a testing framework using PyTest and intend to configure a code coverage tool to report how much of our code is covered by tests. We would also like to run unit tests automatically when we push to GitHub, but need to figure out how to do that, since we don't want to let GitHub connect to our Airbyte or Prefect instances. Something will need to be mocked.
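
The mocking approach might look like the sketch below: the function that would hit the Airbyte API over HTTP is replaced with `unittest.mock.patch`, so CI never opens a network connection. Both function names here are hypothetical, not our actual code.

```python
from unittest import mock

def trigger_airbyte_sync(connection_id: str) -> dict:
    """Hypothetical client function that would normally call the Airbyte API over HTTP."""
    raise RuntimeError("no Airbyte instance available in CI")

def sync_org_connection(connection_id: str) -> str:
    """The unit under test: wraps the Airbyte call and reports the job status."""
    job = trigger_airbyte_sync(connection_id)
    return job["status"]

def test_sync_org_connection():
    # Patch out the network call so the test runs without touching Airbyte.
    with mock.patch(__name__ + ".trigger_airbyte_sync",
                    return_value={"status": "running"}):
        assert sync_org_connection("conn-1") == "running"
```

PyTest collects `test_`-prefixed functions like this automatically, so the same file runs unchanged in a GitHub-hosted CI job.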

Frontend application

The frontend is being built using Next.js, a framework for React. So far we have a login page, a PR for a signup page, and some skeletons for the Airbyte setup.

Next up

In parallel, we are going to

  • complete the frontend application
  • set up one NGO's pipeline on the platform

"On the platform" means:

  • On a production machine
  • The machine is monitored for uptime and for CPU, memory, and disk usage
  • The machine runs Airbyte, Prefect, DDP_backend and prefect-proxy
  • Configuration data for these applications is stored in RDS rather than in a local db
  • Airbyte's Jobs database will be local since Airbyte keeps a large number of connections open
  • Airbyte's secrets are stored in our AWS Secrets Manager
  • Prefect's logs will go to RDS; we may also send them to AWS CloudWatch

Finally, a Superset application will be set up to read from the destination warehouse. We will attempt to run it in a Docker container on the same host and monitor the combined load.