
qa4sm-airflow

This repository contains the configuration to set up the task scheduler for QA4SM. It is based on the Airflow Docker setup.

  1. Make sure that the file qa4sm.env, which contains the environment variables required by some of the DAGs to access different data sources etc., is available in the same directory as the docker-run.sh script.
  2. To start all necessary containers, call the docker-run.sh script. Starting the service should pull the 'latest' available versions of the DAG containers (airflow-dag-XXX with pull_policy: always in compose.yaml).
  3. After a few seconds the Airflow service should be running on port 8085 (localhost:8085).

Setup

This setup builds and launches multiple images/containers; see compose.yml for details.

Airflow Common

See the Dockerfile. This is the original Airflow image with the packages from base_requirements.txt installed and an additional directory to mount the qa4sm data. Compose builds the image if it is not yet available.

Dag containers

DAGs consist of different operators. Complex operators should start their own containers (via the DockerOperator) that contain all the code they need, which keeps the task scheduler environment simple and the operators separated. DockerOperators can use different Python versions, or even different programming languages; Airflow is just the scheduler that launches them. Of course, if the base environment is sufficient (for simple tasks like moving files), you don't have to use the DockerOperator (that's also why we have the base_requirements file, but don't add any heavy dependencies there!). A minimal sketch of such a DAG follows below.
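For illustration only (not one of the DAGs in this repository), a DAG that delegates its work to a dedicated container could look roughly like this; the DAG id, image name and command are placeholders, and parameter names may vary slightly between Airflow versions:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator

    with DAG(
        dag_id="example_docker_dag",        # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,                      # triggered manually in this sketch
        catchup=False,
    ) as dag:
        DockerOperator(
            task_id="run_processing",
            image="airflow-dag-example:latest",       # placeholder DAG image
            command="python /app/run.py",             # entry point baked into that image
            docker_url="unix://var/run/docker.sock",  # talk to the host Docker daemon
        )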

Environment variables

Most DAGs require some (secret) variables such as tokens to download data or to communicate with the service API (prod, test, dev). These are stored in a file qa4sm.env in the same directory as docker-run.sh. See the qa4sm.env.template file for a list of required variables.
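As a hedged sketch (variable names follow qa4sm.env.template; the image and task are placeholders), such variables can be forwarded from the scheduler environment into a task container, e.g. inside a with DAG(...) block like the one above:

    import os

    from airflow.providers.docker.operators.docker import DockerOperator

    # Forward selected variables from the scheduler environment (populated from
    # qa4sm.env) into the task container.
    DockerOperator(
        task_id="upload_results",                 # hypothetical task
        image="airflow-dag-upload:latest",        # placeholder image
        command="python /app/upload.py",
        environment={
            "QA4SM_API_TOKEN": os.environ["QA4SM_API_TOKEN"],
            "QA4SM_IP_OR_URL": os.environ["QA4SM_IP_OR_URL"],
            "QA4SM_PORT_OR_NONE": os.environ["QA4SM_PORT_OR_NONE"],
        },
    )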

Communicating with local development instance

Developing DAGs usually also requires access to a local instance of qa4sm (never send requests to the production API!). For this (a request sketch follows the list below):

  • start the qa4sm development instance on IP 0.0.0.0 (python manage.py runserver 0.0.0.0:8000)
  • in qa4sm.env, enter "172.17.0.1" for QA4SM_IP_OR_URL and "8000" for QA4SM_PORT_OR_NONE
  • set QA4SM_API_TOKEN to a token that you created in your local development instance
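A minimal request sketch built from these variables; the endpoint path and the token header scheme are assumptions, check the qa4sm API for the actual routes:

    import os

    import requests

    ip = os.environ["QA4SM_IP_OR_URL"]        # e.g. "172.17.0.1"
    port = os.environ["QA4SM_PORT_OR_NONE"]   # e.g. "8000"
    base_url = f"http://{ip}:{port}" if port.lower() != "none" else f"http://{ip}"

    # "/api/" is a placeholder path; "Token ..." assumes DRF-style token auth.
    resp = requests.get(
        f"{base_url}/api/",
        headers={"Authorization": f"Token {os.environ['QA4SM_API_TOKEN']}"},
        timeout=30,
    )
    print(resp.status_code)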
