This repo is my playground for trying out various data engineering tools and services. The services, tools, and design choices are not always the best option and are sometimes unnecessarily cumbersome; this simply reflects me exploring different things. At the moment, the pipeline processes Covid-19 data. All infrastructure is templated in AWS CloudFormation or AWS CDK, and every step features an alarm on failure. The stack can be deployed via GitHub Actions. I use poetry to manage the dependencies and the virtual environment.
In short: a Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK and deployable via GitHub Actions.
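
To make the "alarm on failure" setup concrete, here is a minimal, hypothetical AWS CDK (Python) sketch, not taken from this repo: it defines a single Lambda-backed pipeline step and attaches a CloudWatch alarm that notifies an SNS topic when the step errors. All names (`PipelineStepStack`, `ProcessCovidDataStep`, the asset path) are illustrative assumptions.

```python
# Hypothetical sketch (not the repo's actual stack): a CDK stack that wires a
# CloudWatch alarm to a pipeline step so that a failure triggers an SNS notification.
import aws_cdk as cdk
from aws_cdk import (
    aws_cloudwatch as cloudwatch,
    aws_cloudwatch_actions as cw_actions,
    aws_lambda as _lambda,
    aws_sns as sns,
)
from constructs import Construct


class PipelineStepStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Topic that receives failure notifications (e.g. via an email subscription).
        alarm_topic = sns.Topic(self, "PipelineAlarmTopic")

        # Stand-in for one pipeline step; the real pipeline also uses Glue, Airflow, etc.
        step_fn = _lambda.Function(
            self,
            "ProcessCovidDataStep",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="handler.main",
            code=_lambda.Code.from_asset("src/handler"),
        )

        # Alarm on any invocation error of the step.
        failure_alarm = cloudwatch.Alarm(
            self,
            "ProcessCovidDataStepFailureAlarm",
            metric=step_fn.metric_errors(period=cdk.Duration.minutes(5)),
            threshold=1,
            evaluation_periods=1,
            comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
        )
        failure_alarm.add_alarm_action(cw_actions.SnsAction(alarm_topic))


app = cdk.App()
PipelineStepStack(app, "CovidPipelineStepStack")
app.synth()
```

A stack like this would be deployed with `cdk deploy`, which is the kind of command a GitHub Actions workflow can run on push.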