A PySpark-based ETL framework for loading data into diverse targets, including SQL, NoSQL, and streaming systems, expediting efficient data lake setup.
- ClickHouse: An open-source columnar database management system for online analytical processing (OLAP).
- Druid: An open-source distributed data store designed for real-time analytics on large datasets.
- Kafka: An open-source distributed event streaming platform for building real-time data pipelines and streaming applications.
- PostgreSQL: An open-source relational database management system.
You can explore these data sources to learn more about how Spark connects to and interacts with each of them.
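As a sketch of how Spark talks to two of the sinks above, the snippet below builds the option dictionary Spark's JDBC writer expects for PostgreSQL and shows a write to a Kafka topic. The option keys (`url`, `dbtable`, `kafka.bootstrap.servers`, `topic`) are the ones Spark's built-in connectors use; the helper names (`make_jdbc_options`, `write_to_postgres`, `write_to_kafka`) and the connection values are hypothetical placeholders for this example.

```python
def make_jdbc_options(host, port, database, table, user, password,
                      driver="org.postgresql.Driver"):
    """Build the option dict Spark's JDBC reader/writer expects."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def write_to_postgres(df, opts):
    # df is a pyspark.sql.DataFrame; the PostgreSQL JDBC jar must be
    # on the Spark classpath (e.g. via spark.jars.packages).
    df.write.format("jdbc").options(**opts).mode("append").save()


def write_to_kafka(df, bootstrap_servers, topic):
    # Per Spark's Kafka sink contract, df must expose "value"
    # (and optionally "key") columns of string or binary type.
    (df.write.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .save())
```

The same pattern extends to ClickHouse or Druid: swap the JDBC `url` and `driver` for the corresponding connector, keeping the transformation logic unchanged.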
To ensure code quality and formatting, you can use the following commands:
- Check your Spark application code for style and PEP 8 compliance with `flake8`.
- Automatically format your code according to Black's rules with `black .`.