PySpark-based ETL framework facilitating loading of data into diverse sources - including SQL, NoSQL, and Streaming Data

nikitperiwal/yet-another-etl

yet-another-etl

A PySpark-based ETL framework for loading data into diverse data stores, including SQL, NoSQL, and streaming systems, to speed up data lake setup.

Prerequisites

Spark Connections

  1. ClickHouse: An open-source columnar database management system for online analytical processing (OLAP).
  2. Druid: An open-source distributed data store designed for real-time analytics on large datasets.
  3. Kafka: An open-source distributed event streaming platform for building real-time data pipelines and streaming applications.
  4. PostgreSQL: An open-source relational database management system.

Refer to each project's documentation to learn more about how Spark can connect to and interact with these data sources.
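
As a rough illustration of what connecting Spark to two of these systems can look like, the sketch below builds the option dictionaries for Spark's built-in `jdbc` and `kafka` data source formats and passes them to a DataFrame writer. It is not taken from this repository: the helper names (`postgres_jdbc_options`, `kafka_options`, `write_to_postgres`, `write_to_kafka`) and all connection values are hypothetical. It also assumes the PostgreSQL JDBC driver and the `spark-sql-kafka` connector package are available on the Spark classpath.

```python
from typing import Dict


def postgres_jdbc_options(
    host: str, port: int, database: str, table: str, user: str, password: str
) -> Dict[str, str]:
    """Build options for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def kafka_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build options for Spark's Kafka batch sink (hypothetical helper)."""
    return {"kafka.bootstrap.servers": bootstrap_servers, "topic": topic}


def write_to_postgres(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame to a PostgreSQL table over JDBC."""
    df.write.format("jdbc").options(**options).mode(mode).save()


def write_to_kafka(df, options: Dict[str, str]) -> None:
    """Write a Spark DataFrame to a Kafka topic, one JSON message per row."""
    # The Kafka sink expects a 'value' column, so each row is serialized to JSON.
    (
        df.selectExpr("to_json(struct(*)) AS value")
        .write.format("kafka")
        .options(**options)
        .save()
    )


# Example values only; replace with your own connection details.
pg_opts = postgres_jdbc_options("localhost", 5432, "etl", "public.events", "etl_user", "secret")
kafka_opts = kafka_options("localhost:9092", "events")
```

With a running `SparkSession` and a DataFrame `df`, calling `write_to_postgres(df, pg_opts)` or `write_to_kafka(df, kafka_opts)` would perform the load. Keeping the options in plain dictionaries makes them easy to source from a configuration file.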

Other

To ensure code quality and formatting, you can use the following commands:

  1. Check your Spark application code for style and PEP 8 compliance:
    flake8
  2. Automatically format your code according to Black's rules:
    black .
