Monty Balboa ETL Project (Montague Street Bridge)

Made possible by the data from the TeeWallz/monty_balboa project

Project Description

This is an example ETL project in its simplest form. It is designed to demonstrate a common task that a Data Engineer might be required to perform.

It accomplishes the following using Python:

Download the data as JSON from the website "How Many Days Since Montague Street Bridge Has Been Hit?"
Load the data into a Pandas DataFrame
Normalize the semi-structured data into rows and columns
Export the data as a CSV file
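The four steps above can be sketched roughly as follows. This is a minimal sketch, not the contents of pipeline.py: the function names are hypothetical, and it assumes the endpoint returns a JSON list of (possibly nested) records that `pandas.json_normalize` can flatten.

```python
from pathlib import Path

import pandas as pd
import requests


def extract(url: str) -> list[dict]:
    """Download the raw JSON payload (assumed to be a list of records)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return response.json()


def transform(records: list[dict]) -> pd.DataFrame:
    """Flatten nested JSON objects into columns such as 'parent.child'."""
    return pd.json_normalize(records)


def load(df: pd.DataFrame, path: Path) -> Path:
    """Write the flattened table to CSV, creating the output folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path
```

Chained together, a run would look like `load(transform(extract(url)), Path("data") / "output.csv")`, where `url` stands for the site's JSON endpoint and the output file name is a placeholder.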

How to Run

If you're new to Python, some of these steps may be unfamiliar. If you're on a Windows machine, you can enter these commands in PowerShell.

You'll need to have both git and python installed on your machine. If you don't have them, you can download them here:

git
python (make sure to check the box that says "Add Python to PATH")

Follow these steps to run the script:

Clone the repository
Install the dependencies
Run the script

```shell
# Clone the repository and cd into it
git clone https://www.github.com/danlsn/monty-balboa-etl.git
cd monty-balboa-etl
# Install the dependencies
pip install -r requirements.txt
# Run the script
python pipeline.py
```
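If `python pipeline.py` fails with an import error, a short script like the one below can show which dependencies are missing from the current environment. It assumes requirements.txt covers the third-party packages named under Packages Used (pandas, requests, tqdm); the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Third-party packages named in this README; assumed to match requirements.txt.
REQUIRED = ["pandas", "requests", "tqdm"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, rerun `pip install -r requirements.txt` with the same `python` you use to run the script.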

Packages Used

Pandas: the gold standard for data manipulation in Python
Requests: for making HTTP requests
json (standard library): for parsing JSON data
datetime (standard library): for converting the date string into a datetime object
pathlib (standard library): for working with file paths
tqdm: totally unnecessary, but it makes the process look cool
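The standard-library pieces listed above fit together roughly like this. The sketch assumes the source dates are ISO 8601 strings such as "2024-03-01" (the dataset's actual format isn't stated here), and the helper names and output file name are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path


def parse_hit_date(value: str) -> date:
    """Parse a date string, assuming the 'YYYY-MM-DD' format."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def days_since(hit: date, today: date) -> int:
    """Whole days elapsed between a hit and a reference date."""
    return (today - hit).days


# The / operator builds a path that works on Windows, macOS, and Linux alike.
OUTPUT_PATH = Path("data") / "monty_balboa.csv"  # hypothetical file name
```

For example, `days_since(parse_hit_date("2024-02-28"), date(2024, 3, 1))` gives 2, because 2024 is a leap year.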

Takeaways

Data Engineering is fundamentally about moving data from one place to another. This project is simple, but it captures the basic steps of solving a data engineering problem: extract, transform, and load. The datasets you'll encounter in the wild will be more complex, your source data may look different, and you might load the output somewhere other than a CSV file, but the core steps stay the same.
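As one example of sending the output to a different place, the load step could write to a SQLite database instead of a CSV file while the extract and transform steps stay unchanged. This is a minimal sketch using Python's built-in sqlite3 module and pandas' `DataFrame.to_sql`; the function name, database path, and table name are illustrative assumptions, not part of this project.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_to_sqlite(df: pd.DataFrame, db_path: str, table: str) -> int:
    """Write df to a SQLite table (replacing it) and return the stored row count."""
    # closing() ensures the connection is closed; sqlite3's own context manager only commits.
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
        # The table name is assumed to be trusted input, since it is formatted into SQL.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count
```

Because `if_exists="replace"` drops and recreates the table, each run overwrites the previous load; passing `"append"` instead would accumulate rows across runs.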

Go out there and build something cool!
