Data Integration Pipelines for NYC Payroll Data Analytics

Project Overview

The City of New York would like to develop a Data Analytics platform on Azure Synapse Analytics to accomplish two primary objectives:

Analyze how the City's financial resources are allocated and how much of the City's budget is being devoted to overtime.
Make the data available to the interested public to show how the City’s budget is being spent on salary and overtime pay for all municipal employees.
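
As a rough illustration of the first objective, the overtime share of total pay can be computed as overtime pay divided by regular plus overtime pay. The sketch below is a minimal Python example under stated assumptions: the `PayrollRecord` type, its field names, and the sample figures are hypothetical and do not come from the project's datasets.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    # Hypothetical record shape; the real payroll files may use other columns.
    agency: str
    regular_pay: float
    overtime_pay: float


def overtime_share(records):
    """Return overtime pay as a fraction of regular plus overtime pay."""
    total = sum(r.regular_pay + r.overtime_pay for r in records)
    if total == 0:
        return 0.0  # avoid division by zero on empty or all-zero input
    overtime = sum(r.overtime_pay for r in records)
    return overtime / total


# Made-up sample values, for illustration only.
records = [
    PayrollRecord("Agency A", 90000.0, 10000.0),
    PayrollRecord("Agency B", 80000.0, 20000.0),
]
share = overtime_share(records)
```

In the project itself this kind of metric would be derived from the aggregated data in the warehouse rather than in application code.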

The source data resides in Azure Data Lake and needs to be processed in an NYC data warehouse. The source datasets consist of CSV files containing employee master data and monthly payroll data entered by various City agencies.

Figure: NYC Payroll DB schema.

In this project, I use Azure Data Factory to create data views in Azure SQL DB from the source data files in Data Lake Gen2. I then build the data flows and pipelines that produce aggregated payroll data and export it to a target directory in Data Lake Gen2 storage, over which a Synapse Analytics external table is built. At a high level, the pipeline looks like this:

Figure: High-level pipeline overview.
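
The aggregation step described above runs in an Azure Data Factory data flow, but its logic can be sketched in plain Python. The example below is only an illustration under assumptions: the column names (`AgencyName`, `FiscalYear`, `RegularGrossPaid`, `TotalOTPaid`, `TotalOtherPay`), the choice of grouping keys, and the sample rows are hypothetical and are not taken from this README.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; column names are assumptions for illustration.
SAMPLE_CSV = """AgencyName,FiscalYear,RegularGrossPaid,TotalOTPaid,TotalOtherPay
Agency A,2020,50000.00,5000.00,1000.00
Agency A,2020,40000.00,0.00,500.00
Agency B,2021,60000.00,2500.00,0.00
"""


def aggregate_payroll(csv_text):
    """Sum total pay per (agency, fiscal year) from CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["AgencyName"], int(row["FiscalYear"]))
        # Total paid is assumed to be regular pay plus overtime plus other pay.
        totals[key] += (
            float(row["RegularGrossPaid"])
            + float(row["TotalOTPaid"])
            + float(row["TotalOtherPay"])
        )
    return dict(totals)


summary = aggregate_payroll(SAMPLE_CSV)
for (agency, year), total_paid in sorted(summary.items()):
    print(f"{agency}\t{year}\t{total_paid:.2f}")
```

A result shaped like this summary, one row per agency and fiscal year, is the kind of output the pipeline would write to the target directory for the external table to read.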

Repository contents:

1. Prepare the Data Infrastructure
2. Create Linked Services
3. Create Datasets in Azure Data Factory
4. Create Data Flows
5. Data Aggregation and Parameterization
6. Pipeline Creation
7. Trigger and Monitor Pipeline
8. Verify Pipeline run artifacts
README.md

Repository: qanhnn12/Data-Integration-Pipelines-for-NYC-Payroll-Data-Analytics
