- An analytical pipeline that provides data-driven insights on city traffic.
- Weather conditions affecting city traffic.
- Crashes/incidents on the road affecting city traffic.
- Fatality rate on streets due to crashes.
- Busiest streets.
- Traffic density on a geo map.
https://public.tableau.com/profile/aditya.dubey3253#!/
https://public.tableau.com/profile/aditya.dubey3253#!/vizhome/Trafficanalysisongeomap/Dashboard1
https://public.tableau.com/profile/aditya.dubey3253#!/vizhome/crash_16033013839220/Dashboard2
https://public.tableau.com/profile/aditya.dubey3253#!/vizhome/Traffic_analysis_16032111787210/Dashboard1
Every city in the world faces the problem of traffic congestion. Instead of investing further resources (more bridges, wider roads, etc.), a city traffic department can avoid much of this congestion by managing its existing resources efficiently. This project uses traffic data to provide data-driven insights to the city traffic department to help solve that problem.
Motivation
In the city of Chicago there is a one-way reversible express lane (speed limit: 70 mph) that opens for traffic toward the city in the morning and away from the city in the evening. Mondays through Fridays, the reversible express lanes are switched from the inbound direction to outbound travel between 11:30 a.m. and 1:30 p.m., depending on traffic conditions. Sundays through Fridays, the outbound reversibles are switched to the inbound direction between 11 p.m. and 1 a.m.
This motivated me to provide data-driven insights to the city traffic department so it can use its existing resources efficiently.
The pipeline consists of various modules:
1: Amazon S3
2: Spark
3: Amazon Redshift
4: Tableau
1: Data collected from the API is moved to the landing-zone S3 buckets.
2: The ETL job has an S3 module that copies data from the landing zone to the working zone for Spark (a minimal sketch follows this list).
3: Once the data is in the working zone, a Spark job is triggered that reads the data from S3, applies transformations, and does the necessary processing.
4: The processed data is written back to S3 buckets (processed zone).
5: The ETL job picks up data from the processed zone and stages it into the Redshift staging tables.
6: Tableau reads data from Redshift and displays the dashboards.
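The step-2 copy from the landing zone to the working zone could look roughly like the sketch below. It is a minimal illustration using boto3; the bucket names (`city-traffic-landing-zone`, `city-traffic-working-zone`) and the prefix are hypothetical placeholders, not names taken from this project.

```python
import boto3

s3 = boto3.resource("s3")

LANDING_BUCKET = "city-traffic-landing-zone"   # hypothetical bucket name
WORKING_BUCKET = "city-traffic-working-zone"   # hypothetical bucket name

def promote_to_working_zone(prefix="traffic/"):
    """Copy every object under `prefix` from the landing zone to the working zone."""
    landing = s3.Bucket(LANDING_BUCKET)
    working = s3.Bucket(WORKING_BUCKET)
    for obj in landing.objects.filter(Prefix=prefix):
        working.copy({"Bucket": LANDING_BUCKET, "Key": obj.key}, obj.key)

if __name__ == "__main__":
    promote_to_working_zone()
```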
Data sources:
- Chicago Traffic Data
- Chicago Crash Report
- Weather Data
- Intersections
Note: To access this data you need permission from the Chicago Data Portal.
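For reference, a hedged sketch of step 1 (pulling a dataset from the Chicago Data Portal's Socrata API with `sodapy` and dropping it into the landing zone) is shown below. The dataset identifier `xxxx-xxxx`, the bucket name, and the missing app token are placeholders; look up the real dataset IDs for the traffic, crash, weather, and intersection feeds on the portal.

```python
import json
import boto3
from sodapy import Socrata

def fetch_to_landing_zone(dataset_id="xxxx-xxxx", bucket="city-traffic-landing-zone"):
    # An app token is strongly recommended to avoid Socrata throttling.
    client = Socrata("data.cityofchicago.org", app_token=None)
    records = client.get(dataset_id, limit=50000)  # list of dicts
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key="landing/{}.json".format(dataset_id),
        Body=json.dumps(records),
    )

if __name__ == "__main__":
    fetch_to_landing_zone()
```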
Spark
Installation configuration:
Spark Version: spark-2.4.7-bin-hadoop2.7.tgz
Java Version: openjdk-8-jre-headless
Python Version: python3.7.9
Running the Spark job:
$ spark-submit --packages com.amazonaws:aws-java-sdk:<version>,org.apache.hadoop:hadoop-aws:2.7.7 --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true --master local spark_job.py
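A minimal sketch of what `spark_job.py` might contain is shown below. The S3 paths, the column names (`street_name`, `injuries_fatal`), and the example aggregation (crash and fatality counts per street) are assumptions for illustration, not the project's actual transformation logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("city-traffic-etl").getOrCreate()

# Read raw crash records from the working zone (s3a:// needs the hadoop-aws package above).
crashes = spark.read.json("s3a://city-traffic-working-zone/crashes/")

# Example transformation: crash and fatality counts per street.
street_stats = (
    crashes.groupBy("street_name")
    .agg(
        F.count("*").alias("crash_count"),
        F.sum(F.col("injuries_fatal").cast("int")).alias("fatalities"),
    )
)

# Write the processed output back to S3 so it can be staged into Redshift.
street_stats.write.mode("overwrite").parquet("s3a://city-traffic-processed-zone/street_stats/")

spark.stop()
```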
Setting up Redshift
You can follow the AWS guide to run a Redshift cluster.
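Step 5 (staging the processed S3 output into Redshift) can be done with a `COPY` command, for example via `psycopg2` as sketched below. The endpoint, credentials, IAM role ARN, staging table, and S3 path are all placeholders to replace with your own values.

```python
import psycopg2

conn = psycopg2.connect(
    host="<redshift-endpoint>",
    port=5439,
    dbname="traffic",
    user="<user>",
    password="<password>",
)

# COPY loads the Parquet files written by the Spark job straight into a staging table.
copy_sql = """
    COPY staging.street_stats
    FROM 's3://city-traffic-processed-zone/street_stats/'
    IAM_ROLE '<redshift-iam-role-arn>'
    FORMAT AS PARQUET;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()
```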
Feel free to contact me; you can email me at [email protected].