The report presents a comprehensive analysis of car accidents in Canada from 1994 to 2014, focusing on the factors that influence the likelihood and severity of accidents. The workflow covers data cleaning, exploratory analysis, feature engineering, and modeling with logistic regression and KNN, with the goal of informing how traffic surveillance resources can be allocated most effectively. The study also compares the classification performance of these two machine learning algorithms and addresses the dataset’s class imbalance by undersampling the majority class. Key findings include the periodicity of accidents, the distributions of the variables, and the differences between fatal and non-fatal accidents, which provide insights into accident prevention and safety measures.
Exploratory Data Analysis, Statistics and Machine Learning
Our research uses a dataset of Canadian car accidents from 1994 to 2014 compiled by Transport Canada. The dataset was downloaded from Kaggle: Canadian Car Accidents 1994-2014.
Methods:
- Exploratory Data Analysis
- Dimensionality Reduction
- Logistic Regression
- Linear Regression with L1 Regularization
- KNN
Language: R
- Code: code.Rmd (R Markdown). The code covers data cleaning and filtering, exploratory data analysis, creation of new variables, undersampling of the majority class, and modeling with logistic regression and KNN. Its final output is the report file.
- Report: Report_Car_Accidents.pdf
- Presentation: Presentation_Car_Accidents.pdf
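To illustrate the undersampling step described above, here is a minimal base-R sketch of random undersampling of the majority class. It is not taken from code.Rmd. The helper name `undersample_majority`, the outcome column `fatal`, and the toy data are hypothetical, and the sketch assumes a data frame with a binary outcome column.

```r
# Minimal sketch of random undersampling (hypothetical helper, not from code.Rmd).
# Assumes `df` is a data frame and `outcome` names a binary class column.
undersample_majority <- function(df, outcome, seed = 42) {
  labels <- as.character(df[[outcome]])
  counts <- table(labels)
  if (length(counts) != 2) stop("expected exactly two classes")
  n_min <- min(counts)
  minority <- names(counts)[which.min(counts)]
  min_rows <- which(labels == minority)
  maj_rows <- which(labels != minority)
  set.seed(seed)  # fixed seed so the sampled subset is reproducible
  # Keep every minority row plus an equal-sized random subset of majority rows.
  # sample.int avoids sample()'s special case for length-one vectors.
  keep_maj <- maj_rows[sample.int(length(maj_rows), n_min)]
  df[sort(c(min_rows, keep_maj)), , drop = FALSE]
}

# Toy, made-up data: 95 non-fatal (0) and 5 fatal (1) accidents.
toy <- data.frame(
  fatal = c(rep(0, 95), rep(1, 5)),
  hour  = rep(0:19, times = 5)
)
balanced <- undersample_majority(toy, "fatal")
print(table(balanced$fatal))  # both classes now have 5 rows
```

Randomly discarding majority-class rows balances the classes before fitting a classifier such as logistic regression or KNN, at the cost of throwing away data. The fixed seed only makes this sketch reproducible.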