The Formula 1 (F1) racing championship is a highly competitive motorsport that generates vast amounts of data during races, including car telemetry, performance data, driver information, and race results. This project applies data analysis techniques to F1 data to gain insights that could help improve team performance, refine race strategies, and improve race outcomes. This repository contains data analysis of Formula One (F1) races. The data is accurate as of the 2023 Bahrain Grand Prix. The analysis is performed in the Python programming language using data analysis libraries such as Pandas, NumPy, and Matplotlib.
The data is obtained from Kaggle, a platform for data science competitions and datasets. The dataset contains information about F1 drivers, the seasons they competed in, race data, and points earned.
- F1DriversDataset 2.csv - contains the raw dataset in CSV format
- f1.ipynb - Jupyter notebook used for data analysis
- plots - directory containing plots generated from data analysis
- README.md - this file
The raw dataset contains missing values and inconsistent data types. Data cleaning is performed with the Pandas library to remove missing values and convert columns to the data types appropriate for analysis.
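A cleaning step of this kind might look like the sketch below. It is not the notebook's code: the column names (`Seasons`, `Race_Wins`, `Podiums`, `Pole_Positions`) and the helper names `clean_drivers` and `load_drivers` are hypothetical, and dropping incomplete rows is only one way to handle missing values (filling them is another).

```python
import pandas as pd

# Hypothetical column names: check them against the real CSV header.
NUMERIC_COLUMNS = ["Seasons", "Race_Wins", "Podiums", "Pole_Positions"]


def clean_drivers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw drivers table."""
    cleaned = df.copy()  # leave the raw frame untouched
    for col in NUMERIC_COLUMNS:
        # Coerce to numbers; unparseable entries (e.g. "n/a") become NaN.
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    # Drop rows missing any analysed value, then store the counts as integers.
    cleaned = cleaned.dropna(subset=NUMERIC_COLUMNS)
    cleaned = cleaned.astype({col: int for col in NUMERIC_COLUMNS})
    return cleaned.reset_index(drop=True)


def load_drivers(path: str = "F1DriversDataset 2.csv") -> pd.DataFrame:
    """Read the raw CSV and clean it in one step."""
    return clean_drivers(pd.read_csv(path))
```

Coercing with `errors="coerce"` turns stray text into `NaN`, so the single `dropna` call handles both genuinely missing cells and badly typed ones.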
The data is analyzed to find insights and answer questions about F1 races. The analysis includes correlation analysis and visualization with the Matplotlib library.
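Correlation analysis paired with a saved Matplotlib figure could be sketched as follows. The function name `plot_correlation`, the output file naming, and the column names in the usage line are assumptions; the `plots` directory matches the one described above.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # off-screen backend: saves files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation(df: pd.DataFrame, x: str, y: str, out_dir: str = "plots"):
    """Compute Pearson's r between two columns and save a labelled scatter plot."""
    r = df[x].corr(df[y])  # Series.corr defaults to Pearson correlation
    fig, ax = plt.subplots()
    ax.scatter(df[x], df[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs {x} (Pearson r = {r:.2f})")
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    file_path = out_path / f"{y}_vs_{x}.png"
    fig.savefig(file_path)
    plt.close(fig)  # free memory when generating many plots
    return r, file_path
```

For example, `plot_correlation(drivers, "Seasons", "Race_Wins")` would return the coefficient for the seasons-versus-wins question and write `plots/Race_Wins_vs_Seasons.png`. Pearson's r only captures linear association; a rank-based measure may suit heavily skewed counts such as race wins better.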
The questions explored here are:
- What is the distribution of driver nationalities in this dataset?
- What is the correlation between the number of seasons a driver participates in and their number of race wins?
- What is the correlation between the number of podium finishes and the number of pole positions for drivers who are or have been champions?
- What does it take to be a champion?
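The first and third questions could be approached roughly as in this sketch. The helper names and the `Nationality`, `Championships`, `Podiums`, and `Pole_Positions` columns are hypothetical, as is the rule that a champion is any driver with at least one title.

```python
import pandas as pd


def nationality_distribution(df: pd.DataFrame) -> pd.Series:
    """Share of drivers per nationality, most common first."""
    return df["Nationality"].value_counts(normalize=True)


def champion_podium_pole_corr(df: pd.DataFrame) -> float:
    """Pearson correlation of podiums vs. pole positions among champions only."""
    # Treat anyone with at least one title as a champion (hypothetical column).
    champions = df[df["Championships"] > 0]
    return champions["Podiums"].corr(champions["Pole_Positions"])
```

Filtering to champions before correlating keeps the many drivers with zero podiums and zero poles from dominating the coefficient. The nationality shares can be charted with, for instance, `nationality_distribution(drivers).head(10).plot.bar()`.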
The data analysis reveals insights about F1 races, the drivers who compete or have competed, and the championships they won. The findings are summarized in the Jupyter notebook (f1.ipynb) and visualized in the plots directory.
Kaggle dataset: https://www.kaggle.com/datasets/dubradave/formula-1-drivers-dataset
Pandas documentation: https://pandas.pydata.org/docs/
Matplotlib documentation: https://matplotlib.org/stable/contents.html