This project focuses on analyzing real-life player data and FIFA ratings using data analytics techniques. The main goals include:
- Converting provided Excel files into .csv files for easier data analysis.
- Using scatter plots to identify potential issues in the data.
- Exploring both supervised and unsupervised learning methods.
- Applying supervised learning models to our labeled dataset.
- Evaluating the performance of different models, specifically decision trees and random forests.
- Assessing the accuracy of the models in matching FIFA data with real-life player data.

The repository is organized as follows:
- data/: Contains the original Excel files and the converted .csv files.
  - Data.xlsx
  - modified_data.csv
  - modified_df_DEF.csv
  - modified_df_MID.csv
  - modified_df_OFF.csv
  - modified_df_data.csv
- notebooks/: Jupyter notebooks used for data analysis and model training.
  - gab_randomForest copy.ipynb
- plots/: Generated scatter plots and other visualizations.
  - Plot_DEF_Clearances/
  - Plot_Goals.py
  - Plot_Man_of_the_match.py
  - Plot_Rating.py
- scripts/: Python scripts for data analysis and model implementation.
  - Analysis_DATA_DT.py
  - Analysis_DATA_RF.py
  - Analysis_DEF_DT.py
  - Analysis_DEF_RF.py
  - Analysis_MID_DT.py
  - Analysis_MID_RF.py
  - Analysis_OFF_DT.py
  - Analysis_OFF_RF.py
  - DA_DecisionTree_OFF.py
  - DA_FIFA.py
  - DA_SplittingData.py
- documentation/: Project documentation.
  - doc_anish_linda/
  - docu_Gab/
- README.md: Project overview and instructions.

Ensure you have the following installed:
- Python 3.8+
- Jupyter Notebook
- Required Python libraries (listed in requirements.txt)
- Clone the repository:
  git clone https://github.com/yourusername/FIFA_Data_Analytics.git
  cd FIFA_Data_Analytics
- Install the required libraries:
  pip install -r requirements.txt
- Convert the provided Excel files into .csv files:
  - Use the script scripts/DA_SplittingData.py to automate this process.
- Use scatter plots to visualize the data:
  - Notebooks in the notebooks/ directory demonstrate how to create scatter plots to identify issues within the data.
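
The conversion step above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/DA_SplittingData.py: the function names, the `Position` column, and the idea that the per-position files are produced by grouping on that column are all assumptions made for the example.

```python
from pathlib import Path

import pandas as pd


def excel_to_csv(excel_path, csv_path):
    """Read the first sheet of an Excel workbook and write it out as CSV."""
    # pd.read_excel needs an Excel engine such as openpyxl to be installed.
    df = pd.read_excel(excel_path)
    df.to_csv(csv_path, index=False)
    return df


def split_by_position(df, out_dir, position_column="Position"):
    """Write one CSV per distinct value found in `position_column`.

    The column name "Position" is hypothetical; adjust it to the real data.
    Returns a dict mapping each position value to the file that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for position, group in df.groupby(position_column):
        path = out_dir / f"modified_df_{position}.csv"
        group.to_csv(path, index=False)
        written[position] = path
    return written
```

Under these assumptions, `split_by_position(excel_to_csv("data/Data.xlsx", "data/modified_data.csv"), "data")` would write one file per position value next to the converted CSV.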
- Decision Trees:
  - Implemented but did not yield satisfactory results.
  - Refer to scripts/Analysis_DATA_DT.py and scripts/DA_DecisionTree_OFF.py for details.
- Random Forests:
  - Provided better results, with a higher R-squared value of 0.8.
  - Refer to notebooks/gab_randomForest copy.ipynb and scripts/Analysis_DATA_RF.py for the analysis and results.
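
For readers who want to reproduce this kind of comparison, the sketch below fits both model types on the same train/test split and reports R-squared for each. It assumes the rating is treated as a regression target (which the R-squared figure suggests) and uses synthetic data as a stand-in; the function name `compare_models` and all hyperparameters are illustrative choices, not the settings used in the project scripts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed=0):
    """Fit a decision tree and a random forest on one split; return test R-squared."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (NOT the project's player data): four numeric
# features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

scores = compare_models(X, y)
print(scores)
```

Evaluating both models on the same held-out split keeps the comparison fair; the numbers printed here describe only the synthetic data and say nothing about the project's results.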
- For each position (defense, offense, midfield), different features, such as man of the match or passes per game, proved more important when comparing FIFA ratings with real-life player statistics.
- These features significantly influence the model's ability to match FIFA ratings with real-life data.
- Our analysis indicated that the models achieved up to 30% accuracy in matching FIFA ratings with real-life player data.
- We therefore concluded that a decision tree is not the best option for our purpose.
- The random forest gave us a significantly higher R-squared, especially when we focused on the most important attributes of the respective tables.
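
One common way to obtain per-feature importance of the kind described above is the `feature_importances_` attribute of a fitted scikit-learn random forest. The sketch below is an assumption about how such a ranking could be produced, not the project's actual code; the column names and the synthetic table are hypothetical and chosen only to echo the features named above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(df, target, seed=0):
    """Fit a random forest and return feature importances, largest first."""
    X = df.drop(columns=[target])
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, df[target])
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic stand-in table (NOT the project's data). The target is built so
# that "man_of_the_match" drives most of its variance.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "man_of_the_match": rng.integers(0, 10, size=200),
        "passes_per_game": rng.normal(40, 10, size=200),
        "noise": rng.normal(size=200),
    }
)
demo["rating"] = (
    0.5 * demo["man_of_the_match"]
    + 0.02 * demo["passes_per_game"]
    + rng.normal(scale=0.1, size=200)
)

importances = rank_features(demo, "rating")
print(importances)
```

Running the same function separately on each position's table would give one ranking per position, which can then guide which attributes to keep when refitting the model.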
- Anish Biswas
- Deema Aassy
- I-En Hung
- Linda Blumenthal
- Marco Schneider
Despite extensive analysis and the application of various supervised learning models, we found that the accuracy of matching FIFA data with real-life player data was relatively low. Future work could explore additional models, feature engineering, or data augmentation techniques to improve performance.
For any questions or contributions, please open an issue or contact one of the team members.
We hope this project serves as a useful resource for understanding the application of data analytics and machine learning in the context of sports data. Thank you for your interest!