I built a logistic regression model to predict the results of the 2018 FIFA World Cup, using scikit-learn, pandas, NumPy, seaborn and matplotlib.
- Use machine learning to predict who is going to win the FIFA World Cup 2018.
- Predict the outcome of individual matches for the entire competition.
- Run a simulation of the next matches, i.e. the quarter-finals, semi-finals and final.
These goals present a unique real-world Machine Learning prediction problem and involve solving various Machine Learning tasks: data integration, feature modelling and outcome prediction.
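The simulation goal can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the team names, the `RATINGS` table and the `win_probability` helper are hypothetical placeholders for the probabilities a trained model would supply.

```python
import random

# Hypothetical strength ratings; in the real project these probabilities
# would come from the trained model, not from a hand-written table.
RATINGS = {"Team A": 0.90, "Team B": 0.85, "Team C": 0.80, "Team D": 0.75}


def win_probability(team_a, team_b):
    """Probability that team_a beats team_b (illustrative stand-in)."""
    a, b = RATINGS[team_a], RATINGS[team_b]
    return a / (a + b)


def play_round(teams, rng):
    """Pair adjacent teams in the bracket and return the winner of each tie."""
    winners = []
    for i in range(0, len(teams), 2):
        a, b = teams[i], teams[i + 1]
        winners.append(a if rng.random() < win_probability(a, b) else b)
    return winners


def simulate_knockout(teams, seed=0):
    """Play successive rounds until a single team remains."""
    rng = random.Random(seed)
    while len(teams) > 1:
        teams = play_round(teams, rng)
    return teams[0]


champion = simulate_knockout(["Team A", "Team B", "Team C", "Team D"])
print("Simulated champion:", champion)
```

Running the simulation many times with different seeds and counting the champions gives a rough estimate of each team's chance of winning the bracket.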
I used two datasets from Kaggle: the results of the matches since 1930 and the World Cup 2018 dataset. The historical results cover the matches of all participating teams since the beginning of the championship in 1930.
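A data integration step along these lines might look like the sketch below. The column names (`home_team`, `away_team`, `home_score`, `away_score`), the sample rows and the `participants` list are all assumptions made for illustration; the real Kaggle files may be laid out differently. It also assumes a match is kept when at least one side is a participating team.

```python
import pandas as pd

# Tiny made-up sample standing in for the historical results file.
results = pd.DataFrame({
    "home_team": ["Team A", "Team B", "Team C", "Team A"],
    "away_team": ["Team B", "Team C", "Team E", "Team D"],
    "home_score": [2, 1, 0, 3],
    "away_score": [1, 1, 2, 0],
})

# Hypothetical list of teams taking part in the tournament.
participants = ["Team A", "Team B", "Team D"]

# Keep only matches in which at least one side is a participating team.
mask = results["home_team"].isin(participants) | results["away_team"].isin(participants)
matches = results[mask].copy()


def label_outcome(goal_diff):
    """Turn a goal difference (home minus away) into a class label."""
    if goal_diff > 0:
        return "home_win"
    if goal_diff < 0:
        return "away_win"
    return "draw"


# Derive the prediction target from the scores.
matches["outcome"] = (matches["home_score"] - matches["away_score"]).apply(label_outcome)
print(matches)
```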
- Jupyter Notebook
- NumPy
- Pandas
- Seaborn
- Matplotlib
- scikit-learn
I chose logistic regression as the model and got 57% accuracy on the training set and 55% on the test set. Besides the match results, I also used a dataset of the FIFA rankings as of April 2018 and a dataset containing the group-stage fixtures of the tournament.
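For readers who want to see the shape of such a training run, here is a minimal sketch on synthetic data. The single ranking-difference feature, the binary win/no-win label and the 70/30 split are assumptions for illustration only, so the accuracies it prints will not match the figures above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data: ranking difference (team rank minus opponent rank).
# A negative value means the team is ranked better than its opponent.
rank_diff = rng.integers(-50, 51, size=500)

# Made-up rule: better-ranked teams win more often, with plenty of noise.
p_win = 1.0 / (1.0 + np.exp(0.05 * rank_diff))
y = (rng.random(500) < p_win).astype(int)
X = rank_diff.reshape(-1, 1)

# Hold out part of the data to measure accuracy on unseen matches.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Comparing the two scores is a quick check for overfitting: a training accuracy far above the test accuracy would suggest the model is memorising the training matches.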
- Dataset: to improve the dataset, you could use FIFA (the video game, not the organisation) to assess the quality of each team's players.
- A confusion matrix would be a good way to analyse which games the model got wrong.
- We could use an ensemble, that is, stack several models together to improve the accuracy.
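The last two ideas could be tried together, roughly as sketched here. The data is synthetic (three classes standing in for win, draw and loss), and pairing logistic regression with a random forest is just one possible choice of base models, not something this project has tested.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for win / draw / loss outcomes.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Stack two base models; a logistic regression combines their predictions.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

Off-diagonal cells show which outcomes get confused with each other, for example how often a true draw is predicted as a win.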