This repo contains a number of scripts and notebooks trying out things on the Titanic dataset on Kaggle.
A tongue-in-cheek look at removing gender bias from the dataset. Includes:
- Using IBM's AIF360 to evaluate bias in the data and models, and trying out reweighing as a method to mitigate it.
- An object-oriented approach to preprocessing using sklearn Pipelines with custom transformers.
See also: https://www.kaggle.com/garethjns/titanicsexism-fairness-in-ml
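The custom-transformer approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the transformer names, column logic, and toy data are all made up for the example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class FillAgeWithMedian(BaseEstimator, TransformerMixin):
    """Illustrative transformer: fill missing Age with the training median."""

    def fit(self, X, y=None):
        self.median_ = X["Age"].median()
        return self

    def transform(self, X):
        X = X.copy()
        X["Age"] = X["Age"].fillna(self.median_)
        return X


class EncodeSex(BaseEstimator, TransformerMixin):
    """Illustrative transformer: map Sex strings to 0/1."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        X["Sex"] = X["Sex"].map({"male": 0, "female": 1})
        return X


# Transformers chain with the final estimator in one Pipeline, so the
# whole preprocessing + model sequence fits and predicts as a unit.
pipe = Pipeline([
    ("age", FillAgeWithMedian()),
    ("sex", EncodeSex()),
    ("model", LogisticRegression()),
])

# Toy stand-in for the Titanic training data
df = pd.DataFrame({
    "Age": [22.0, None, 38.0, 26.0],
    "Sex": ["male", "female", "female", "male"],
})
y = [0, 1, 1, 0]
pipe.fit(df, y)
print(pipe.predict(df))
```

Because the transformers implement `fit`/`transform`, statistics such as the Age median are learned on the training data only and reapplied consistently at prediction time.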
A fork of this kernel, attempting to create a reasonably scoring model in as little code as possible. A great example of how not to program.
See also: https://www.kaggle.com/garethjns/shortest-titanic-kernel-0-78468
Examples working with Microsoft's LightGBM
Introduction to preprocessing and preparing the data for use with LightGBM.
See also: https://www.kaggle.com/garethjns/microsoft-lightgbm-0-795
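The gist of preparing Titanic data for LightGBM can be sketched with pandas alone. The columns below are a toy stand-in for `train.csv`, not the kernel's actual preprocessing; the point is that LightGBM handles NaNs natively, so mostly the string columns need attention.

```python
import pandas as pd

# Toy stand-in for a slice of train.csv
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", None],
    "Age": [22.0, None, 38.0],   # missing values can be left as NaN
    "Fare": [7.25, 71.28, 8.05],
})

# Casting string columns to pandas "category" dtype lets LightGBM's
# sklearn API treat them as categorical features directly, with no
# one-hot encoding needed.
for col in ["Sex", "Embarked"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

From here the frame can be passed straight to `lgb.LGBMClassifier.fit`, which picks up the category-dtype columns automatically.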
Script to prepare the data, grid-search the best model parameters, and fit a (slightly more) robust ensemble on multiple data splits. Can score about 0.822 (top 3%) with a lucky random seed.
See also: https://www.kaggle.com/garethjns/microsoft-lightgbm-with-parameter-tuning-0-822
A very simple and fast script fitting a logistic regression model with almost no preprocessing. Can score in the top 10% with a lucky random seed, and is a good example of why such a small dataset is terrible for model performance evaluation!
See also: https://www.kaggle.com/garethjns/3-seconds-and-3-features-top-10
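The shape of that minimal approach looks roughly like this. The three features shown are plausible Titanic columns but an assumption, not necessarily the three the kernel actually uses, and the data is a toy stand-in.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-in with three Titanic-like features, already numeric
# (Sex encoded as male=0 / female=1)
df = pd.DataFrame({
    "Sex": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 2, 3, 1, 3],
    "Fare": [7.25, 71.3, 21.0, 8.05, 52.0, 7.9],
})
y = [0, 1, 1, 0, 0, 1]

# Almost no preprocessing: just fit and predict.
model = LogisticRegression().fit(df, y)
print(model.predict(df))
```

That a model this simple can land near the top of the leaderboard says more about the size of the test set than about the model, which is exactly the point made above.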