Part of a project in Deep Learning & Applied AI at Sapienza University of Rome, spring 2022. The starting point was the paper "Training BatchNorm and Only BatchNorm", which investigated the effects of freezing all but the batch normalization layers in residual neural networks. This project experiments with MLPs of varying dimensions on MNIST for comparison (the implementation also works for CIFAR-10). A shallow CNN is also implemented, mainly to observe the effects on a shallow non-residual network, but also to see how hyperparameter tuning affects BatchNorm performance.
Findings suggest that training only the BatchNorm parameters does offer greater performance than training the same number of randomly selected parameters, at least once the network exceeds a certain size; from there, the effect generally grows as the parameter count increases.
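As a rough illustration of the setup, here is a minimal Keras sketch (not the notebook code itself; the layer sizes are placeholders) of freezing everything except the BatchNorm layers:

```python
import tensorflow as tf
from tensorflow import keras

# A small MLP with BatchNorm between the dense layers (sizes are placeholders).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),  # MNIST images
    keras.layers.Dense(256, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])

# Freeze everything except BatchNorm, so only the gamma/beta
# parameters of the BatchNorm layers receive gradient updates.
for layer in model.layers:
    layer.trainable = isinstance(layer, keras.layers.BatchNormalization)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```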
The notebooks are standard TensorFlow/Keras Jupyter notebooks. To understand more about the background, I recommend reading the paper on training BatchNorm and only BatchNorm (link in the acknowledgments).
The notebooks run with Jupyter and TensorFlow/Keras. They also work fine in Google Colab.
- Jupyter Notebook
- TensorFlow (`pip install tensorflow`)
- Keras Tuner (`pip install keras-tuner --upgrade`), only needed if you want to run the tuning sections. If installing it becomes an issue, feel free to comment those sections out.
There are two notebooks, one for each architecture (the LeNet CNN and the MLPs). Each notebook has a tuning section at the bottom where the tuning code is commented out. For this project the MLP notebook is the interesting one.
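For reference, such a tuning section typically looks something like the following Keras Tuner sketch (the hyperparameter names and ranges here are illustrative, not the notebooks' actual search space):

```python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # Search over layer width and learning rate; ranges are placeholders.
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(hp.Int("units", min_value=64, max_value=512, step=64),
                           activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_split=0.1, epochs=5)
```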
Whether MNIST or CIFAR-10 is used is decided by setting a variable value at the top of each notebook.
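The switch might look roughly like this (the variable name `DATASET` is illustrative; check the top cell for the actual one):

```python
from tensorflow import keras

DATASET = "mnist"  # illustrative switch: "mnist" or "cifar10"

if DATASET == "mnist":
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
else:
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Scale pixel values to [0, 1] before feeding either dataset to the models.
x_train, x_test = x_train / 255.0, x_test / 255.0
```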
Self-evaluation of further work that could be done on this project:
- Optimize the random parameter freezing/unfreezing. Freezing individual weights is currently only possible in Keras for R; support in Python Keras could come soon and would easily speed up runtime for the larger nets (a possible workaround is sketched after this list).
- More rigorous MLP architecture design. As it stands, the dimensions and layer contents are fairly simple and were picked somewhat arbitrarily based on getting initial results.
- Testing and tuning more hyperparameters, including activation function positioning (before or after BatchNorm) and batch size.
- Further experimenting with other datasets, including extending to non-computer-vision sets.
- Experiment with other architectures.
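On the first point above, one way to emulate per-weight freezing in stock TensorFlow is to zero out gradients with a fixed random binary mask before the optimizer step. This is a minimal sketch under that assumption; `make_masks` and `masked_train_step` are hypothetical helpers, not code from the notebooks:

```python
import tensorflow as tf

def make_masks(model, trainable_fraction=0.01, seed=0):
    # One fixed binary mask per trainable variable: 1 = keep training
    # this weight, 0 = treat it as frozen.
    rng = tf.random.Generator.from_seed(seed)
    return [tf.cast(rng.uniform(v.shape) < trainable_fraction, v.dtype)
            for v in model.trainable_variables]

def masked_train_step(model, optimizer, loss_fn, masks, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Zero out gradients for the "frozen" entries before the update.
    masked = [g * m for g, m in zip(grads, masks)]
    optimizer.apply_gradients(zip(masked, model.trainable_variables))
    return loss
```

Note that this still computes full gradients and only discards most of them, so it matches the current notebooks' semantics rather than giving the runtime savings that true per-weight freezing would.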
See the open issues for a full list of proposed features (and known issues).
Your Name - [email protected]
Project Link: https://github.com/marcusntnu/mlp_lenet_bathnorm