A simple C++ implementation of neural networks with CUDA support.
It was developed to gain a deep understanding of the neural network training process:
- activations: sigmoid, tanh, relu, softmax
- layers: linear, convolutional, max-pool
- backpropagation algorithm
- dropout
- batch normalization
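As a reference for the activations listed above, here are their standard definitions; this is a plain CPU sketch of the math, not the repository's actual CUDA kernels:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Element-wise activations (tanh is just std::tanh).
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }
float relu(float x)    { return std::max(0.0f, x); }

// Numerically stable softmax over a vector: subtracting the max
// before exponentiating avoids overflow without changing the result.
std::vector<float> softmax(const std::vector<float>& z) {
    float m = *std::max_element(z.begin(), z.end());
    std::vector<float> out(z.size());
    float sum = 0.0f;
    for (size_t i = 0; i < z.size(); ++i) {
        out[i] = std::exp(z[i] - m);
        sum += out[i];
    }
    for (auto& v : out) v /= sum;
    return out;
}
```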
TODO:
- fix the dropout implementation to use CUDA random number generators
- Adam optimizer
- CIFAR-100
mkdir -p build
cd build
cmake -DUSE_CUDA=ON ..
cmake --build . -- -j 4
bash ./download.sh
The MNIST network trains for 20 epochs and reaches 99.0% accuracy.
Convolutional neural network architecture:
Network<Cuda> net("mnist");
net.AddConv3D(Shape{28,28}, Shape{32,5,5});
net.AddReLu();
net.AddMaxPool(Shape{32,24,24}, Shape{2,2});
net.AddDropout(0.2);
net.AddConv3D(Shape{32,23,23}, Shape{64,5,5});
net.AddReLu();
net.AddMaxPool(Shape{64,19,19}, Shape{2,2});
net.AddDropout(0.2);
net.AddLinearLayer(64*18*18, 512);
net.AddReLu();
net.AddDropout(0.1);
net.AddLinearLayer(512, 10);
net.AddSoftmax();
net.AddCrossEntropy(10);
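The shapes passed to each layer above are consistent with valid convolutions (out = in - kernel + 1) and, judging by the numbers (24 → 23, 19 → 18), max pooling with stride 1 (out = in - pool + 1). A quick sanity check of the spatial sizes; these helpers are mine, not part of the library:

```cpp
// Valid convolution: output = input - kernel + 1.
int conv_out(int in, int k) { return in - k + 1; }
// Max pooling with stride 1 (matches 24 -> 23 and 19 -> 18 above).
int pool_out(int in, int p) { return in - p + 1; }

// Trace the spatial dimension through the MNIST net.
int mnist_flat_size() {
    int s = 28;
    s = conv_out(s, 5);  // 24: Conv3D on 28x28 with a 5x5 kernel
    s = pool_out(s, 2);  // 23: MaxPool 2x2
    s = conv_out(s, 5);  // 19
    s = pool_out(s, 2);  // 18
    return 64 * s * s;   // matches AddLinearLayer(64*18*18, 512)
}
```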
On CIFAR-10, the convolutional neural network trains for 50 epochs and reaches 61.0% accuracy.
Network architecture:
Network<Cuda> net("cifar");
net.AddConv3D(Shape{3,32,32}, Shape{16,5,5});
net.AddReLu();
net.AddMaxPool(Shape{16,28,28}, Shape{2,2});
net.AddDropout(0.2);
net.AddConv3D(Shape{16,27,27}, Shape{32,5,5});
net.AddReLu();
net.AddMaxPool(Shape{32,23,23}, Shape{2,2});
net.AddDropout(0.2);
net.AddLinearLayer(32*22*22, 256);
net.AddReLu();
net.AddDropout(0.2);
net.AddLinearLayer(256, 10);
net.AddSoftmax();
net.AddCrossEntropy(10);
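Both networks end with AddSoftmax() followed by AddCrossEntropy(10). As a reference for what that pairing computes, here is the standard cross-entropy loss over softmax probabilities for one sample; a sketch of the definition, not the repository's actual code:

```cpp
#include <cmath>
#include <vector>

// Cross-entropy loss for a single sample: the negative log of the
// probability the softmax assigned to the true class. Lower is better;
// a uniform distribution over N classes gives log(N).
float cross_entropy(const std::vector<float>& probs, int target) {
    return -std::log(probs[target]);
}
```

A convenient property of this pairing is that the gradient of the loss with respect to the pre-softmax logits reduces to (probabilities - one-hot target), which keeps the backpropagation step simple.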
The network could be enlarged to achieve better accuracy, but training slows down dramatically.
docker build -t cuda-dev -f cuda-dev.Dockerfile .
Test it:
docker run --gpus all cuda-dev nvidia-smi
Run container:
docker run --gpus all -it --rm -v $PWD:/home/nn -w /home/nn cuda-dev /bin/bash
Build and run in container:
docker run --gpus all -it --entrypoint bash --rm -v $PWD:/home/nn -w /home/nn cuda-dev download_build_and_run.sh
tiny-dnn is a good implementation of deep learning algorithms, but I couldn't get it to work with CUDA:
CUDA Neural Network Implementation:
Back Propagation (and Python example):
Matrix multiplication with Cuda:
- https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/cuda/05-cuda-mm.pdf?__blob=publicationFile
- http://www.ncsa.illinois.edu/People/kindr/projects/hpca/files/NCSA_GPU_tutorial_d3.pdf
- https://www.quantstart.com/articles/Matrix-Matrix-Multiplication-on-the-GPU-with-Nvidia-CUDA/
CUDA Compatibility:
Adam Optimizer:
How to Develop a CNN From Scratch for CIFAR-10 Photo Classification: