Implementation of a Caffe LeNet-5 model on the STM32F446RE board with an Arm Cortex-M4 core.
- STM32 NUCLEO-F446RE board
- Desktop computer (GPU optional)
- Jupyter Notebook - https://jupyter.org/
- Python - https://www.python.org/
- Caffe - https://caffe.berkeleyvision.org/
- STM32CubeIDE - https://www.st.com/en/development-tools/stm32cubeide.html
- PuTTY - https://www.putty.org/
Note: Make sure all of the software above is installed before proceeding to the next step.
- LeNet-5 model definition: Model/lenet_train_test.prototxt (for training and testing) and Model/lenet_deploy.prototxt (for classification on the desktop)
- Pre-trained LeNet-5 model: Model/lenet_iter_10000.caffemodel
- MNIST dataset in LMDB format: Dataset/mnist_train_lmdb and Dataset/mnist_test_lmdb (for training and testing).
- MNIST dataset in JPG format: https://github.com/teavanist/MNIST-JPG (for classification; download the images and place them in a Test_Dataset directory).
(Optional) To train LeNet-5 from scratch instead of using the pre-trained model:
<caffe> train -solver Model/lenet_solver.prototxt
(Optional) To fine-tune the pre-trained LeNet-5 model:
<caffe> train -solver Model/lenet_solver.prototxt -weights Model/lenet_iter_10000.caffemodel
Note:
- To enable the GPU for training or fine-tuning, add the -gpu 0 argument.
- Remember to update the variables in the prototxt files if needed, e.g. the LMDB dataset paths.
- <caffe> is your Caffe executable; in my case (Windows) it is C:\Caffe\caffe-master\Build\x64\Release\caffe.exe.
- For more information on data preparation and model training, refer to https://caffe.berkeleyvision.org/gathered/examples/mnist.html.
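Updating the LMDB paths in the prototxt can be scripted instead of edited by hand. The sketch below is a hypothetical helper (not part of this repository) that rewrites every `source: "..."` entry of the Data layers to point into a given directory while keeping the LMDB folder name; it assumes the prototxt uses the stock `source: "<path>"` syntax from the Caffe MNIST example.

```python
import re

# Matches the `source: "..."` field inside a Caffe Data layer's data_param.
SOURCE_RE = re.compile(r'(source:\s*")([^"]*)(")')


def retarget_lmdb_sources(prototxt_text, dataset_dir):
    """Point every `source` entry at dataset_dir, keeping the LMDB folder name.

    Hypothetical helper: "examples/mnist/mnist_train_lmdb" becomes
    "<dataset_dir>/mnist_train_lmdb"; all other fields are left untouched.
    """
    def _swap(match):
        # Take the last path component, accepting both / and \ separators.
        lmdb_name = match.group(2).replace("\\", "/").rstrip("/").split("/")[-1]
        new_path = dataset_dir.rstrip("/") + "/" + lmdb_name
        return match.group(1) + new_path + match.group(3)

    return SOURCE_RE.sub(_swap, prototxt_text)
```

For example, read Model/lenet_train_test.prototxt into a string, pass it with your dataset directory, and write the result back. Relative paths are generally resolved against the working directory from which caffe or nn_quantizer.py is launched, so an absolute dataset_dir is the safer choice.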
- Open Scripts/LeNet5_classification.ipynb via Jupyter Notebook.
- Follow and execute the instructions in the Jupyter Notebook.
- Remember to change the paths in the following variables: caffe_root, root, model_def, model_weights, labels_file.
- You can run inference on the CPU or GPU by calling caffe.set_mode_cpu() or caffe.set_mode_gpu().
- This Jupyter Notebook lets you classify a single image or a group of test images.
- Accuracy and inference speed will be displayed as below:
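The accuracy and speed figures boil down to counting correct top-1 predictions and timing each forward pass. This is a minimal, framework-agnostic sketch; `evaluate` and its `classify` argument are hypothetical names, not the notebook's actual code.

```python
import time


def evaluate(classify, samples):
    """Return (top-1 accuracy, mean seconds per image) for a classifier.

    `classify` is a hypothetical stand-in for the notebook's forward pass:
    any callable mapping one image to a predicted label.
    `samples` is an iterable of (image, true_label) pairs.
    """
    correct = 0
    total = 0
    elapsed = 0.0
    for image, label in samples:
        start = time.perf_counter()
        predicted = classify(image)  # only the inference call is timed
        elapsed += time.perf_counter() - start
        correct += int(predicted == label)
        total += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return correct / total, elapsed / total
```

With pycaffe, classify could copy the preprocessed image into net.blobs['data'].data, call net.forward(), and return the argmax of net.blobs['prob'].data[0], assuming the deploy prototxt names its input and output blobs data and prob as the stock Caffe LeNet does.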
- nn_quantizer.py: Takes the Caffe model definition (.prototxt) used for training/testing, which must contain valid paths to the LMDB datasets, and the trained model file (.caffemodel). It parses the network graph connectivity and quantizes the model to 8-bit weights and activations layer by layer, incrementally, with minimal loss of accuracy on the test dataset. It dumps the network graph connectivity and quantization parameters into a pickle file.
- Run nn_quantizer.py to parse and quantize the network. This step takes a while on the CPU, because it quantizes the network layer by layer while validating the accuracy on the test dataset. To enable the GPU for the quantization sweeps, add the --gpu argument.
python nn_quantizer.py --model ../Model/lenet_train_test.prototxt --weights ../Model/lenet_iter_10000.caffemodel --save lenet_quantize.pkl
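The core of 8-bit quantization can be sketched per tensor. This minimal sketch assumes power-of-two (Qm.n fixed-point) scaling, the kind CMSIS-NN's q7 kernels consume through their shift arguments; the function names are hypothetical, and the real nn_quantizer.py additionally sweeps activation ranges layer by layer while re-checking test accuracy, which this sketch does not do.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits n for a signed Qm.n format.

    Assumption: power-of-two scaling. One bit is the sign bit, and the
    integer bits m are just enough to cover the largest magnitude.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Round values * 2**frac_bits and saturate to the signed integer range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to floats, e.g. to measure quantization error."""
    return [c / 2.0 ** frac_bits for c in codes]
```

For weights in [-0.5, 0.5] this picks Q(-1).8: 0.1 becomes 26 (26/256 ≈ 0.1016), and the top value 0.5 saturates to 127. Power-of-two scales let the device rescale with bit shifts instead of multiplications.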
- code_gen.py: Reads the quantization parameters and network graph connectivity from the previous step and generates code consisting of NN function calls. Supported layers: convolution, inner product, pooling (max/average) and ReLU. It generates (a) weights.h: the quantized weights, (b) parameter.h: the quantization ranges, and (c) main.cpp: the network code.
- Run code_gen.py to generate the code for Arm Cortex-M CPUs.
python code_gen.py --model lenet_quantize.pkl --out_dir ../Code
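Emitting weights.h amounts to printing each quantized tensor as a C initializer. The helpers below are a hypothetical sketch: the macro names (e.g. CONV1_WT), include guard and line layout are assumptions, and the header that code_gen.py actually produces may be formatted differently.

```python
def c_array_macro(name, values, per_line=16):
    """Format int8 values as a C #define initializer list.

    Long lists are wrapped with backslash line continuations.
    """
    lines = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        lines.append(", ".join(str(v) for v in chunk))
    body = ", \\\n    ".join(lines)
    return "#define {} {{{}}}\n".format(name, body)


def weights_header(tensors):
    """Build a header with one macro per quantized tensor (name -> list of int8)."""
    parts = ["#ifndef WEIGHTS_H", "#define WEIGHTS_H", ""]
    for name, values in tensors.items():
        if any(v < -128 or v > 127 for v in values):
            raise ValueError(name + " contains values outside the int8 range")
        parts.append(c_array_macro(name, values))
    parts.append("#endif  /* WEIGHTS_H */")
    return "\n".join(parts) + "\n"
```

On the device, such a macro would typically initialize a constant array, e.g. static const q7_t conv1_wt[] = CONV1_WT;, so the weights live in flash rather than RAM.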
- convert_image.py: Takes a group of MNIST images in JPG format and converts them to signed int8 format. The image arrays are split across several input_x.h files, each containing at most 80 images (due to the memory limits of the NUCLEO-F446RE board).
- All input_x.h files are listed in an include_list.h file, where you comment/uncomment the entries so that exactly one input_x.h is included and uploaded to the board.
python convert_image.py --image_dir ../Test_Dataset --out_dir ../Code
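The conversion and batching steps can be sketched as follows. Assumptions: pixels are mapped as in Caffe's MNIST example (scale 0.00390625, i.e. 1/256) and stored in Q0.7 fixed point, and the files are numbered input_0.h, input_1.h, and so on. Both the mapping and the numbering are hypothetical; the real convert_image.py may differ.

```python
def pixels_to_q7(pixels, frac_bits=7):
    """Map 8-bit grayscale pixels (0..255) to signed int8.

    Assumed mapping: x = pixel / 256 (Caffe MNIST input scaling), stored in
    Q0.7, so each code is round(pixel / 256 * 2**frac_bits), saturated to 127.
    """
    scale = 2.0 ** frac_bits / 256.0
    return [max(-128, min(127, round(p * scale))) for p in pixels]


def chunk(items, size=80):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def include_list(num_files, active=0):
    """Emit include_list.h with only input_<active>.h left uncommented."""
    lines = []
    for i in range(num_files):
        prefix = "" if i == active else "// "
        lines.append('{}#include "input_{}.h"'.format(prefix, i))
    return "\n".join(lines) + "\n"
```

The raw pixels could come from Pillow, e.g. list(Image.open(path).convert("L").getdata()) for a 28x28 MNIST JPG. With 170 test images, chunk would produce three files of 80, 80 and 10 images.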
- Create a new project via STM32CubeIDE.
- In the Board Selector, select NUCLEO-F446RE under Commercial Part No.
- Download the CMSIS-NN and CMSIS-DSP packages from https://github.com/ARM-software/CMSIS_5 and add them to your project.
- Remember to add both the DSP/Include and NN/Include directories via Project > Properties > C/C++ General > Paths and Symbols > Includes.
- Add the NN/Source directory via Project > Properties > C/C++ General > Paths and Symbols > Source Location.
- Open your project's .ioc file; under Pinout & Configuration, expand Timers, select TIM10, and tick 'Activated' to enable the timer.
- Copy the contents of the generated main.cpp into Core/Src/main.c, and move the generated weights.h, parameter.h, input_x.h and include_list.h files into the Core/Inc directory.
- Click 'Build' and then 'Run' to upload the program to the NUCLEO-F446RE board.
- The memory utilization is shown below:
- To view the output, open PuTTY, select 'Serial', enter your serial line (e.g. COM3) and speed (e.g. 115200), and click 'Open'.
- Messages such as the classification result, inference cycles and accuracy are displayed in the PuTTY terminal.
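Turning the reported inference cycles into wall-clock time takes a few lines of arithmetic. This is a sketch under stated assumptions: whether the firmware prints raw TIM10 ticks or CPU cycles, and which prescaler and clocks it uses, depend on the generated main.c and your .ioc clock settings, so timer_clock_hz, prescaler and cpu_clock_hz are placeholders to fill in. TIM10 on the STM32F4 is a 16-bit timer, so the tick difference is taken modulo the counter period to survive one wrap-around (assuming the auto-reload value is left at 0xFFFF).

```python
def elapsed_ticks(start, end, period=0xFFFF):
    """Ticks between two counter reads, allowing for one wrap of a 16-bit counter."""
    return (end - start) % (period + 1)


def ticks_to_us(ticks, timer_clock_hz, prescaler=0):
    """Convert a timer count to microseconds.

    A timer whose prescaler register holds PSC increments once every
    (PSC + 1) timer-clock cycles: elapsed = ticks * (PSC + 1) / f_timer.
    """
    return ticks * (prescaler + 1) * 1_000_000 / timer_clock_hz


def cycles_to_ms(cycles, cpu_clock_hz):
    """Convert a CPU cycle count to milliseconds."""
    return cycles * 1000.0 / cpu_clock_hz
```

For instance, with a 180 MHz timer clock and a prescaler of 179 the timer ticks once per microsecond, so a reading of 90 ticks is 90 µs.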
- The final STM32CubeIDE project for the LeNet-5 implementation is provided as LeNet-5-Project.zip.
- You should be able to run this project directly on your board to classify the MNIST image array in input_x.h.