It trains the LeNet-5 neural network on the MNIST dataset from scratch, and afterwards you can give it newly hand-written digits 0 to 9 and it will hopefully recognize the digit for you.
Ciro Santilli created a small fork of this repo at lenet adding better automation for:
- extracting MNIST images as PNG
- ONNX CLI inference taking any image files as input
- a Python
tkinter
GUI that lets you draw and see inference live - running on GPU
Install on Ubuntu 24.10 with:We use our own
sudo apt install protobuf-compiler
git clone https://github.com/activatedgeek/LeNet-5
cd LeNet-5
git checkout 95b55a838f9d90536fd3b303cede12cf8b5da47f
virtualenv -p python3 .venv
. .venv/bin/activate
pip install \
Pillow==6.2.0 \
numpy==1.24.2 \
onnx==1.13.1 \
torch==2.0.0 \
torchvision==0.15.1 \
visdom==0.2.4 \
;
pip install
because their requirements.txt uses >=
instead of ==
making it random if things will work or not.On Ubuntu 22.10 it was instead:
pip install
Pillow==6.2.0 \
numpy==1.26.4 \
onnx==1.17.0 torch==2.6.0 \
torchvision==0.21.0 \
visdom==0.2.4 \
;
Then run with:This script:It throws a billion exceptions because we didn't start the Visdom server, but everything works nevertheless, we just don't get a visualization of the training.
python run.py
- does a fixed 15 epochs on the training data
- it then uses the trained net from memory to check accuracy with the test data
- then it also produces a
lenet.onnx
ONNX file which contains the trained network, nice!
The terminal outputs lines such as:
Train - Epoch 1, Batch: 0, Loss: 2.311587
Train - Epoch 1, Batch: 10, Loss: 2.067062
Train - Epoch 1, Batch: 20, Loss: 0.959845
...
Train - Epoch 1, Batch: 230, Loss: 0.071796
Test Avg. Loss: 0.000112, Accuracy: 0.967500
...
Train - Epoch 15, Batch: 230, Loss: 0.010040
Test Avg. Loss: 0.000038, Accuracy: 0.989300
One of the benefits of the ONNX output is that we can nicely visualize the neural network on Netron:
CNN convolution kernels are not hardcoded. They are learnt and optimized via backpropagation. You just specify their size! Example in PyTorch you'd do just:as used for example at: activatedgeek/LeNet-5.
nn.Conv2d(1, 6, kernel_size=(5, 5))
This can also be inferred from: stackoverflow.com/questions/55594969/how-to-visualise-filters-in-a-cnn-with-pytorch where we see that the kernels are not perfectly regular as you'd expected from something hand coded.
This is a small fork of activatedgeek/LeNet-5 by Ciro Santilli adding better integration and automation for:
- extracting MNIST images as PNG
- ONNX CLI inference taking any image files as input
- a Python
tkinter
GUI that lets you draw and see inference live - running on GPU
Install on Ubuntu 24.10:
sudo apt install protobuf-compiler
cd lenet
virtualenv -p python3 .venv
. .venv/bin/activate
pip install -r requirements-python-3-12.txt
Download and extract MNIST train, test accuracy, and generate the ONNX Extract MNIST images as PNG:Infer some individual images using the ONNX:Draw on a GUI and see live inference using the ONNX:TODO: the following are missing for this to work:
lenet.onnx
:./train.py
./extract_pngs.py
./infer.py data/MNIST/png/test/0/*.png
./draw.py
- start a background task. This we know how to do: stackoverflow.com/questions/1198262/tkinter-locks-python-when-an-icon-is-loaded-and-tk-mainloop-is-in-a-thread/79502287#79502287
- get bytes from the canvas: all methods are ugly: stackoverflow.com/questions/9886274/how-can-i-convert-canvas-content-to-an-image
70,000 28x28 grayscale (1 byte per pixel) images of hand-written digits 0-9, i.e. 10 categories. 60k are considered training data, 10k are considered for test data.
This is THE "OG" computer vision dataset.
Playing with it is the de-facto computer vision hello world.
It was on this dataset that Yann LeCun made great progress with the LeNet model. Running LeNet on MNIST has to be the most classic computer vision thing ever. See e.g. activatedgeek/LeNet-5 for a minimal and modern PyTorch educational implementation.
But it is important to note that as of the 2010's, the benchmark had become too easy for many applications. It is perhaps fair to say that the next big dataset revolution of the same importance was with ImageNet.
The dataset could be downloaded from yann.lecun.com/exdb/mnist/ but as of March 2025 it was down and seems to have broken from time to time randomly, so Wayback Machine to the rescue:but doing so is kind of pointless as both files use some crazy single-file custom binary format to store all images and labels. OMG!
wget \
https://web.archive.org/web/20120828222752/http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
https://web.archive.org/web/20120828182504/http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
https://web.archive.org/web/20240323235739/http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
https://web.archive.org/web/20240328174015/http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
OK-ish data explorer: knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=mnist
The most important thing this project provides appears to be the
.onnx
file format, which represents ANN models, pre-trained or not.Deep learning frameworks can then output such
.onnx
files for interchangeability and serialization.Some examples:
- activatedgeek/LeNet-5 produces a trained
.onnx
from PyTorch - MLperf v2.1 ResNet can use
.onnx
as a pre-trained model
The cool thing is that ONNX can then run inference in an uniform manner on a variety of devices without installing the deep learning framework used for. It's a bit like having a kind of portable executable. Neat.