Good packaging! Tested on Ubuntu 22.10 with the commands below. Running them throws a billion exceptions because we didn't start the visdom server, but never mind that.
git clone
cd LeNet-5
git checkout 95b55a838f9d90536fd3b303cede12cf8b5da47f
virtualenv -p python3 .venv
. .venv/bin/activate
# Their requirements.txt uses >=, and some of its == pins are incompatible with our Ubuntu.
pip install \
  Pillow==6.2.0 \
  numpy==1.24.2 \
  onnx==1.13.1 \
  torch==2.0.0 \
  torchvision==0.15.1 \
  visdom==0.2.4
time python
The script does a fixed 15 epochs.
Output on P51:
real 2m10.262s
user 11m9.771s
sys 0m26.368s
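The run also produces a lenet.onnx ONNX file, which is pretty neat, and allows us for example to visualize it on Netron. The relevant model code, where the output of c1 feeds two branches whose results are added, is:
output = self.c1(img)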
x = self.c2_1(output)
output = self.c2_2(output)
output += x
output = self.c3(output)
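The branch-and-add step in the code above can be sketched in plain Python, without PyTorch. This is only an illustration: `branch_and_add` and the two toy branch functions are hypothetical stand-ins for `self.c2_1` and `self.c2_2`, not the repository's layers.

```python
def branch_and_add(x, branch_a, branch_b):
    """Feed the same input to two branches and add their outputs elementwise."""
    a = branch_a(x)
    b = branch_b(x)
    return [ai + bi for ai, bi in zip(a, b)]


def double(values):
    # Hypothetical stand-in for one branch (self.c2_1 in the original code).
    return [2 * v for v in values]


def add_one(values):
    # Hypothetical stand-in for the other branch (self.c2_2 in the original code).
    return [v + 1 for v in values]


result = branch_and_add([1.0, 2.0, 3.0], double, add_one)
print(result)  # [4.0, 7.0, 10.0]
```

Both branches see the same input, so the graph forks after c1 and joins again at the addition.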
Note that the images must be drawn with white on black. If you use black on white, the accuracy becomes terrible. This is a very good example of brittleness in AI systems!
Number 9 drawn with mouse on GIMP by Ciro Santilli (2023)
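If a drawing was made black on white, one possible workaround is to invert it before inference. The sketch below works on a grayscale image given as nested lists of 0 to 255 values. The function names and the mean-brightness heuristic are assumptions made for illustration, not part of the inference script.

```python
def invert_grayscale(pixels, max_value=255):
    """Swap dark and bright: 0 becomes max_value and vice versa."""
    return [[max_value - p for p in row] for row in pixels]


def looks_black_on_white(pixels, max_value=255):
    """Hypothetical heuristic: a mostly bright image has a white background."""
    flat = [p for row in pixels for p in row]
    return sum(flat) / len(flat) > max_value / 2


def to_white_on_black(pixels, max_value=255):
    """Return the image in the white-on-black convention the model expects."""
    if looks_black_on_white(pixels, max_value):
        return invert_grayscale(pixels, max_value)
    return pixels


# A tiny 3x3 "image": a black vertical stroke on a white background.
black_on_white = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
]
print(to_white_on_black(black_on_white))
# [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
```

The heuristic would misfire on a drawing that fills most of the canvas, so an explicit flag may be the safer choice in practice.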
We can try the adapted code at python/onnx_cheat/ with the commands below, and it works pretty well! The program's output is as desired.
cd python/onnx_cheat
./ lenet.onnx infer_mnist_9.png
We can also try with images taken directly from Extract MNIST images, using the loop below, and the accuracy is great as expected.
for f in /home/ciro/git/mnist_png/out/testing/1/*.png; do echo $f; $f ; done
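To turn "the accuracy is great" into a number, the per-image results could be tallied as in the sketch below. It assumes that the parent directory name is the true digit, as the `testing/1/` path above suggests. The file names and predicted digits in the example are made up, and how the predictions are obtained is left out.

```python
from pathlib import PurePosixPath


def label_from_path(path):
    """Read the true digit from the parent directory, e.g. .../testing/1/x.png -> 1."""
    return int(PurePosixPath(path).parent.name)


def accuracy(predictions):
    """predictions maps an image path to the digit a classifier predicted for it."""
    if not predictions:
        raise ValueError("no predictions given")
    correct = sum(
        1 for path, digit in predictions.items() if label_from_path(path) == digit
    )
    return correct / len(predictions)


# Hypothetical predictions: three of the four are right.
example = {
    "mnist_png/out/testing/1/a.png": 1,
    "mnist_png/out/testing/1/b.png": 1,
    "mnist_png/out/testing/1/c.png": 7,
    "mnist_png/out/testing/9/d.png": 9,
}
print(accuracy(example))  # 0.75
```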
By default, the setup runs on CPU only, not GPU, as could be seen by running htop. But by the magic of PyTorch, modifying the program to run on the GPU is trivial with the patch below, and leads to a faster runtime with less user time.
cat << EOF | patch
diff --git a/ b/
index 104d363..20072d1 100644
--- a/
+++ b/
@@ -24,7 +24,8 @@ data_test = MNIST('./data/mnist',
data_train_loader = DataLoader(data_train, batch_size=256, shuffle=True, num_workers=8)
data_test_loader = DataLoader(data_test, batch_size=1024, num_workers=8)
-net = LeNet5()
+device = 'cuda'
+net = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=2e-3)
@@ -43,6 +44,8 @@ def train(epoch):
loss_list, batch_list = [], []
for i, (images, labels) in enumerate(data_train_loader):
+ labels =
+ images =
output = net(images)
@@ -71,6 +74,8 @@ def test():
total_correct = 0
avg_loss = 0.0
for i, (images, labels) in enumerate(data_test_loader):
+ labels =
+ images =
output = net(images)
avg_loss += criterion(output, labels).sum()
pred = output.detach().max(1)[1]
@@ -84,7 +89,7 @@ def train_and_test(epoch):
- dummy_input = torch.randn(1, 1, 32, 32, requires_grad=True)
+ dummy_input = torch.randn(1, 1, 32, 32, requires_grad=True).to(device)
torch.onnx.export(net, dummy_input, "lenet.onnx")
onnx_model = onnx.load("lenet.onnx")
With the patch applied, we now spend more time on the GPU than on the CPU:
real 1m27.829s
user 4m37.266s
sys 0m27.562s
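The two `time` outputs can be compared with a few lines of arithmetic. The numbers are the `real` and `user` values reported above, and the helper name `to_seconds` is just for this sketch.

```python
def to_seconds(minutes, seconds):
    """Convert the XmY.Zs format printed by `time` into seconds."""
    return 60 * minutes + seconds


# Values copied from the CPU run and the GPU run above.
cpu = {"real": to_seconds(2, 10.262), "user": to_seconds(11, 9.771)}
gpu = {"real": to_seconds(1, 27.829), "user": to_seconds(4, 37.266)}

real_speedup = cpu["real"] / gpu["real"]
user_ratio = cpu["user"] / gpu["user"]

print(f"wall clock speedup: {real_speedup:.2f}x")    # roughly 1.48x
print(f"user CPU time reduction: {user_ratio:.2f}x")  # roughly 2.42x
```

The user CPU time drops by more than the wall clock time does, which fits the explanation that work moved from the CPU cores to the GPU.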
Interesting layer skip architecture thing: a block's input is added back to its output, skipping the layers in between.
Apparently destroyed ImageNet 2015 and became very very famous as such. On the difference between ResNet v1 and v1.5:
The difference between v1 and v1.5 is that, in the bottleneck blocks which requires downsampling, v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution. This difference makes ResNet50 v1.5 slightly more accurate (~0.5% top1) than v1, but comes with a small performance drawback (~5% imgs/sec).
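The effect of moving the stride can be checked with the standard convolution output size formula, floor((n + 2p - k) / s) + 1. In the sketch below, the 56x56 input size and the padding of 1 on the 3x3 convolution are assumptions chosen for illustration, and `bottleneck_sizes` is a hypothetical helper, not a library function.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def bottleneck_sizes(size, variant):
    """Sizes after the first 1x1 and after the 3x3 convolution of a downsampling block."""
    stride_1x1, stride_3x3 = (2, 1) if variant == "v1" else (1, 2)
    after_1x1 = conv_out_size(size, kernel=1, stride=stride_1x1)
    after_3x3 = conv_out_size(after_1x1, kernel=3, stride=stride_3x3, padding=1)
    return after_1x1, after_3x3


for variant in ("v1", "v1.5"):
    print(variant, bottleneck_sizes(56, variant))
# v1 (28, 28)
# v1.5 (56, 28)
```

Both variants end at the same 28x28 size. In v1 the strided 1x1 convolution only reads one in four input positions, while in v1.5 the 3x3 convolution runs over the full 56x56 map. That is one plausible reading of the quoted trade-off between accuracy and throughput.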
CNN convolution kernels are not hardcoded. They are learnt and optimized via backpropagation. You just specify their size! For example, in PyTorch you'd do just the following, as used for example at activatedgeek/LeNet-5:
nn.Conv2d(1, 6, kernel_size=(5, 5))
This can also be inferred from looking at learnt kernels, where we see that they are not perfectly regular as you'd expect from something hand coded.
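To make "learnt, not hardcoded" concrete, here is a toy sketch in plain Python. Only the size of a 1D kernel is given, and its values are found by gradient descent on a mean squared error. It uses a hand-derived gradient and plain gradient descent rather than PyTorch autograd or the Adam optimizer. The "hidden" kernel that generates the training target is made up for this example.

```python
import random


def correlate_valid(signal, kernel):
    """1D 'valid' cross-correlation, the operation deep learning libraries call convolution."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def learn_kernel(signal, target, kernel_size, lr=0.3, steps=500):
    """Fit a kernel of the given size by gradient descent on the mean squared error."""
    kernel = [0.0] * kernel_size  # only the size is specified, not the values
    n = len(target)
    for _ in range(steps):
        pred = correlate_valid(signal, kernel)
        errors = [p - t for p, t in zip(pred, target)]
        # d(MSE)/d(kernel[j]) = (2 / n) * sum_i errors[i] * signal[i + j]
        grads = [
            2.0 / n * sum(errors[i] * signal[i + j] for i in range(n))
            for j in range(kernel_size)
        ]
        kernel = [w - lr * g for w, g in zip(kernel, grads)]
    return kernel


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [rng.uniform(-1.0, 1.0) for _ in range(64)]
hidden_kernel = [1.0, 0.0, -1.0]  # hypothetical filter that generated the data
target = correlate_valid(signal, hidden_kernel)

learned = learn_kernel(signal, target, kernel_size=3)
print([round(w, 3) for w in learned])  # values close to hidden_kernel
```

The learner is never told the kernel values, only that the kernel has three entries, which mirrors passing just `kernel_size` to `nn.Conv2d`.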
Object detection model.
You can get some really sweet pre-trained versions of this, typically trained on the COCO dataset.