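A neural network architecture built around skip connections: each block adds its input back to its output, so the layers only have to learn a residual correction on top of the identity.

It won the ImageNet 2015 classification challenge and became very famous as a result.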
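
The skip idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `residual_block` and `scale_layer` are hypothetical names, and a real ResNet block uses convolutions, batch normalization and ReLU rather than the toy layer here.

```python
def residual_block(x, layer):
    """Return x + layer(x): the skip connection adds the input back."""
    fx = layer(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def scale_layer(weight):
    """Hypothetical stand-in for a learned layer: multiplies by one weight."""
    return lambda x: [weight * xi for xi in x]

x = [1.0, -2.0, 3.0]
# With a zero weight the block is exactly the identity.
identity_out = residual_block(x, scale_layer(0.0))
# With a non-zero weight the layer only contributes a correction on top of x.
corrected_out = residual_block(x, scale_layer(0.5))
print(identity_out)
print(corrected_out)
```

Because a block with an all-zero layer passes its input through unchanged, stacking many such blocks does not have to make the signal harder to propagate.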

Bibliography:

pub.aimind.so/a-brief-introduction-to-resnets-d43ae4f1e2a0

ResNet implementation

 0  0

torchvision ResNet
MLperf v2.1 ResNet contains a pre-trained ResNet ONNX at zenodo.org/record/4735647/files/resnet50_v1.onnx for its inference benchmark. We've tested it at: Run MLperf v2.1 ResNet on Imagenette.

ResNet variant

 0  0

ResNet v1 vs v1.5

 0  0

catalog.ngc.nvidia.com/orgs/nvidia/resources/resnet_50_v1_5_for_pytorch explains:

The difference between v1 and v1.5 is that, in the bottleneck blocks which requires downsampling, v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution.
This difference makes ResNet50 v1.5 slightly more accurate (~0.5% top1) than v1, but comes with a small performance drawback (~5% imgs/sec).
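
The stride placement can be checked with the standard convolution output-size formula. The sketch below is plain Python, not NVIDIA's or torchvision's code; the 56x56 input size and the padding values (0 for the 1x1 convolutions, 1 for the 3x3) are assumptions chosen for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def bottleneck_sizes(size, strides):
    """Widths after the 1x1, 3x3 and 1x1 convolutions of a bottleneck block."""
    layers = [(1, 0), (3, 1), (1, 0)]  # (kernel, padding), assumed values
    sizes = []
    for (kernel, padding), stride in zip(layers, strides):
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

v1_sizes = bottleneck_sizes(56, strides=(2, 1, 1))   # stride 2 in the first 1x1
v15_sizes = bottleneck_sizes(56, strides=(1, 2, 1))  # stride 2 in the 3x3
print("v1  :", v1_sizes)
print("v1.5:", v15_sizes)
```

Both variants end at the same 28x28 resolution. In v1 the strided 1x1 convolution reads only one input position in four, while in v1.5 the 3x3 convolution still sees the full 56x56 map, at the price of running the first 1x1 convolution at full resolution. That is one plausible reading of the accuracy and throughput difference quoted above.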

Convolutional neural network (CNN)

 1  0

CNN convolution kernels are also learnt

 0  0

CNN convolution kernels are not hardcoded. They are learnt and optimized via backpropagation. You just specify their size! For example, in PyTorch you'd just do:

nn.Conv2d(1, 6, kernel_size=(5, 5))

as used for example at: activatedgeek/LeNet-5.

This can also be inferred from: stackoverflow.com/questions/55594969/how-to-visualise-filters-in-a-cnn-with-pytorch where we see that the kernels are not perfectly regular as you'd expect from something hand-coded.
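
To make the "learnt, not hardcoded" point concrete, here is a minimal pure-Python sketch that fits a 3-tap 1D kernel by gradient descent on a squared error. Everything in it is a hypothetical illustration: the target kernel, the signal, the learning rate and the step count are arbitrary choices. PyTorch computes the same kind of gradient automatically for nn.Conv2d.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0.0, 1.0, 0.0, 2.0, -1.0, 3.0, 0.5, -2.0]
target_kernel = [-1.0, 0.0, 1.0]  # hypothetical "true" edge detector
target = conv1d(signal, target_kernel)

kernel = [0.0, 0.0, 0.0]  # only the size (3) is specified up front
learning_rate = 0.02
for step in range(200):
    output = conv1d(signal, kernel)
    errors = [o - t for o, t in zip(output, target)]
    # Gradient of 0.5 * sum(errors^2) with respect to each kernel weight.
    grads = [sum(e * signal[i + j] for i, e in enumerate(errors))
             for j in range(len(kernel))]
    kernel = [w - learning_rate * g for w, g in zip(kernel, grads)]

print([round(w, 3) for w in kernel])
```

The kernel starts at all zeros and ends up numerically close to the target, without the target values ever being written into the model by hand.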

List of convolutional neural networks

 0  0

LeNet (1998, LeNet-5)

 0  0

LeNet implementation

 0  0

activatedgeek/LeNet-5

 0  0

github.com/activatedgeek/LeNet-5

This repository contains a very clean minimal PyTorch implementation of LeNet-5 for MNIST.

It trains the LeNet-5 neural network on the MNIST dataset from scratch, and afterwards you can give it newly hand-written digits 0 to 9 and it will hopefully recognize the digit for you.

Ciro Santilli created a small fork of this repo at lenet adding better automation for:

extracting MNIST images as PNG
ONNX CLI inference taking any image files as input
a Python tkinter GUI that lets you draw and see inference live
running on GPU

Install on Ubuntu 24.10 with:

sudo apt install protobuf-compiler
git clone https://github.com/activatedgeek/LeNet-5
cd LeNet-5
git checkout 95b55a838f9d90536fd3b303cede12cf8b5da47f
virtualenv -p python3 .venv
. .venv/bin/activate
pip install \
  Pillow==6.2.0 \
  numpy==1.24.2 \
  onnx==1.13.1 \
  torch==2.0.0 \
  torchvision==0.15.1 \
  visdom==0.2.4 \
;

We use our own pip install because their requirements.txt uses >= instead of ==, so whether things work depends on which versions happen to be current at install time.

On Ubuntu 22.10 it was instead:

pip install \
  Pillow==6.2.0 \
  numpy==1.26.4 \
  onnx==1.17.0 \
  torch==2.6.0 \
  torchvision==0.21.0 \
  visdom==0.2.4 \
;

Then run with:

python run.py

This script:

does a fixed 15 epochs on the training data
it then uses the trained net from memory to check accuracy with the test data
then it also produces a lenet.onnx ONNX file which contains the trained network, nice!

It throws a billion exceptions because we didn't start the Visdom server, but everything works nevertheless, we just don't get a visualization of the training.

The terminal outputs lines such as:

Train - Epoch 1, Batch: 0, Loss: 2.311587
Train - Epoch 1, Batch: 10, Loss: 2.067062
Train - Epoch 1, Batch: 20, Loss: 0.959845
...
Train - Epoch 1, Batch: 230, Loss: 0.071796
Test Avg. Loss: 0.000112, Accuracy: 0.967500
...
Train - Epoch 15, Batch: 230, Loss: 0.010040
Test Avg. Loss: 0.000038, Accuracy: 0.989300
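
Since the output is plain text, the test loss and accuracy can be pulled out with a small parser. This is a sketch that assumes the exact line format shown above; `parse_test_accuracies` is a hypothetical helper, not part of the repository.

```python
import re

TEST_LINE = re.compile(r"Test Avg\. Loss: ([0-9.]+), Accuracy: ([0-9.]+)")

def parse_test_accuracies(log_text):
    """Return a list of (avg_loss, accuracy) tuples, one per test line."""
    return [(float(m.group(1)), float(m.group(2)))
            for m in TEST_LINE.finditer(log_text)]

# Sample copied from the output above, with the elided lines left out.
log_text = """\
Train - Epoch 1, Batch: 230, Loss: 0.071796
Test Avg. Loss: 0.000112, Accuracy: 0.967500
Train - Epoch 15, Batch: 230, Loss: 0.010040
Test Avg. Loss: 0.000038, Accuracy: 0.989300
"""
results = parse_test_accuracies(log_text)
for index, (loss, accuracy) in enumerate(results, start=1):
    print(f"test line {index}: loss={loss:.6f} accuracy={accuracy:.2%}")
```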

And the runtime on Ubuntu 22.10, P51 was:

real    2m10.262s
user    11m9.771s
sys     0m26.368s

One of the benefits of the ONNX output is that we can nicely visualize the neural network on Netron:

Figure 1. Netron visualization of the activatedgeek/LeNet-5 ONNX output. From this we can see the bifurcation on the computational graph as done in the code at:

output = self.c1(img)
x = self.c2_1(output)
output = self.c2_2(output)
output += x
output = self.c3(output)

This doesn't seem to conform to the original LeNet-5 however?

activatedgeek/LeNet-5 use ONNX for inference

 0  0

Now let's try and use the trained ONNX file for inference on some manually drawn images on GIMP:

Figure 1. Number 9 drawn with mouse on GIMP by Ciro Santilli (2023)

Note that:

the images must be drawn with white on black. If you use black on white, the accuracy becomes terrible. This is a very good example of brittleness in AI systems!
images must be converted to 32x32 for lenet.onnx, as that is what training was done on. The training step converts the 28x28 images to 32x32 as the very first thing, before training even starts
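
The two constraints above can be sketched as a small pure-Python preprocessing step on a grayscale image stored as a list of rows of 0-255 integers. This is not the code in lenet/infer.py: `to_lenet_input` is a hypothetical helper, the mean-brightness test for detecting a light background is a crude heuristic, and nearest-neighbour resampling is just one assumed way of getting to 32x32.

```python
def to_lenet_input(pixels, size=32):
    """Return a size x size white-on-black image from a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    mean = sum(sum(row) for row in pixels) / (height * width)
    if mean > 127:
        # Mostly bright: assume black-on-white and invert.
        pixels = [[255 - p for p in row] for row in pixels]
    # Nearest-neighbour resize to size x size.
    return [[pixels[y * height // size][x * width // size]
             for x in range(size)]
            for y in range(size)]

# A 28x28 black-on-white test image: white background, dark vertical bar.
image = [[0 if 12 <= x < 16 else 255 for x in range(28)] for y in range(28)]
resized = to_lenet_input(image)
print(len(resized), len(resized[0]))
print(resized[0][0], resized[0][16])
```

After the call, the background is black (0) and the bar is white (255), which is the polarity the trained network expects.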

We can try the code adapted from thenewstack.io/tutorial-using-a-pre-trained-onnx-model-for-inferencing/ at lenet/infer.py:

cd lenet
cp ~/git/LeNet-5/lenet.onnx .
wget -O 9.png https://raw.githubusercontent.com/cirosantilli/media/master/Digit_9_hand_drawn_by_Ciro_Santilli_on_GIMP_with_mouse_white_on_black.png
./infer.py 9.png

and it works pretty well! The program outputs the correct digit, as desired.

We can also try with images directly from Extract MNIST images.

infer_mnist.py lenet.onnx mnist_png/out/testing/1/*.png

and the accuracy is great as expected.

activatedgeek/LeNet-5 run on GPU

 0  0

By default, the setup runs on CPU only, not GPU, as could be seen by running htop. But by the magic of PyTorch, modifying the program to run on the GPU is trivial:

cat << EOF | patch
diff --git a/run.py b/run.py
index 104d363..20072d1 100644
--- a/run.py
+++ b/run.py
@@ -24,7 +24,8 @@ data_test = MNIST('./data/mnist',
 data_train_loader = DataLoader(data_train, batch_size=256, shuffle=True, num_workers=8)
 data_test_loader = DataLoader(data_test, batch_size=1024, num_workers=8)

-net = LeNet5()
+device = 'cuda'
+net = LeNet5().to(device)
 criterion = nn.CrossEntropyLoss()
 optimizer = optim.Adam(net.parameters(), lr=2e-3)

@@ -43,6 +44,8 @@ def train(epoch):
     net.train()
     loss_list, batch_list = [], []
     for i, (images, labels) in enumerate(data_train_loader):
+        labels = labels.to(device)
+        images = images.to(device)
         optimizer.zero_grad()

         output = net(images)
@@ -71,6 +74,8 @@ def test():
     total_correct = 0
     avg_loss = 0.0
     for i, (images, labels) in enumerate(data_test_loader):
+        labels = labels.to(device)
+        images = images.to(device)
         output = net(images)
         avg_loss += criterion(output, labels).sum()
         pred = output.detach().max(1)[1]
@@ -84,7 +89,7 @@ def train_and_test(epoch):
     train(epoch)
     test()

-    dummy_input = torch.randn(1, 1, 32, 32, requires_grad=True)
+    dummy_input = torch.randn(1, 1, 32, 32, requires_grad=True).to(device)
     torch.onnx.export(net, dummy_input, "lenet.onnx")

     onnx_model = onnx.load("lenet.onnx")
EOF

and leads to a faster runtime, with less user time, as we are now spending more time on the GPU than on the CPU:

real    1m27.829s
user    4m37.266s
sys     0m27.562s
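
The two time outputs can be compared numerically. The sketch below just parses the real and user figures quoted in this section; `to_seconds` is a hypothetical helper.

```python
import re

def to_seconds(text):
    """Convert a time-style duration such as '2m10.262s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    return int(match.group(1)) * 60 + float(match.group(2))

cpu = {"real": to_seconds("2m10.262s"), "user": to_seconds("11m9.771s")}
gpu = {"real": to_seconds("1m27.829s"), "user": to_seconds("4m37.266s")}

real_speedup = cpu["real"] / gpu["real"]
user_ratio = cpu["user"] / gpu["user"]
print(f"wall-clock speedup: {real_speedup:.2f}x")
print(f"user CPU time ratio: {user_ratio:.2f}x")
```

On these figures the wall-clock gain is roughly 1.5x, while user CPU time drops by roughly 2.4x.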

lenet

 0  0

This is a small fork of activatedgeek/LeNet-5 by Ciro Santilli adding better integration and automation for:

extracting MNIST images as PNG
ONNX CLI inference taking any image files as input
a Python tkinter GUI that lets you draw and see inference live
running on GPU

Install on Ubuntu 24.10:

sudo apt install protobuf-compiler
cd lenet
virtualenv -p python3 .venv
. .venv/bin/activate
pip install -r requirements-python-3-12.txt

Download and extract MNIST, train, test accuracy, and generate the ONNX file lenet.onnx:

./train.py

Extract MNIST images as PNG:

./extract_pngs.py

Infer some individual images using the ONNX:

./infer.py data/MNIST/png/test/0/*.png

Draw on a GUI and see live inference using the ONNX:

./draw.py

TODO: the following are missing for this to work:

start a background task. This we know how to do: stackoverflow.com/questions/1198262/tkinter-locks-python-when-an-icon-is-loaded-and-tk-mainloop-is-in-a-thread/79502287#79502287
get bytes from the canvas: all methods are ugly: stackoverflow.com/questions/9886274/how-can-i-convert-canvas-content-to-an-image

AlexNet (2012-)

 0  0

Became notable for performing extremely well on ImageNet starting in 2012.

It is also notable for being one of the first to make successful use of GPU training rather than CPU training.

You Only Look Once (2015-)

 0  0

Object detection model.

You can get some really sweet pre-trained versions of this, typically trained on the COCO dataset.
