Vision, image and machine learning (AM part)
B. Image classification using convolutional networks (with PyTorch)
For this second problem, we have images and we want to recognise the class to which they belong: for example, recognising a digit from a picture of it, recognising a geometric figure from a drawing, or more generally recognising a family of objects (cat, car, aeroplane, fork, etc.) from a photo.
For this type of task, the appropriate network is a ConvNet or CNN: Convolutional Neural Network. You can read explanations of what a CNN is in the course or on the internet (for example, an intuitive explanation here).
For PyTorch code, look here.

Data
For this tutorial, we invite you to use a database of images from an L3 project that seeks to recognise 5 drawn shapes: square, circle, triangle, hourglass, star. There are only a few hundred images per shape, so it is a good challenge to see how recognition works with very few images. It is also interesting to augment the data: with images like these, you can apply small random rotations to increase their number.
You can also use other databases:
- MNIST: a database of handwritten digits
- FashionMNIST
- the EMNIST character database (https://www.nist.gov/itl/iad/image-group/emnist-dataset)
- all the classic image-category recognition databases: CIFAR-10 or CIFAR-100
- a slightly more challenging dataset: QuickDraw, a database of hand drawings to recognise (300 classes, 73 GB of vector data and $12,000 in rewards …)
Code to load the FashionMNIST dataset with PyTorch:

```python
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
```
Display some images:

```python
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
```
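Training consumes the images in shuffled mini-batches via a DataLoader; a minimal sketch (batch size 64 is an arbitrary choice, and the random TensorDataset stands in for `training_data` so the snippet runs on its own):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in with the same shapes as FashionMNIST (replace it by `training_data`)
dataset = TensorDataset(torch.randn(256, 1, 28, 28),
                        torch.randint(0, 10, (256,)))

# shuffle=True re-shuffles the examples at every epoch
trainloader = DataLoader(dataset, batch_size=64, shuffle=True)

images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])
```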
An example of code that loads a database of images from your disk. See also the DataLoader documentation:

```python
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

class MyTransform(object):  # Your own image transformation used for preprocessing (if needed)
    def __call__(self, x):
        y = preprocess(x)  # `preprocess` is your own function
        return y

def imshow(img):  # To display an image
    plt.figure(1)
    img = img / 2.0 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

transform_img = transforms.Compose([
    MyTransform(),  # Your own preprocessing transformation
    transforms.Resize(16),
    # transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0., 0., 0.],
                         std=[0.5, 0.5, 0.5])
])

mydata = ImageFolder(root="../data/shapes5_preprocessed", transform=transform_img)
loader = DataLoader(mydata, batch_size=32, shuffle=True, num_workers=1)
```
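SubsetRandomSampler, imported above, can split one folder of images into training and validation loaders; a hedged sketch (the 80/20 split is an arbitrary choice, and the random TensorDataset stands in for `mydata` so the snippet runs on its own):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Stand-in with the same shapes as the resized shape images (replace it by `mydata`)
dataset = TensorDataset(torch.randn(100, 3, 16, 16),
                        torch.randint(0, 5, (100,)))

indices = torch.randperm(len(dataset)).tolist()  # shuffle the example indices once
split = int(0.8 * len(dataset))                  # 80% train / 20% validation

train_loader = DataLoader(dataset, batch_size=32,
                          sampler=SubsetRandomSampler(indices[:split]))
val_loader = DataLoader(dataset, batch_size=32,
                        sampler=SubsetRandomSampler(indices[split:]))
```

Note that `shuffle=True` cannot be combined with a sampler: the sampler already randomises the order.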
The network
The code of a network looks like this. It has two parts: one that extracts features and one that classifies them.
```python
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.features = nn.Sequential(
            # 3 input image channels, 6 output channels (meaning 6 different convolutions),
            # 5x5 square convolution
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            # ... to do
            # ...
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )
        print(self.features)
        print(self.classifier)

    def forward(self, input):
        x = self.features(input)   # CNN
        x = x.view(x.size(0), -1)  # change the view in order to flatten the tensor
        x = self.classifier(x)     # fully connected
        return x
```
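The `16 * 5 * 5` input size of the first Linear layer can be checked by pushing a dummy image through the feature extractor. The sketch below uses one possible completion of the `to do` part above (a second ReLU/MaxPool pair, as in LeNet) and assumes 32x32 RGB inputs:

```python
import torch
import torch.nn as nn

# One possible completion of self.features above
features = nn.Sequential(
    nn.Conv2d(3, 6, 5),   # 3x32x32 -> 6x28x28 (5x5 conv, no padding)
    nn.ReLU(),
    nn.MaxPool2d(2, 2),   # 6x28x28 -> 6x14x14
    nn.Conv2d(6, 16, 5),  # 6x14x14 -> 16x10x10
    nn.ReLU(),
    nn.MaxPool2d(2, 2),   # 16x10x10 -> 16x5x5
)

x = torch.zeros(1, 3, 32, 32)  # one dummy 32x32 RGB image
print(features(x).shape)       # torch.Size([1, 16, 5, 5]), hence 16 * 5 * 5 = 400
```

If your images have a different size, redo this check and adjust the first Linear layer accordingly.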
Training
In a similar way to the point-cloud classifier above, the network must be trained by also declaring the DataLoader, the optimizer, the loss, etc.
```python
import torch.nn as nn
import torch.optim as optim

net = Classifier()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):  # trainloader: a DataLoader over the training set
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
```
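Once trained, the network should be evaluated on images it has never seen; a minimal accuracy sketch (the tiny linear `net` and random `testloader` are stand-ins so the snippet runs on its own; replace them by the trained Classifier and a DataLoader over the test set):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins: replace by the trained network and the real test loader
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 5))
testloader = DataLoader(TensorDataset(torch.randn(64, 3, 16, 16),
                                      torch.randint(0, 5, (64,))),
                        batch_size=32)

correct, total = 0, 0
net.eval()                 # evaluation mode (affects dropout, batchnorm)
with torch.no_grad():      # no gradients needed for evaluation
    for inputs, labels in testloader:
        predicted = net(inputs).argmax(dim=1)  # class with the highest score
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

print(f'Accuracy: {100 * correct / total:.1f} %')
```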
Conclusion
It is interesting to visualise what the network learns: each convolutional layer becomes more and more specific to the object.