Vision, image and machine learning (part AM)
Part (II) Style transfer between images #
This section is a practical application of deep learning frameworks and pre-trained networks. We use their ability to optimise, via SGD, the pixels of an image, with a loss function defined on the latent space given by a pre-trained network (VGG19).
This tutorial aims to implement in PyTorch the transfer of style from one image to another, following a paper by Gatys et al. presented at CVPR 2016: Image Style Transfer Using Convolutional Neural Networks. The deep-learning CNN is used as a tool to produce relevant descriptors. The subject has many assets for a TP on images: use of a pre-trained network as a tool, use of the DL/PyTorch framework for optimisation, compact code, reasonable computation time and "fun" visual results.


The program starts with three helper functions to load and display an image:
- load_image resizes an image and normalises it with the VGG19 mean/standard deviation;
- im_convert converts a Tensor into a NumPy image;
- imshow displays an image produced by im_convert.
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim
from torchvision import transforms, models

def load_image(img_path, max_size=400, shape=None):
    '''Load in and transform an image, making sure the image is <= 400 pixels in the x-y dims.'''
    image = Image.open(img_path).convert('RGB')
    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)
    if shape is not None:
        size = shape
    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])
    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    return image
# helper function for un-normalizing an image and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)
    return image

def imshow(img):  # display an image
    plt.figure(1)
    plt.imshow(img)
    plt.show()
if __name__ == '__main__':
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    #device = torch.device("cpu")
    print(device)

    ########################## DISPLAY IMAGES ##########################
    content = load_image('images/mer.jpg').to(device)
    style = load_image('images/peinture1.jpg', shape=content.shape[-2:]).to(device)
    imshow(im_convert(content))
    imshow(im_convert(style))
We reuse a VGG network that has already been trained. VGG is a network that stacks convolutions and performs well at image recognition (ImageNet challenge). When optimising the style transfer, we no longer want to optimise the layers of the VGG network itself; this is done by setting requires_grad to False on its parameters. You can therefore load the network with PyTorch, freeze its layers, and display them like this:
vgg = models.vgg19(pretrained=True).features
# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

features = list(vgg)[:23]
for i, layer in enumerate(features):
    print(i, " ", layer)
To retrieve the intermediate features of an image as it passes through the VGG network, you can proceed as follows:
### Run an image forward through a model and get the features for a set of layers. 'model' is assumed to be vgg19.
def get_features(image, model, layers=None):
    if layers is None:
        layers = {'0': 'conv0',
                  '5': 'conv5',
                  '10': 'conv10',
                  '19': 'conv19',  # content representation
                  }
    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features
We create the target image, a copy of the content image whose pixels will be optimised:
target = content.clone().requires_grad_(True).to(device)
Write the function gram_matrix, which computes the Gram matrix of a tensor. See the documentation of torch.mm, which multiplies two matrices, and of torch.transpose. The torch.Tensor.view function can be used to change the "view" of a tensor, for example from a 2D tensor to a 1D tensor, or from a 3D tensor to a 2D one, etc.
def gram_matrix(tensor):
    # tensor: Nfeatures x H x W ==> M = Nfeatures x Npixels with Npixels = H*W
    ...
    return gram
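One possible implementation is sketched below. It assumes the tensor carries a leading batch dimension of 1 (as produced by load_image and propagated through the network), which is squeezed out before flattening:

```python
import torch

def gram_matrix(tensor):
    # tensor: 1 x Nfeatures x H x W  ==>  M: Nfeatures x Npixels (Npixels = H*W)
    _, n_features, h, w = tensor.size()
    m = tensor.view(n_features, h * w)
    # Gram matrix: dot products between all pairs of flattened feature maps
    gram = torch.mm(m, m.t())
    return gram
```

Note that torch.mm(m, m.t()) computes M·Mᵀ directly, so an explicit call to torch.transpose is not needed here.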
Write the cost computation for the content. You can use the features extracted at the 'conv19' layer which, according to the article, roughly capture the content. Note that these layer names do not match those used in the article.
Write the cost computation for the style. It is computed in the same way, but iterates over the features of the other layers. Tune it by trial and error (or look at the article).
The total cost (the one that will be optimised) is a weighted sum of the style cost and the content cost. Tune the weights by trial and error (or look at the article).
The optimisation part will therefore look like this.
optimizer = optim.Adam([target], lr=0.003)
for i in range(50):
# get the features from your target image
# the content loss
# the style loss
# calculate the *total* loss
# update your target image
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
To find out more about style transfer between images #
A blog post that describes the research developments following Gatys' approach, and also explains regularisation approaches and AdaIN.