Neural Style Transfer with PyTorch

In this tutorial, we’ll cover how to implement the neural-style algorithm based on the paper A Neural Algorithm of Artistic Style by Leon A. Gatys et al.

What is neural style transfer?

Neural style transfer is a technique used to generate images in the style of another image. The neural-style algorithm takes a content-image as input, a style image, and returns the content image as if it were painted using the artistic style of the style image.

How does the neural style transfer algorithm work?

In order to understand all the mathematics involved in this algorithm, I’d encourage you to read the original paper by Leon A. Gatys et al.

When implementing this algorithm, we define two distances: one for the content (Dc) and one for the style (Ds). Dc measures how different the content is between two images, while Ds measures how different the style is between two images.

We take a third image—the input—and transform it in order to both minimize its content-distance with the content-image and its style-distance with the style-image.

Implementation in PyTorch

In order to implement this algorithm we have to import the following packages:

  • torch, torch.nn, and torch.nn.functional for building the neural network and computing the losses.
  • torch.optim for the optimization algorithm.
  • PIL.Image and matplotlib.pyplot for loading and displaying the images.
  • torchvision.transforms to transform PIL images into torch tensors.
  • torchvision.models to load the pre-trained model.
  • copy to deep-copy the model.
from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
import matplotlib.pyplot as plt

import torchvision.transforms as transforms
import torchvision.models as models

import copy

Using CUDA

If you’re using a computer with a GPU, you can run larger networks. Running torch.cuda.is_available() returns True if your computer is GPU-enabled. You then set the torch.device that will be used throughout this script, as shown below.

The .to(device) method moves a tensor or module to the desired device. To move this tensor or module back to the CPU, use the .cpu() method.
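For example, the device can be set with the standard one-liner below; the rest of the code in this tutorial assumes a variable named device.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")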

Loading the images

We kick off by importing the style and content images. We then scale the images to a desired output image size and transform them to torch tensors.

The loaded images must be the same size. The images used in this tutorial are available on Pixabay and can be downloaded here and here. I also used GIMP to resize them to the same dimensions.

The images are converted to torch tensors, with values between 0 and 1. This is important because the pre-trained networks in torchvision are trained with tensor values ranging from 0 to 1.

imsize = 512 if torch.cuda.is_available() else 128  # use a smaller size if no GPU

loader = transforms.Compose([
    transforms.Resize(imsize),  # scale the imported image
    transforms.ToTensor()])     # transform it into a torch tensor

def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit the network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)


style_img = image_loader("images/apple.jpg")
content_img = image_loader("images/fig.jpg")

assert style_img.size() == content_img.size(), \
    "You have to import style and content images of the same size"

Displaying the images

We use Matplotlib’s plt.imshow to display the images. This involves a few steps:

  1. Clone the tensor so as to not make changes to it.
  2. Remove the fake batch dimension.
  3. Reconvert the tensor to a PIL image.
  4. Plot the image using imshow.
  5. Pause briefly so that the plots are updated.
unloader = transforms.ToPILImage()  # reconvert into a PIL image
plt.ion()  # interactive mode

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # clone the tensor so we don't change it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
imshow(style_img, title='Style Image')

plt.figure()
imshow(content_img, title='Content Image')

The content loss function

The content loss is a function that takes as input the feature maps at a layer in the network and returns the content distance between the input image and the content image. It is implemented as a torch module whose constructor takes the target content feature maps as a parameter.

The mean squared error between the two sets of feature maps is computed with the functional criterion F.mse_loss. Content loss modules are inserted after each desired layer as extra modules of the neural network.

This way, each time the network is fed an input image, all the content losses are computed at the desired layers, and autograd handles the computation of all the gradients. To keep the module transparent, its forward method returns the input unchanged.

The module thus behaves as a transparent layer of the neural network, and the computed loss is saved as an attribute of the module.

Note that the target feature maps are detached from the graph so that no gradients are backpropagated into them. The saved losses are summed and backpropagated when running the gradient descent, and they also let us display the evolution of the style and content losses.

class ContentLoss(nn.Module):

    def __init__(self, target):
        super(ContentLoss, self).__init__()
        # detach the target content from the graph used to dynamically
        # compute the gradient: it is a fixed value, not a variable
        self.target = target.detach()

    def forward(self, input):
        # save the loss as an attribute and return the input unchanged
        self.loss = F.mse_loss(input, self.target)
        return input
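To make the idea of a transparent layer concrete, here is a small hypothetical check (not part of the original code): the module returns its input unchanged and simply records the loss as an attribute.

# hypothetical sanity check with a random feature map
fmap = torch.randn(1, 64, 128, 128)
content_loss = ContentLoss(fmap)
out = content_loss(fmap)           # out is the same tensor that went in
print(content_loss.loss.item())    # 0.0, because input and target are identical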

Style Loss

For the style loss, we define a function that computes the Gram matrix, given the feature maps of the neural network. We then normalize the values of the Gram matrix by dividing by the number of elements in each feature map.

def gram_matrix(input):
    a, b, c, d = input.size()  # a = batch size (=1), b = number of feature maps,
                               # (c, d) = dimensions of a feature map

    features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

    G = torch.mm(features, features.t())  # compute the gram product

    # normalize by dividing by the number of elements in each feature map
    return G.div(a * b * c * d)
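As a quick, hypothetical shape check (not from the original tutorial): for a single feature map tensor with 64 channels, the Gram matrix is 64 × 64, regardless of the spatial size.

feat = torch.randn(1, 64, 128, 128)
G = gram_matrix(feat)
print(G.shape)  # torch.Size([64, 64])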

The style loss module is implemented in the same way as the content loss module; however, it compares the Gram matrices of the target and the input feature maps.

class StyleLoss(nn.Module):

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input

Loading the neural network

Similar to what is described in the paper, we use a pre-trained VGG network with 19 layers (VGG19). PyTorch’s VGG implementation is divided into two child Sequential modules: features, which contains the convolution and pooling layers, and classifier, which contains the fully connected layers. We only need the features module here.

cnn = models.vgg19(pretrained=True).features.to(device).eval()

VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. We use them to normalize the image before sending it to the network.

cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # reshape the mean and std to [C x 1 x 1] so that they can broadcast
        # directly over image tensors of shape [B x C x H x W]
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        # normalize the image
        return (img - self.mean) / self.std

We would like to add our content and style loss modules as transparent layers at the desired points in the network. To achieve this, we construct a new Sequential module into which we copy the layers from vgg19 and insert the loss modules.

content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    normalization = Normalization(normalization_mean, normalization_std).to(device)

    content_losses = []
    style_losses = []

    model = nn.Sequential(normalization)

    i = 0  # increment every time we see a conv layer
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # the in-place version doesn't play nicely with the ContentLoss
            # and StyleLoss modules we insert below, so replace it with an
            # out-of-place one
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)
            
    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break

    model = model[:(i + 1)]

    return model, style_losses, content_losses

For this tutorial, we’ll use the content image as our input image. You can use a different image, but it has to have the same dimensions as the content and style images (an alternative starting point is sketched after the code below).

input_img = content_img.clone()
plt.figure()
imshow(input_img, title='Input Image')
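Alternatively, you can start from white noise, as is done in the original paper. A minimal sketch, assuming the device variable defined earlier:

# optional: start the optimization from random noise of the same size
input_img = torch.randn(content_img.data.size(), device=device)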

Leon Gatys, the author of the algorithm, suggests using the L-BFGS algorithm to run the gradient descent. Unlike when training a network, here we optimize the input image itself in order to minimize the content and style losses.

We create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass it the image as the tensor to optimize. We call .requires_grad_() on the image to make sure it requires gradients.

def get_input_optimizer(input_img):
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer

At each step we must feed the network the updated input in order to compute the new losses, then backpropagate their weighted sum to compute the gradients and perform a gradient descent step. The L-BFGS optimizer requires a closure as an argument: a function that re-evaluates the model and returns the loss.

A small challenge that arises when doing this is that the optimized image may take values between −∞ and +∞ instead of between 0 and 1 as required.

We therefore have to perform the optimization under a constraint to keep the input image valid. We achieve this by clamping the image values to the 0–1 interval at each step.

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
   
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)

    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:

        def closure():
            # correct the values of updated input image
            input_img.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0

            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss

            style_score *= style_weight
            content_score *= content_weight

            loss = style_score + content_score
            loss.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

  
    # a last correction to keep the final values in the 0–1 range
    input_img.data.clamp_(0, 1)

    return input_img

Conclusion

Finally, we run the algorithm and display the newly generated image, which renders the content image in the artistic style of the style image.

output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()

You can use this very same code with different images to try out new artistic designs. However, keep in mind that the neural-style algorithm requires all images to have the same dimensions.

Reference

Advanced Neural Transfer Tutorial

