With the help of deep learning, we can capture the content of one image and combine it with the style of another image. This technique of blending two images is called Neural Style Transfer (NST). But how does Neural Style Transfer actually work? In this blog post, we are going to look into its underlying mechanism.
By looking at the above example, you might have got a fair idea of what Neural Style Transfer does. Blending the content of the content image with the style of the style image produces our generated image. Notice that the generated image cannot be obtained by simply overlapping the two images. So the question remains: how do we make sure that the generated image has the content of the content image and the style of the style image? How do we capture the content and style of the respective images and combine them in the generated image? Does it happen in a single step, or is it an iterative process?
In order to answer the above questions, let’s look at what Convolutional Neural Networks (CNNs) actually learn.
What Does a Convolutional Neural Network Capture?
Look at the following image.
At Layer 1, using 32 filters, the network may capture simple patterns, say a straight edge such as a horizontal line. Such patterns may not mean much to us, but they are essential for the network. As we move deeper to Layer 2, which has 64 filters, the network starts to capture more and more complex features. In the deeper sections of the model, it captures features such as the face of a cat. This process of capturing features is called feature representation.
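To make the idea of a filter "capturing a horizontal line" a little more concrete, here is a minimal NumPy sketch. The 3x3 kernel values, the toy 6x6 image, and the helper name `conv2d_valid` are all assumptions made for illustration; a trained network learns its own filter values.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark top half, bright bottom half, so there is one horizontal edge.
image = np.zeros((6, 6))
image[3:, :] = 1.0

# Hand-written filter that responds to dark-above / bright-below transitions.
horizontal_edge = np.array([[-1.0, -1.0, -1.0],
                            [ 0.0,  0.0,  0.0],
                            [ 1.0,  1.0,  1.0]])

response = conv2d_valid(image, horizontal_edge)
print(response)  # rows 1 and 2 are all 3.0 (the edge), rows 0 and 3 are all 0.0
```

The same filter gives a zero response everywhere on `image.T`, where the edge runs vertically, which is the sense in which one filter "captures" one kind of pattern.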
The important point here is that CNNs do not know what the image is; rather, during training they learn to encode what a particular input image represents. This ability of CNNs to encode image content is what helps us in Neural Style Transfer. That, in brief, is the link between CNNs and Neural Style Transfer. Now we are ready to dive a bit deeper.
How do Convolutional Neural Networks capture the Content and Style of images?
VGG-19, one of the best-known CNN models, is commonly used for Neural Style Transfer. VGG-19 is a CNN trained on more than a million images from the ImageNet database. As the name suggests, it has 19 weight layers (16 convolutional and 3 fully connected). Because it has been trained on such a large dataset, it is able to detect high-level features in an image.
This encoding nature of CNNs plays an important role in Neural Style Transfer.
Let’s go through the steps involved in Neural Style Transfer:
- Initialize a noisy image, which will become our output image (O).
- Measure how similar this image is to the content image and the style image at a particular layer in the VGG network.
- Our focus is to extract the style from the style image and the content from the content image.
- Calculate the loss of the output image with respect to the content image and the style image.
- Our ultimate goal is to minimize these two losses: Content Loss and Style Loss.
1. Content Loss
Content loss measures how far the generated image (initially just noise) is from the content image. The smaller the loss, the more similar the two images are in content, so our goal is to minimize this loss function.
So in order to calculate content loss:
First, we take a pre-trained VGG network and choose one of its hidden layers, denoted l, for computing the loss. Let P and F be the original (content) image and the generated image, respectively. The feature representations of these images at layer l are written as P[l] and F[l].
So the definition of content loss is as follows:
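A common way to write this, using the F[l] and P[l] notation from above, is as the squared difference between the two feature representations. The indices i and j (filter and position within the feature map) and the factor of 1/2 are assumptions borrowed from the usual formulation of this loss:

```latex
\mathcal{L}_{\text{content}}(P, F, l) = \frac{1}{2} \sum_{i,j} \left( F^{[l]}_{ij} - P^{[l]}_{ij} \right)^{2}
```

The loss is zero when the generated image produces exactly the same activations as the content image at layer l, and it grows as the activations drift apart.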
2. Style Loss
Before we look at how style loss is calculated, we first need to understand what exactly we mean by the “style of an image” and how we capture the style of the style image.
How do we capture the style of an image?
This image shows the feature maps (the outputs of the filters) of a particular chosen layer (L). These form the foundation for the style of an image. To capture the style of an image, it is important to understand how “correlated” these feature maps are with each other. In other words, how similar are they?
Let’s understand it with the help of an example:
Similarity between feature maps can be understood in terms of correlation. Suppose Red and Yellow are the first two channels of the layer. Let’s say the Red channel captures some simple feature (horizontal lines, for instance) and the two channels are correlated. Then, whenever the feature of the Red channel is detected in some region of the image, the Yellow channel will also respond in that region.
Now, let’s understand the mathematics behind these correlations.
The Gram matrix is one of the popular methods for measuring these correlations. To calculate the correlation between two filters, we take the dot product of their flattened (vectorized) feature maps.
How do we decide whether two filters are correlated or not?
Simply put, if the dot product of the two vectors is large, the two channels are said to be correlated; if it is small, the channels are uncorrelated.
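Here is a tiny numeric sketch of that rule. The three 2x2 feature maps are made-up values, and the names `red`, `yellow` and `blue` simply follow the channel colours used in the example; they are not taken from any real network.

```python
import numpy as np

# Three hypothetical 2x2 feature maps from the same layer, flattened to vectors.
red = np.array([[1.0, 0.0],
                [1.0, 0.0]]).ravel()
yellow = np.array([[0.9, 0.0],
                   [1.1, 0.0]]).ravel()   # active in the same places as red
blue = np.array([[0.0, 1.0],
                 [0.0, 1.0]]).ravel()     # active where red is silent

red_yellow = float(red @ yellow)  # 1.0*0.9 + 1.0*1.1 = 2.0 -> correlated
red_blue = float(red @ blue)      # 0.0 -> uncorrelated

print(red_yellow, red_blue)
```

Red and Yellow fire at the same positions, so their dot product is large; Red and Blue never fire together, so theirs is zero.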
Putting it mathematically:
Gram Matrix of Style Image (S): Here k and k’ represent different filters of the layer L.
Gram Matrix for Generated Image (G): Here k and k’ represent different filters of the layer L.
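One way to write these two Gram matrices is shown here. The symbol A for the activation at spatial position (i, j) of filter k in layer L is an assumed notation, chosen only to make the dot product between filters k and k’ explicit:

```latex
G^{[L](S)}_{kk'} = \sum_{i} \sum_{j} A^{[L](S)}_{ijk} \, A^{[L](S)}_{ijk'}
\qquad
G^{[L](G)}_{kk'} = \sum_{i} \sum_{j} A^{[L](G)}_{ijk} \, A^{[L](G)}_{ijk'}
```

Each entry is the dot product of two flattened feature maps, so a layer with K filters gives a K x K matrix for each image.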
With the above-mentioned formulas, we can now define the style loss.
The loss between the Style Image (S) and the Generated Image (G) is the squared difference between the Gram matrix of the style image (S) and the Gram matrix of the generated image (G), summed over all pairs of filters.
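As a rough sketch, this is how the style loss could be computed with NumPy for a single layer. The function names, the (height, width, filters) layout and the random activations are assumptions for illustration; many implementations also divide by a constant that depends on the layer size, which is left out here.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Gram matrix of a (height, width, filters) activation volume."""
    h, w, k = feature_maps.shape
    flat = feature_maps.reshape(h * w, k)   # one column per filter
    return flat.T @ flat                    # entry [k, k'] = dot product of filters k and k'

def style_loss(style_features, generated_features):
    """Sum of squared differences between the two Gram matrices."""
    diff = gram_matrix(style_features) - gram_matrix(generated_features)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
style_features = rng.normal(size=(4, 4, 3))       # made-up activations
generated_features = rng.normal(size=(4, 4, 3))   # made-up activations

same = style_loss(style_features, style_features)
different = style_loss(style_features, generated_features)
print(same, different)
```

The loss is zero when both images produce identical Gram matrices and positive otherwise, which is exactly what the optimization tries to drive down.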
Now we can define the total loss for Neural Style Transfer (NST).
Total Loss Function:
The total loss function is the weighted sum of the content loss and the style loss. It can be expressed mathematically as:
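In the usual notation (the subscript names are assumed here), with one weight per loss term, this reads:

```latex
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}}
```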
You might wonder where alpha and beta come from. They are weights for the content cost and the style cost, respectively. In other words, they control how much each cost contributes to the generated output image.
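Once the loss is calculated, the next step is to minimize it using backpropagation, much as we do when training a feed-forward neural network. The difference is that here the network weights stay fixed, and the gradients are used to update the pixels of the generated image instead. This optimization of our initially random image is what helps us generate a meaningful piece of art.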
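The loop can be sketched end to end on a toy problem. Everything in this sketch is an assumption made to keep it small and runnable: a fixed random linear map `W` stands in for a frozen VGG layer, the images are flat vectors of 16 "pixels", the gradient is derived by hand rather than by a framework's backpropagation, and a simple step-halving rule is used to keep gradient descent stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D pixels, K filters, N positions per filter.
D, K, N = 16, 3, 8
# Fixed random linear map standing in for a frozen, pre-trained layer.
W = rng.normal(size=(K * N, D)) / np.sqrt(D)

def features(x):
    """Map a flat image of shape (D,) to a feature map of shape (K, N)."""
    return (W @ x).reshape(K, N)

def gram(F):
    """Filter-by-filter dot products, shape (K, K)."""
    return F @ F.T

def loss_and_grad(x, P, A, alpha, beta):
    """Total loss and its gradient with respect to the pixels x."""
    F = features(x)
    E = gram(F) - A                        # Gram mismatch (symmetric)
    content = 0.5 * np.sum((F - P) ** 2)
    style = np.sum(E ** 2)
    # Chain rule by hand: loss -> feature map -> pixels.
    dF = alpha * (F - P) + beta * 4.0 * (E @ F)
    return alpha * content + beta * style, W.T @ dF.reshape(-1)

content_image = rng.normal(size=D)
style_image = rng.normal(size=D)
P = features(content_image)                # content target
A = gram(features(style_image))            # style target

x = rng.normal(size=D)                     # start from a noisy image
alpha, beta = 1.0, 0.1
loss, grad = loss_and_grad(x, P, A, alpha, beta)
initial_loss = loss

for _ in range(200):
    step = 0.1
    while step > 1e-12:
        candidate = x - step * grad
        new_loss, new_grad = loss_and_grad(candidate, P, A, alpha, beta)
        if new_loss < loss:                # accept only steps that lower the loss
            x, loss, grad = candidate, new_loss, new_grad
            break
        step *= 0.5                        # otherwise shrink the step and retry

final_loss = loss
print(f"total loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Note that only `x`, the generated image, is ever updated; `W` never changes. In a real setup the hand-written gradient would be replaced by automatic differentiation through the VGG network.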
This is how neural style transfer works.
To wrap up: in this article, we took a deeper dive into how Neural Style Transfer works. Along the way, we discussed the loss functions that act as the foundation for the generated image. Optimizing the two loss functions, style loss and content loss, is what Neural Style Transfer is all about.