**Neural Style Transfer**
Student name: Abhishek Pavani
(#) Introduction
In this assignment, I implemented Neural Style Transfer to synthesize new images with the content of one image and the style of another.
For the Bells and Whistles section, I ran style transfer on images generated in a previous assignment and used an off-the-shelf codebase to run style transfer on 3D meshes.
Neural Style Transfer (NST) is an optimization technique used to take two images, a content image and a style reference image (such as an artwork by a famous painter), and blend them together so the output image looks like the content image, but “painted” in the style of the style reference image.
(#) Part 1: Content Reconstruction
For the first part of the assignment, I implemented the content-space loss and optimized a random noise image with respect to the content loss only.
The first step is the content-space loss. It measures the distance between two images at specific layers of the VGG feature network, and it is computed by summing the mean-squared error at each selected layer. Because the comparison happens in feature space, it is not the same as an L2 loss on raw pixels: a pixel loss never passes the images through the network, so it would not be an effective measure of content.
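As a rough illustration of the loss described above, the sketch below sums the per-layer mean-squared error over lists of feature maps. It assumes the features have already been extracted; random NumPy arrays stand in for VGG activations, and the name `content_loss` and the array shapes are hypothetical rather than taken from my assignment code.

```python
import numpy as np


def content_loss(input_feats, target_feats):
    """Sum of per-layer mean-squared errors between two lists of feature maps."""
    if len(input_feats) != len(target_feats):
        raise ValueError("both images need features from the same layers")
    total = 0.0
    for f_in, f_tgt in zip(input_feats, target_feats):
        total += float(np.mean((f_in - f_tgt) ** 2))
    return total


# Stand-in features shaped (N, K, H, W), as if taken from two layers (assumed shapes).
rng = np.random.default_rng(0)
target_feats = [rng.standard_normal((1, 4, 8, 8)), rng.standard_normal((1, 8, 4, 4))]
# Shift every activation by 0.1 to get a slightly different "image".
shifted_feats = [f + 0.1 for f in target_feats]

print(content_loss(target_feats, target_feats))   # identical features give zero loss
print(content_loss(shifted_feats, target_feats))  # one MSE term per layer, summed
```

Each layer contributes one term, so adding layers adds terms to the total.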
I experimented with optimizing content loss at different layers in the network.
This included using different single layers and multiple layers. As can be seen from the results below, using earlier layers (Conv2 and Conv4) produces images that are brighter and more similar to the original image.
Using deeper layers, the output images still represent the content images, but the texture is noticeably different. Combining these deep layers with the earlier layers can still generate semantic information, but the texture is still different from the original image.
| Original Image | Noise Image | Conv 2 | Conv 4 | Conv 8 | Conv 2, 4, 8 |
|----------------|-------------|-------------|--------|--------|---------------------|
|||||||
|||||||
|||||||
|||||||
(#) Part 2: Texture Synthesis
For this part of the assignment, I implemented the style-space loss and optimized a random noise image with respect to the style loss only.
Style loss: I used the Gram matrix to measure style. The Gram matrix captures the correlation between every pair of feature channels. Specifically, denote the feature of an image at layer $L$ as $f^L$, with shape (N, K, H*W), and its k-th channel as $f^L_k$.
The Gram matrix is then $G^L = f^L (f^L)^T$, with shape (N, K, K).
The idea is that the Gram matrices of the optimized image and the style image should be as close as possible.
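The Gram matrix computation can be sketched as below. This is a minimal NumPy illustration, not my assignment code: `gram_matrix`, `style_loss`, and the random stand-in features are hypothetical, and no normalization is applied so that the code matches the formula above.

```python
import numpy as np


def gram_matrix(feat):
    """Map features of shape (N, K, H, W) to Gram matrices of shape (N, K, K)."""
    n, k, h, w = feat.shape
    f = feat.reshape(n, k, h * w)       # flatten the spatial dimensions
    return f @ f.transpose(0, 2, 1)     # batched f f^T, unnormalized (assumption)


def style_loss(input_feats, style_feats):
    """Sum over layers of the MSE between the Gram matrices of two feature lists."""
    total = 0.0
    for f_in, f_sty in zip(input_feats, style_feats):
        total += float(np.mean((gram_matrix(f_in) - gram_matrix(f_sty)) ** 2))
    return total


rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 3, 4, 5))    # stand-in for one layer's activations
g = gram_matrix(feat)

# Apply the same random permutation of spatial positions to every channel.
perm = rng.permutation(4 * 5)
shuffled = feat.reshape(2, 3, 20)[:, :, perm].reshape(2, 3, 4, 5)

print(g.shape)                                # (2, 3, 3)
print(np.allclose(gram_matrix(shuffled), g))  # spatial order does not affect G
```

Shuffling the spatial positions leaves the Gram matrix unchanged, which is one way to see why this loss matches texture statistics rather than the layout of the image.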
| Original Image | Noise Image | Conv5 layer | Conv 1-5 | Optimized All |
|----------------|-------------|-------------|----------|---------------|
||||||
||||||
||||||
||||||
(#) Part 3: Style Transfer
For this part of the assignment, I implemented style transfer by optimizing a random noise image with respect to both the content and style losses.
The content and style losses each have a weight factor that controls how strongly that term is emphasized. This section contains my experiments with different weights for the content and style losses and the results I obtained.
(#) Experiments:
(##) Hyperparameters
I experimented with different weights for the content and style losses. I found that the style weight had to be much larger than the content weight. The content loss is computed directly on feature maps, while the style loss is computed on Gram matrices, so the two terms are on different scales and need different weights.
When optimizing from a content image input, the style weight had to be 4-5 orders of magnitude larger than the content weight. When optimizing from a noise input, the best style weight was smaller, and it depended largely on the number of style layers. This makes intuitive sense: once the optimization no longer starts from the content image, a style weight that is too large biases the optimizer even more towards the style image.
Additionally, using more style layers increases the total style loss, so a smaller style weight is needed.
The best weights I found were a content_weight of 1 and a style_weight of 100000 when optimizing from a content image input, a content_weight of 1 and a style_weight of 50000 when optimizing from a noise input, and a content_weight of 1 and a style_weight of 10000 when optimizing from a noise input with more than 5 style layers.
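The weight choices above can be summarized in a small helper. This is only a sketch: `pick_weights` and `total_loss` are hypothetical names, the weighted sum is the standard way of combining the two terms, and applying the content-initialization weights regardless of the number of style layers is an assumption, since I only report the layer-count adjustment for noise initialization.

```python
def pick_weights(init, num_style_layers):
    """Return (content_weight, style_weight) using the values reported above."""
    if init == "content":
        # Assumption: the same weights are used for any number of style layers.
        return 1.0, 100000.0
    if init == "noise":
        if num_style_layers > 5:
            return 1.0, 10000.0
        return 1.0, 50000.0
    raise ValueError("init must be 'content' or 'noise'")


def total_loss(content_loss, style_loss, content_weight, style_weight):
    """Weighted sum of the two loss terms that the optimizer minimizes."""
    return content_weight * content_loss + style_weight * style_loss


content_weight, style_weight = pick_weights("noise", num_style_layers=5)
# Illustrative loss values only, not measurements from my experiments.
print(total_loss(2.0, 1e-4, content_weight, style_weight))
```

With these example values the two weighted terms are of comparable size, which is the balance the weight search is trying to reach.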
(###) Results from noise initialization
| Content Image | Style Image | Style Transfer |
|---------------|-------------|----------------|
||||
||||
||||
||||
(###) Results from content initialization
| Content Image | Style Image | Style Transfer |
|---------------|-------------|----------------|
||||
||||
||||
||||
The images generated from content initialization are noticeably better than the ones generated from noise, but they take slightly longer to run.
(##) Style Transfer with more style layers
(##) Best Results
In this section, I show a GIF of the style transfer at different steps of the optimization process. This shows how the optimization gradually picks up the style and content of the images.
| Content Image | Style Image | Style Transfer |
|---------------|-------------|----------------|
||||
||||
||||
(#) Bells and Whistles
(##) Images from previous assignments
I used images from my previous assignment as the content and chose one of the existing styles. The results are shown below.
| Content Image | Style Image | Style Transfer |
|---------------|-------------|----------------|
||||
||||
(##) Style Transfer on 3D meshes
In this section, I used a publicly available codebase to transfer an image texture to a mesh. The results are shown below. I find it fascinating that the texture transfers to the mesh, and it is an interesting application of style transfer.
The code is available [here](https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/differentiable-parameterizations/style_transfer_3d.ipynb#scrollTo=GVN7tg7Gtb_F).
| Content Mesh | Style Image | Style Transfer |
|---------------|-------------|----------------|
||||
||||