This project performs neural style transfer by building on the VGG-19 convolutional neural network. We first optimize losses in content space and style space separately, then combine the two to transfer the style of a style image onto a content image.
The implementation of the style transfer algorithm is as follows:
For content loss, I used nn.MSELoss to compute the mean squared error between the input image's and the target's feature maps after a given layer. For style loss, I implemented the Gram matrix as sketched below.
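The Gram matrix computation follows the PyTorch tutorial referenced below; this is a minimal sketch reconstructing it (the function name and normalization are the tutorial's, and may differ slightly from my exact code):

```python
import torch

def gram_matrix(feature_maps):
    # a, b, c, d = batch size, number of channels, height, width
    a, b, c, d = feature_maps.size()
    # Flatten each channel's activations into one row
    features = feature_maps.view(a * b, c * d)
    # Gram matrix: inner products between all pairs of channels
    G = torch.mm(features, features.t())
    # Normalize by the number of elements so deeper layers don't dominate
    return G.div(a * b * c * d)
```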
Then, I used the MSE loss between the Gram matrices of the input and the target as the style loss.
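A style-loss module in the tutorial's transparent-layer style looks roughly like this (a sketch, not my exact implementation; gram_matrix is the function above):

```python
import torch.nn as nn
import torch.nn.functional as F

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        # Gram matrix of the style target, detached so it acts as a constant
        self.target = gram_matrix(target_feature).detach()

    def forward(self, x):
        # Record the style loss as a side effect and pass the activation
        # through unchanged, so the module slots transparently into VGG
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```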
To selectively disable content or style loss, I enclosed the calculation of each loss in a conditional inside the run function; when a loss is disabled, its value is simply 0, as sketched below.
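Schematically, the optimizer closure inside run might look like this (the flag names use_content/use_style and the weight variables are illustrative, not necessarily the actual names in my code):

```python
def closure():
    # content_losses / style_losses are the loss modules inserted into the model
    optimizer.zero_grad()
    model(input_img)  # forward pass populates each loss module's .loss
    loss = 0
    if use_content:
        loss = loss + content_weight * sum(cl.loss for cl in content_losses)
    if use_style:
        loss = loss + style_weight * sum(sl.loss for sl in style_losses)
    loss.backward()
    return loss

optimizer.step(closure)
```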
I relied on the tutorial for neural style transfer at https://pytorch.org/tutorials/advanced/neural_style_tutorial.html for my implementation of the network itself.
For content loss, the best results for image reconstruction came from conv1, which is essentially just a pixel loss. A comparison is shown below:
(Figure: content reconstructions using conv1, conv3, conv5, and conv7.)
However, this does not work well for style transfer: the criterion is too strict, so the optimizer fails to converge and produces images with subpar content quality. For the actual style transfer, I used conv layers 11, 13, and 15, whose reconstructions look like pure noise when not combined with style loss.
Another image I managed to reconstruct using content loss:
For style loss, I tried placing five style-loss layers at conv 1-5, 6-10, and 11-15. Layers 1-5 gave the best textures, and are also the ones that work best in the style transfer pipeline. A comparison is shown below:
(Figure: textures synthesized using style layers 1-5, 6-10, and 11-15.)
Another texture I managed to extract, from the Escher sketch, is shown below.

The best hyperparameters I was able to find in this step were conv 9, 11, 13, and 15 for content loss and conv 1-5 for style loss. This preserved content very well while adding realistic style.
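Using the tutorial's conv_i naming for VGG-19's convolution layers, that configuration amounts to something like this (a sketch; the list names are the tutorial's, not necessarily mine):

```python
# Final layer choices for the full style transfer pipeline
content_layers = ['conv_9', 'conv_11', 'conv_13', 'conv_15']
style_layers = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']
```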
Results are shown below:
Comparing random-noise input with content-image input, I found that images generated from noise are generally more abstract, while images generated from the content image stay more faithful to the original in detail. Each has merits for different art styles.
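The only difference between the two modes is how the input tensor is initialized (assuming content_img is the preprocessed content image tensor):

```python
import torch

# Option 1 - content-image initialization: faithful to the original in detail
input_img = content_img.clone()

# Option 2 - noise initialization: tends to produce more abstract results
input_img = torch.randn(content_img.size(), device=content_img.device)
```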
One instance where I preferred the noise initialization was transferring the style of The Scream by Edvard Munch onto a landscape. Of the two results shown below, the noise-generated image looks closer to what Munch himself might have painted, with dramatically distorted geometry.
The opposite holds in other cases, such as this style transfer of Frida Kahlo's Self Portrait onto a dancer. Here, since the artist's style is much closer to real-life imagery than in the previous case, the content-initialized result works much better.
I was also able to add my own cat pictures and desired styles to the pipeline.
One last thing I wanted to try was turning a photo I took of a sleeping cat, inspired by the chiaroscuro lighting of Renaissance paintings, into an actual Renaissance piece. For the style image, I carefully selected a single figure from Caravaggio's The Calling of St. Matthew that matched the lighting conditions of my photo, and I edited the original photo in Lightroom, lowering the exposure and maximizing the contrast to match the painting's dramatic lighting. Performing style transfer on these two images yields this result:
I find this result quite realistic and beautiful: the painting's brushwork and color come through clearly in the final product, and the chiaroscuro effect is particularly visible and realistic thanks to the manual matching I did beforehand, giving it a dramatic 3D effect that exceeds even the original photo's.
I name this computer-generated painting The Calling of St. Catthew.