Assignment #4 - Neural Style Transfer
Tarasha Khurana
Andrew ID: tkhurana
Content Loss
I used phipps.jpeg to study the effect of using convolution layer blocks for optimizing the content loss. The following images show the reconstructed image obtained when the content loss is placed at the end of the VGG conv layers mentioned in each image's caption.
Ablation
It can be seen that the later layers of VGG do not help with image reconstruction. A near-perfect reconstruction is obtained from the layers in the first convolution block, followed by the second, and so on. For the final style transfer task I chose to optimize the content loss after the two conv layers in the second block (conv_3 and conv_4). This was because placing the content loss layers after the first block resulted in a style transfer that retained more of the content than it took style from the style image. It seems that style transfer needs only a close-to-perfect reconstruction, not a pixel-accurate one.
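For concreteness, here is a minimal sketch of the content loss module that gets inserted after the chosen conv layers, in the spirit of the standard PyTorch style-transfer recipe rather than the exact starter code (class and variable names are illustrative):

    import torch.nn as nn
    import torch.nn.functional as F

    class ContentLoss(nn.Module):
        """Transparent layer that records the MSE to a fixed target activation."""
        def __init__(self, target):
            super().__init__()
            # detach() treats the target as a constant, so it is never optimized
            self.target = target.detach()

        def forward(self, x):
            self.loss = F.mse_loss(x, self.target)
            return x  # pass the activations through unchanged

One such module goes after conv_3 and another after conv_4, and their loss attributes are summed during optimization.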
Results
On running the optimization for fallingwater.jpeg with two different random initializations, I obtained the following results:
As compared to the original content image, both reconstructions are quite accurate and match the content image almost exactly, except for some white noise. Across the two runs, this white noise appears at different pixels. This is expected: every run starts from a different initialization, and hence a different starting point on the loss surface, so each run converges to a different minimum.
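For reference, the two runs differ only in the seed used to draw the noise image; run_optimization below is a hypothetical stand-in for the starter code's optimization loop, and content_img is the loaded content tensor:

    import torch

    results = []
    for seed in (0, 1):
        torch.manual_seed(seed)              # a different seed per run
        init = torch.rand_like(content_img)  # fresh white-noise start
        results.append(run_optimization(init.requires_grad_(True)))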
Style Loss
I used starry_night.jpg to study the effect of using convolution layer blocks for optimizing the style loss. The following images show the texture synthesized when the style loss is placed at the end of the VGG conv layers mentioned in each image's caption.
Ablation
As compared to the original style image, none of these combinations reproduces the actual style exactly, but the outputs from the first two conv blocks come closest. So for the final style transfer, I use a combination of layers from the first two blocks (conv_1 through conv_4).
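As above, a short sketch of the Gram-matrix style loss (illustrative names, following the standard formulation in which the Gram matrix captures channel-to-channel feature correlations):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def gram_matrix(feat):
        # feat: (batch, channels, height, width) activations of one conv layer
        b, c, h, w = feat.size()
        f = feat.view(b * c, h * w)
        gram = torch.mm(f, f.t())      # channel-to-channel correlations
        return gram / (b * c * h * w)  # normalize by the number of elements

    class StyleLoss(nn.Module):
        def __init__(self, target_feat):
            super().__init__()
            self.target = gram_matrix(target_feat).detach()

        def forward(self, x):
            self.loss = F.mse_loss(gram_matrix(x), self.target)
            return x

One StyleLoss module is inserted after each of conv_1 through conv_4.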
Results
Using this, I took two random noise images and optimized them to synthesize the texture from the_scream.jpg. Across the two runs, the generated textures were globally similar but locally different. This was again expected, for the same reason explained for the content loss optimization above.
Style Transfer
Implementation Details
I ablated over different sets of convolutional blocks, as shown in the report above. For these, I tried tuning the content loss weight by increasing or decreasing it by one or two orders of magnitude. However, after repeated runs I found that the default hyperparameters in the starter code worked best, so I stuck with them. The Gram matrix required little tuning. More generally, I kept the same hyperparameters for all pairs of style and content images.
Apart from this, I was accidentally updating the network weights in my Gram matrix implementation; I fixed this by cloning the activations into a separate variable so that gradients no longer flowed back into the network. It was also important for the style and content images to be the same size so that their features could be concatenated for optimization. To this end, I resized the style image to the resolution of the content image.
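In code, the fix and the resizing amount to something like the following; the tensor names are mine, and F.interpolate is one way to match resolutions (the starter code may do this with image transforms instead):

    import torch.nn.functional as F

    # Fix: detach/clone the target activations so gradients from the losses
    # cannot flow back into the VGG weights.
    target_gram = gram_matrix(style_feat).detach()

    # Resize the style image to the content image's resolution so their
    # features can be stacked in one forward pass; both are (1, 3, H, W).
    style_img = F.interpolate(style_img, size=content_img.shape[-2:],
                              mode='bilinear', align_corners=False)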
I did this assignment on my lab's cluster with a GeForce GTX 1080 Ti.
Results
Content vs. Random Initialization
In terms of quality, the optimization initialized with the content image produces a better style transfer than the optimization initialized with a random noise image. Both take about 23 s for me. In the latter, more style is visible than content, which suggests that the content loss should be weighted slightly higher than the style loss. Doing this should alleviate the issue and make style transfer from a noise initialization also good quality.
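If I were to tune this, the change is a one-line reweighting of the total loss; the values below are placeholders rather than the starter code's defaults, and content_losses/style_losses are the lists of inserted loss modules from the sketches above:

    # Raising content_weight relative to style_weight should preserve more
    # content when the optimization starts from random noise.
    content_weight, style_weight = 1.0, 1e6  # placeholder values
    total_loss = (content_weight * sum(cl.loss for cl in content_losses)
                  + style_weight * sum(sl.loss for sl in style_losses))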