Image Synthesis 16726 Assignment 4

Style Transfer

Harsh Sharma hsharma2

The goal of the assignment is to implement neural style transfer [Gatys15b], which re-renders specific content in a given artistic style. The algorithm takes in a content image, a style image, and an input image; the input image is optimized to match the two target images in content and style distance space.

Ablation Study

To determine which layers best capture content and style, two ablation studies are performed, each assuming a different set of layers to carry the style and content information.
The network is the convolutional feature stack of a pretrained VGG. Style and content losses are computed at the chosen layers between the target feature maps and those of the input image, and the input image is optimized to minimize these losses in content and style distance space.
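A minimal sketch of this setup (the `extract_features` helper and layer indices are illustrative assumptions, not the assignment's exact code; in the real pipeline `cnn` would be `torchvision.models.vgg19(pretrained=True).features`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def extract_features(cnn, x, layer_ids):
    """Run x through cnn layer by layer, collecting feature maps at layer_ids."""
    feats = {}
    for i, layer in enumerate(cnn):
        x = layer(x)
        if i in layer_ids:
            feats[i] = x
    return feats

def content_loss(input_feat, target_feat):
    # L2 distance between feature maps at the chosen content layer
    return F.mse_loss(input_feat, target_feat)
```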

Content Ablation and reconstruction

Content reconstruction is performed by computing the content loss at each candidate layer in turn, optimizing the input image for that loss alone, and visually comparing which choice reconstructs the content best.
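The reconstruction loop can be sketched as follows (the `reconstruct` helper, white-noise start, and step count are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def reconstruct(feature_fn, target_feat, shape, steps=50):
    """Optimize a white-noise image so its features match target_feat."""
    img = torch.randn(shape, requires_grad=True)
    opt = torch.optim.LBFGS([img])

    def closure():
        opt.zero_grad()
        loss = F.mse_loss(feature_fn(img), target_feat)
        loss.backward()
        return loss

    for _ in range(steps):
        opt.step(closure)
    return img.detach()
```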

Input Image

[input content image]

Reconstructions -

[reconstructions from conv1, conv2, conv3, and conv4]

I took conv2 as my content layer, as the reconstructed result looked best there. The reconstruction gets worse for the deeper layers: the deeper we go, the more abstract the features become, and the harder exact reconstruction is.

Style ablation study and texture synthesis

For the style layers too, the losses are computed at different sets of layers and the input image is optimized for the style loss alone, as in texture synthesis [Gatys15a].
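The style loss compares Gram matrices of feature maps, following [Gatys15a]; a sketch (normalization convention is one common choice, not necessarily the assignment's exact one):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (B, C, H, W) -> normalized channel-correlation matrix (B, C, C)
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(input_feats, target_feats):
    # Sum the Gram-matrix MSE over all chosen style layers
    return sum(F.mse_loss(gram_matrix(f), gram_matrix(t))
               for f, t in zip(input_feats, target_feats))
```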

Input style image

[input style image]

Synthesized Textures

[textures synthesized with style layers conv1-2-3-4, conv1-2-3-4-5, conv4-5-6, and conv5-6-7]

I took conv1-2-3-4-5 as the style layers, as the generated texture looked best for that choice.

Input image initialization - Random noise vs Content image

Initializing the input with the content image gave considerably better results than initializing it with white noise.
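The two initialization modes can be sketched as follows (the `init_input` helper name is an assumption):

```python
import torch

def init_input(content_img, mode="content"):
    # "content": start from a copy of the content image
    # "noise":   start from white noise of the same shape
    if mode == "content":
        img = content_img.clone()
    else:
        img = torch.randn_like(content_img)
    return img.requires_grad_(True)
```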
Experiments were first carried out to find the right style loss weight factor for a randomly initialized input image, whereas for content-image initialization a factor of 1e5 worked reasonably well and was fixed at that.

Testing the style weight factor for the white-noise-initialized input:

Input image

[input images]

[outputs for style weights 1500, 1800, 2000, 2500, and 3500]

This made me choose a style loss weight factor of 3.5e3 for white-noise-initialized transfer.

Comparison with Content image initialization

[results from white-noise and content-image initialization]

The image generated from content-image initialization looks markedly better.

2 different random initializations

This comparison shows how important the initialization of the input image is: the optimization runs over a non-convex loss surface and can end up in different minima from different starting points. The style loss weight is set to 3.5e3 and the content loss weight to 1e5.
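During style transfer the two weighted losses are combined into a single objective; a minimal sketch using the weights above:

```python
import torch

def total_loss(c_loss, s_loss, content_weight=1e5, style_weight=3.5e3):
    # Weighted sum of content and style losses; these weights are the
    # values fixed in the experiments above.
    return content_weight * c_loss + style_weight * s_loss
```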

Results

[grid of style-transfer results for several content/style image pairs]

Bells and Whistles

Grumpy Cat when you are late with the dinner

[grumpy cat style-transfer results]

References

[Gatys15b] Gatys, Leon A., et al. 2015. A Neural Algorithm of Artistic Style. arXiv preprint, https://arxiv.org/abs/1508.06576

[Gatys15a] Gatys, Leon A., et al. 2015. Texture Synthesis Using Convolutional Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS '15), https://arxiv.org/abs/1505.07376