The goal of the assignment is to implement neural style transfer, which renders specific content in a particular artistic style. The algorithm takes in a content image, a style image, and an input image. The input image is optimized so that it matches the content image in content distance and the style image in style distance.
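For reference, the objective as formulated by Gatys et al. [Gatys15b] is written out below. Here $\vec{p}$ is the content image, $\vec{a}$ the style image, $\vec{x}$ the image being optimized, and $F^l$, $P^l$ the layer-$l$ feature maps of $\vec{x}$ and $\vec{p}$ ($N_l$ filters, each of size $M_l$). $\alpha$ and $\beta$ are the content and style loss weights discussed later. The normalization constants used in this assignment's implementation may differ from the paper's.

$$
\begin{aligned}
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) &= \alpha\,\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta\,\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) \\
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) &= \tfrac{1}{2} \sum_{i,j} \left(F^l_{ij} - P^l_{ij}\right)^2 \\
G^l_{ij} &= \sum_k F^l_{ik} F^l_{jk} \\
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2 \\
\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) &= \sum_l w_l E_l
\end{aligned}
$$

$A^l$ is the Gram matrix of the style image's layer-$l$ features, and $w_l$ weights each style layer's contribution.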
To determine which layers best capture content and style, two ablation studies are performed, each assuming that a different set of layers carries the style and content information.
The network consists of the convolutional feature layers of a pretrained VGG network. The content and style losses are computed at selected layers by comparing the feature maps of the input image with those of the content and style target images.
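A minimal NumPy sketch of the per-layer losses, assuming the feature maps have already been extracted from VGG as `(channels, height, width)` arrays. The function names, the use of mean squared error, and the normalization of the Gram matrix by `c * h * w` are illustrative assumptions, not necessarily what this implementation uses; in practice these would be PyTorch tensors so that gradients flow back to the input image.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (channels, height, width) feature map.

    Normalizing by c * h * w is an assumption; other scalings only rescale the loss.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)      # one row per channel, one column per position
    return (f @ f.T) / (c * h * w)      # channel-by-channel inner products

def content_loss(input_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Mean squared error between feature maps at a single layer."""
    return float(np.mean((input_feats - target_feats) ** 2))

def style_loss(input_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Mean squared error between Gram matrices at a single layer."""
    diff = gram_matrix(input_feats) - gram_matrix(target_feats)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16, 16))
flipped = a[:, ::-1, :]                  # same values, spatially rearranged
print(content_loss(a, flipped) > 0.0)    # content loss sees the rearrangement
print(style_loss(a, flipped) < 1e-12)    # Gram matrix ignores spatial layout
```

The flipped example shows why the Gram matrix is used for style: it sums over spatial positions, so it keeps which channels co-activate but discards where, while the content loss compares features position by position.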
Content reconstruction is performed by computing the content loss at different layers and checking which layer choice yields the best reconstruction.
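The toy sketch below illustrates content-only reconstruction: the pixels of the input image are optimized by gradient descent while the "network" stays fixed. A random linear map `W` stands in for a VGG layer; it is a hypothetical substitute, not the actual feature extractor, and the function name `reconstruct_content` is illustrative.

```python
import numpy as np

def reconstruct_content(target: np.ndarray, W: np.ndarray, steps: int = 1000, seed: int = 0):
    """Optimize pixels x so that W @ x matches W @ target (content loss only).

    W is a hypothetical linear 'layer' standing in for a VGG feature map;
    target and x are flattened images.
    """
    rng = np.random.default_rng(seed)
    target_feats = W @ target
    x = rng.standard_normal(target.shape)          # white-noise initialization
    lr = 1.0 / np.linalg.norm(W, 2) ** 2           # 1/L step size keeps gradient descent stable
    history = []
    for _ in range(steps):
        residual = W @ x - target_feats
        history.append(0.5 * float(residual @ residual))  # content loss 0.5 * ||Wx - Wc||^2
        x = x - lr * (W.T @ residual)              # gradient of the content loss w.r.t. x
    return x, history

rng = np.random.default_rng(1)
n_pixels, n_feats = 64, 32
W = rng.standard_normal((n_feats, n_pixels)) / np.sqrt(n_pixels)
content = rng.standard_normal(n_pixels)
x, history = reconstruct_content(content, W)
print(history[-1] < 1e-3 * history[0])    # features end up matched
print(np.mean((x - content) ** 2) > 0.1)  # pixels do not
```

Because this `W` has fewer outputs than inputs, it discards information: the feature-space loss goes to nearly zero while the pixel error stays large. This is a loose analogue of why reconstructions from deeper, more abstract layers look worse.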
Conv 1 | Conv 2 |
---|---|
Conv 3 | Conv 4 |
I chose conv2 as the content layer because its reconstruction looked best. The reconstruction gets worse at deeper layers: deeper features are more abstract and retain less pixel-level detail, so recovering the image from them is harder.
For the style layers, the style loss is likewise computed at different sets of layers, and the input image is optimized for the style loss alone.
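Since the style loss is computed over a set of layers rather than one, it is summed across them. The sketch below assumes each layer's features are a NumPy array of shape `(channels, height, width)` and, by default, weights the layers equally, as Gatys et al. do; the function name and the equal-weight default are assumptions about this implementation.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Normalized Gram matrix of a (channels, height, width) feature map."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def multi_layer_style_loss(input_layers, style_layers, layer_weights=None) -> float:
    """Weighted sum of per-layer Gram-matrix MSEs (e.g. over conv1..conv5)."""
    if len(input_layers) != len(style_layers):
        raise ValueError("input and style must provide the same number of layers")
    if layer_weights is None:
        # Equal weights per layer (assumed default).
        layer_weights = [1.0 / len(style_layers)] * len(style_layers)
    total = 0.0
    for weight, f_in, f_style in zip(layer_weights, input_layers, style_layers):
        diff = gram_matrix(f_in) - gram_matrix(f_style)
        total += weight * float(np.mean(diff ** 2))
    return total

rng = np.random.default_rng(0)
shapes = [(4, 16, 16), (8, 8, 8), (16, 4, 4)]   # deeper layers: more channels, smaller maps
style = [rng.standard_normal(s) for s in shapes]
other = [rng.standard_normal(s) for s in shapes]
print(multi_layer_style_loss(other, style) > 0.0)
```

Shallow layers in the set capture fine, local texture statistics and deeper ones capture larger-scale patterns, which is why the choice of layer set changes the synthesized texture.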
Conv 1-2-3-4 | Conv 1-2-3-4-5 |
---|---|
Conv 4-5-6 | Conv 5-6-7 |
I chose conv1-2-3-4-5 as the style layers based on the generated textures.
Initializing the input image with the content image gave much better results than initializing it with white noise.
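The two initialization schemes compared here can be sketched as follows. The function name `init_input` is hypothetical, and drawing the white noise uniformly from [0, 1] assumes pixel values are scaled to that range; the original may have used a different noise distribution.

```python
import numpy as np

def init_input(content: np.ndarray, mode: str = "content", seed: int = 0) -> np.ndarray:
    """Return the starting point for the optimized image."""
    if mode == "content":
        # Start from a copy so optimizing the input never modifies the content target.
        return content.copy()
    if mode == "noise":
        # White noise in [0, 1), same shape as the content image (assumed pixel range).
        rng = np.random.default_rng(seed)
        return rng.uniform(0.0, 1.0, size=content.shape)
    raise ValueError(f"unknown init mode: {mode}")

content = np.linspace(0.0, 1.0, 3 * 4 * 4).reshape(3, 4, 4)   # stand-in (C, H, W) image
x_content = init_input(content, "content")
x_noise = init_input(content, "noise")
```

Starting from the content image means the content loss begins near zero, so the optimizer mainly has to add style. Starting from noise, it must build up both, which is harder on a non-convex objective.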
Input Image
Experiments were first carried out to find a suitable style loss weight for a randomly initialized input image; for initialization with the content image, a style weight of 1e5 worked reasonably well and was kept fixed.
Style weight sweep for the white-noise-initialized input:
Style weight 1500 | Style weight 1800 | Style weight 2000 | Style weight 2500 | Style weight 3500 |
---|---|---|---|---|
Based on this sweep, I chose a style loss weight of 3.5e3 for the white-noise-initialized transfer.
The image generated from the content-image initialization looks really nice.
This comparison shows how important the initialization of the input image is: the optimization runs over a non-convex loss surface, so different starting points can converge to different minima. The style loss weight is set to 3.5e3 and the content loss weight to 1e5.
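The effect of initialization on a non-convex objective can be seen even in one dimension. The sketch below runs plain gradient descent on a hypothetical double-well function $f(x) = (x^2 - 1)^2$, which has two minima at $x = -1$ and $x = +1$. This is only an illustration of the argument, not the style transfer loss itself.

```python
def gradient_descent(grad, x0: float, lr: float = 0.05, steps: int = 500) -> float:
    """Plain gradient descent from starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def double_well_grad(x: float) -> float:
    # Derivative of f(x) = (x**2 - 1)**2, which has minima at x = -1 and x = +1.
    return 4.0 * x * (x ** 2 - 1.0)

right = gradient_descent(double_well_grad, 0.5)    # converges to the minimum at +1
left = gradient_descent(double_well_grad, -0.5)    # converges to the minimum at -1
```

Same objective, same step size, and the only difference is the starting point, yet the two runs end in different minima. The image optimization behaves analogously, just in a far higher-dimensional space.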
[Gatys15b] Gatys, Leon A., et al. 2015. A Neural Algorithm of Artistic Style. arXiv preprint arXiv:1508.06576. https://arxiv.org/abs/1508.06576
[Gatys15a] Gatys, Leon A., et al. 2015. Texture Synthesis Using Convolutional Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS '15). https://arxiv.org/abs/1505.07376