16-726 Assignment 4

Canbo Ye (canboy)

Overview

The goal of this project is to get hands-on experience implementing neural style transfer, which renders the content of a given image in a chosen artistic style. In this project, I first implement a content-space loss and optimize a random noise image with respect to it to achieve content reconstruction. Then I implement a style-space loss to achieve texture synthesis. Finally, I put these two parts together to achieve style transfer. I also try to stylize the Poisson blended images from the previous homework.


Part 1: Content Reconstruction

For the first part of this assignment, I implement a content-space loss and optimize a random noise image with respect to it to achieve content reconstruction.
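My content loss follows the standard PyTorch style-transfer recipe: a pass-through module that records the mean squared error between the current feature map and a fixed target feature map. A minimal sketch (the class name and structure are illustrative, not the exact assignment starter code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContentLoss(nn.Module):
        """Pass-through layer that records the MSE between the current
        feature map and a fixed target feature map."""
        def __init__(self, target):
            super().__init__()
            # Detach the target so gradients never flow into it.
            self.target = target.detach()

        def forward(self, x):
            self.loss = F.mse_loss(x, self.target)
            return x  # pass the features through unchanged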

Effect of optimizing content loss at different layers

I optimized the content loss for fallingwater.png at the conv_2, conv_4, conv_8 and conv_12 layers with the same hyperparameters (num_steps=50, content_weight=1). The content reconstruction results are as follows:

Content reconstrution results for fallingwater.png

We can see from the figures above that the images reconstructed from low-level layers like conv_2 and conv_4 are nearly perfect and hard to distinguish from the raw image. However, the reconstruction quality deteriorates greatly when we optimize at higher layers like conv_8 and conv_12. This is because the higher layers of the VGG network capture higher-level semantic information while discarding details such as colors and edges, whereas the lower layers preserve exactly those details.
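To optimize at a given layer, I rebuild vgg19.features as an nn.Sequential and insert the loss module right after the chosen conv layer. A sketch of this assembly, assuming the imports from the previous snippet and counting conv layers in order (the function name and details are illustrative):

    import torchvision.models as models

    def build_model(content_img, content_layer="conv_4"):
        """Rebuild VGG-19 features, inserting a ContentLoss right after
        the requested conv layer (conv_1, conv_2, ... counted in order)."""
        cnn = models.vgg19(pretrained=True).features.eval()
        model, content_losses, i = nn.Sequential(), [], 0
        for layer in cnn.children():
            if isinstance(layer, nn.Conv2d):
                i += 1
                name = f"conv_{i}"
            elif isinstance(layer, nn.ReLU):
                name = f"relu_{i}"
                layer = nn.ReLU(inplace=False)  # in-place ReLU would corrupt the saved losses
            else:
                name = f"pool_{i}"
            model.add_module(name, layer)
            if name == content_layer:
                target = model(content_img).detach()
                model.add_module("content_loss", ContentLoss(target))
                content_losses.append(model[-1])
                break  # later layers are not needed
        return model, content_losses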

Generation results from two random noises

I optimized the content loss for dancing.jpg and tubingen.jpeg at the conv_4 layer, using two random noise images as inputs. The random noise inputs and their content reconstruction results are as follows:

Content reconstrution results for dancing.jpg
Content reconstrution results for tubingen.jpeg

We can see from the figures above that the generated results are quite close to the raw content images.
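The reconstruction itself comes from optimizing the input pixels directly with L-BFGS, as in the standard recipe. A minimal sketch, assuming the model and loss modules from the previous snippets (names and the image size are illustrative):

    def reconstruct(model, content_losses, size=(1, 3, 512, 512), num_steps=50):
        """Start from random noise and optimize the pixels themselves."""
        input_img = torch.randn(size)
        input_img.requires_grad_(True)
        optimizer = torch.optim.LBFGS([input_img])

        run = [0]
        while run[0] < num_steps:
            def closure():
                optimizer.zero_grad()
                with torch.no_grad():
                    input_img.clamp_(0, 1)  # keep pixels in a valid range
                model(input_img)            # side effect: fills each module's .loss
                loss = sum(cl.loss for cl in content_losses)
                loss.backward()
                run[0] += 1
                return loss
            optimizer.step(closure)

        return input_img.detach().clamp(0, 1)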


Part 2: Texture Synthesis

For the second part of this assignment, I implement a style-space loss to achieve texture synthesis.
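The style loss compares Gram matrices of the feature maps rather than the feature maps themselves, so it captures texture statistics while discarding spatial layout. A minimal sketch in the same pass-through style as the content loss (the normalization choice is discussed in Part 3; names are illustrative):

    def gram_matrix(x):
        """Channel-by-channel feature correlations, normalized by size."""
        b, c, h, w = x.size()
        features = x.view(b * c, h * w)
        gram = features @ features.t()
        return gram / (b * c * h * w)  # normalize over feature pixels

    class StyleLoss(nn.Module):
        """Pass-through layer that records the MSE between Gram matrices."""
        def __init__(self, target_features):
            super().__init__()
            self.target = gram_matrix(target_features).detach()

        def forward(self, x):
            self.loss = F.mse_loss(gram_matrix(x), self.target)
            return x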

Effect of optimizing texture loss at different layers

I optimized the texture loss for the_scream.jpeg at layers conv_1 through conv_4, at conv_2/conv_4/conv_6/conv_8, and at conv_6 through conv_9, with the same hyperparameters (num_steps=50, style_weight=1000000). The raw style image and the texture synthesis results are as follows:

Texture synthesis results for the_scream.jpeg

We can see from the figures above that the textures synthesized from low-level layers (conv_1 to conv_4) are better than those from higher layers (conv_6 to conv_9) in terms of colors, brushwork and stroke thickness. Again, this is because the higher layers of the VGG network capture higher-level information but ignore details such as colors and edges, while the lower layers focus mainly on those details.

Generation results from two random noises

I optimized the style loss for frida_kahlo.jpeg and starry_night.jpeg at layers conv_1 through conv_4, using two random noise images as inputs. The random noise inputs and their texture synthesis results are as follows:

Texture synthesis results for frida_kahlo.jpeg
Texture synthesis results for starry_night.jpeg

We can see from the figures above that the synthesized textures are quite close to the raw style images.


Part 3: Style Transfer

For the third part of this assignment, it is time to put the two pieces together.

Hyper-parameters

In my implementation, the Gram matrix is normalized by the number of feature pixels, which greatly affects the appropriate scale of style_weight and content_weight. In my case, I set style_weight to 1000000 and content_weight to 1, with num_steps = 150. The content loss uses layer conv_4 while the style loss uses layers conv_1 to conv_4.
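With this normalization, the two losses are combined inside the optimizer closure roughly as follows, reusing model, input_img, optimizer, and the loss modules sketched in Parts 1 and 2 (a sketch, not the exact code):

    style_weight, content_weight = 1e6, 1

    def closure():
        optimizer.zero_grad()
        model(input_img)  # fills .loss on every inserted loss module
        style_score = sum(sl.loss for sl in style_losses)
        content_score = sum(cl.loss for cl in content_losses)
        loss = style_weight * style_score + content_weight * content_score
        loss.backward()
        return loss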

Grid of results


Taking random noise and the content image as input, respectively


I used the same hyperparameters as above: style_weight = 1000000, content_weight = 1, num_steps = 150, content loss at conv_4 and style loss at conv_1 to conv_4. We can see from the figures above that the generation quality of the two approaches is quite similar here. Initializing from the content image takes 128 seconds to finish the style transfer, while initializing from random noise takes 140 seconds, i.e. the former approach is faster, though not by much. Besides, we can notice that the loss curve is more stable and drops faster when using the content image as input.
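The only difference between the two runs is how input_img is initialized; assuming content_img is the preprocessed content tensor, the two options look like this:

    # Option 1: start from random noise of the same shape as the content image.
    input_img = torch.randn_like(content_img)

    # Option 2: start from the content image itself; in my runs this converges
    # slightly faster and gives a smoother loss curve.
    input_img = content_img.clone()

    input_img.requires_grad_(True)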

Style transfer on some of my favorite images



Bells & Whistles

I tried to stylize the Poisson blended images from the previous homework.