Overview

This project explores an interesting computer vision task: style transfer. In this assignment, we first implemented the content loss to retain the objects in an image, then implemented the style loss to synthesize the texture of a target image. Finally, we combined the two losses to tackle the style transfer task.

How to run the code

1. Run python run_1.py to reproduce the results of Part 1; uncomment the section marked "part 2" to see the next set of results.

2. Run python run_2.py to reproduce the results of Part 2; uncomment the section marked "part 2" to see the next set of results.

3. Run python run_3.py to reproduce the results of Part 3; change the image path to reproduce the full results.

Part 1: Content Reconstruction

1.1 Code

See the submitted code for details.

1.2 Results with content loss at different layers

We tried to reconstruct the dancing image, using random noise as the input. For all compared results, we set content_weight to 100 and ran 100 iterations.

Original Image.

Input Noise.

Content loss at "conv_1".

Content loss at "conv_2".

Content loss at "conv_3".

Content loss at "conv_4".

Content loss at "conv_5".

We can see that when the content loss is placed at an early convolution layer, the output image is reconstructed perfectly. When it is placed at later layers, the visual results become worse. I think this is because it is easier to refine the noisy image toward the target image through a shallow stack of layers, where the gradient is easier to propagate back to the pixels.
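To make the quantity being compared across layers concrete, here is a minimal sketch of a content loss, assuming it is the mean squared error between the feature maps of the generated image and the source image at the chosen layer. It uses NumPy arrays in place of real network activations; the function name content_loss and the feature shapes are illustrative assumptions, not taken from the submitted code.

```python
import numpy as np


def content_loss(generated_features, target_features):
    """Mean squared error between two feature maps of identical shape.

    Assumption: the content loss is a plain MSE at one layer.
    """
    generated_features = np.asarray(generated_features, dtype=np.float64)
    target_features = np.asarray(target_features, dtype=np.float64)
    if generated_features.shape != target_features.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean((generated_features - target_features) ** 2))


# Hypothetical feature maps of shape (channels, height, width).
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 8))

loss_same = content_loss(target, target)   # identical features: no penalty
loss_noise = content_loss(noise, target)   # unrelated features: positive penalty
```

The optimizer lowers this value by changing the pixels of the input image, so the deeper the chosen layer, the more layers the gradient has to pass through before it reaches those pixels.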

1.3 Generated Images with different inputs

For this sub-problem, I give three different inputs to the network: (a) random noise, (b) an all-white image, and (c) an all-black image. The results and analysis are shown below. I chose "conv_2" as the content loss layer based on my preference.

Random noise input.

Reconstructed image with random noise.

All-white input.

Reconstructed image with all-white input.

All-black input.

Reconstructed image with all-black input.

We can see that, across the different inputs, the reconstructed images capture almost the whole subject of the source image. However, the background differs slightly from input to input, and the residual noise also looks different.
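The three starting images compared above can be built in a few lines. This is only a sketch: the pixel range [0, 1], the uniform noise distribution, the image shape, and the helper name make_input are all assumptions made for illustration, not details from the submitted code.

```python
import numpy as np


def make_input(kind, shape, seed=0):
    """Return a starting image with values in [0, 1] (assumed pixel range)."""
    if kind == "noise":
        # Assumption: uniform noise; the original noise distribution is not stated.
        return np.random.default_rng(seed).uniform(0.0, 1.0, size=shape)
    if kind == "white":
        return np.ones(shape)
    if kind == "black":
        return np.zeros(shape)
    raise ValueError(f"unknown input kind: {kind}")


shape = (3, 64, 64)  # hypothetical (channels, height, width)
inputs = {kind: make_input(kind, shape) for kind in ("noise", "white", "black")}
```

Each of these arrays would then be optimized against the same content loss, so any difference in the final images comes from the starting point alone.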

Part 2: Texture Synthesis

2.1 Code

See the submitted code for details.

2.2 Results with style loss at different layers

We tried to synthesize the texture of the starry night image. We used random noise as the input, set style_weight to 100,000, and ran 120 iterations.

Target Image.

Synthesized texture with style loss at "conv_1".

Synthesized texture with style loss at "conv_2".

Synthesized texture with style loss at "conv_3".

Synthesized texture with style loss at "conv_4".

Synthesized texture with style loss at "conv_5".

Synthesized texture with style loss at "conv_1", "conv_2", "conv_3".

Synthesized texture with style loss at "conv_3", "conv_4", "conv_5".

Synthesized texture with style loss at "conv_1", "conv_2", "conv_3", "conv_4", "conv_5".

We can see that texture synthesis is a hard task. Using style loss at a single layer does not work well, although the results still differ depending on which convolution layer is chosen: as with content reconstruction, an early layer gives a better result. When we add the style loss to several convolution layers, we finally obtain the desired texture.
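The single-layer versus multi-layer comparison above can be sketched as follows, assuming the style loss follows the common Gram-matrix formulation: the Gram matrix of each layer's feature map is compared with that of the target, and the per-layer errors are summed. The normalization by channels × height × width, the function names, and the feature shapes are assumptions for illustration, not taken from the submitted code.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) feature map.

    Assumption: normalized by the total number of elements.
    """
    features = np.asarray(features, dtype=np.float64)
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def style_loss(generated_layers, target_layers):
    """Sum over layers of the MSE between Gram matrices."""
    if len(generated_layers) != len(target_layers):
        raise ValueError("both inputs must list the same layers")
    total = 0.0
    for gen, tgt in zip(generated_layers, target_layers):
        total += float(np.mean((gram_matrix(gen) - gram_matrix(tgt)) ** 2))
    return total


# Hypothetical activations from two layers of different sizes.
rng = np.random.default_rng(0)
target_layers = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
other_layers = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]

# Flipping a feature map vertically changes the layout but not the Gram matrix.
flipped_layers = [f[:, ::-1, :] for f in target_layers]

loss_flipped = style_loss(flipped_layers, target_layers)  # close to zero
loss_other = style_loss(other_layers, target_layers)      # positive
```

Because the Gram matrix sums over all spatial positions, it discards where features occur and keeps only which features occur together. Summing the loss over several layers constrains these statistics at several scales at once.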

For all the following experiments, we use style loss at all five convolution layers (conv_1 through conv_5).

2.3 Synthesized Texture with different inputs

For this sub-problem, I give three different inputs to the network: (a) random noise, (b) an all-white image, and (c) an all-black image. The results and analysis are shown below.

Synthesized texture with random noise.

All-white input.

Synthesized texture with all-white input.

All-black input.

Synthesized texture with all-black input.

We can see that, across the different inputs, the synthesized textures are all similar to that of the target image. However, different inputs lead to different background colors and noise patterns in the synthesized texture.

Part 3: Style Transfer

3.1 Code and hyperparameters

See the submitted code for details.

We set content_weight to 5 and style_weight to 1000000, and trained for 200 epochs on each pair of images. We add the content loss at conv_4 and the style loss at all the conv blocks.
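As a rough illustration of how the two weights interact, the sketch below assumes the objective is a weighted sum of the content loss and the style loss. The function name total_loss and the raw loss values are hypothetical; only the two weights come from the settings above.

```python
def total_loss(content_loss_value, style_loss_value,
               content_weight=5.0, style_weight=1_000_000.0):
    """Weighted sum of the two losses (assumed form of the objective)."""
    return content_weight * content_loss_value + style_weight * style_loss_value


# Hypothetical raw loss values, chosen only to show the scale of each term.
example_content = 0.2
example_style = 1e-6

content_term = 5.0 * example_content
style_term = 1_000_000.0 * example_style
combined = total_loss(example_content, example_style)
```

With these made-up numbers the two terms contribute about equally, which shows how a very large style_weight can coexist with a small content_weight when the raw style loss is several orders of magnitude smaller than the raw content loss.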

3.2 Results

In this part, we show four groups of results: (a) four style transfer results with random inputs; (b) the same four style transfer results with the content image as the input; (c) style transfer on my favorite images; (d) style transfer on the cat images (Bells & Whistles).

a. Style transfer with random input.

b. Style transfer with content image as the input.

Compared with starting from random noise, using the content image as the input gives better visual quality, since the combination of content loss and style loss may be harder to optimize from a noisy image. As for running time, it takes around 4 minutes to refine an image from random noise (on one RTX 2080 Super), versus around 2.5 minutes per image when starting from the content image.

In conclusion, using the content image as the input gives better and faster style transfer than using random noise as the input. Therefore, for the experiments in the following parts, I also use the content images themselves as the input images.