This project focuses on implementing neural style transfer, an algorithm that generates images blending the content of one image with the artistic style of another. The process involves optimizing a third input image so that it simultaneously resembles the content of a target image and the style of another.
The assignment has four parts:
1. Content Reconstruction
2. Texture Synthesis
3. Style Transfer
4. Bells & Whistles (Extra Points)
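The optimization described above can be sketched as minimizing a weighted sum of a content term and a style term over the pixels of the input image, typically with L-BFGS. This is a minimal sketch rather than the assignment's exact code: `extract` (a function mapping an image to a dict of named layer activations) and `gram` are assumed helpers, and the hyperparameter names are illustrative.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    # Normalized Gram matrix of a (B, C, H, W) feature map.
    b, c, h, w = feat.size()
    f = feat.view(b * c, h * w)
    return f @ f.t() / (b * c * h * w)

def stylize(input_img, content_feats, style_grams, extract, steps=300,
            content_weight=1.0, style_weight=1e6):
    """Optimize input_img so its activations match content_feats and its
    Gram matrices match style_grams. `extract` is a hypothetical helper
    returning a dict {layer_name: activation} for an image."""
    input_img = input_img.detach().clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([input_img])
    run = [0]
    while run[0] < steps:
        def closure():
            optimizer.zero_grad()
            feats = extract(input_img)
            c_loss = sum(F.mse_loss(feats[l], t) for l, t in content_feats.items())
            s_loss = sum(F.mse_loss(gram(feats[l]), g) for l, g in style_grams.items())
            loss = content_weight * c_loss + style_weight * s_loss
            loss.backward()
            run[0] += 1
            return loss
        optimizer.step(closure)
        with torch.no_grad():
            input_img.clamp_(0, 1)  # keep pixels in a valid range
    return input_img.detach()
```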
Content Image
Conv_3
Conv_5
Conv_8
Conv_11
My preferred choice is optimizing the content loss at 'conv_3': it is less sensitive to minor variations in low-level input, which makes the optimization more stable and the content loss decrease consistently. Moreover, compared to deeper layers, it better preserves fine details and yields superior reconstruction quality.
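The content loss at a layer like 'conv_3' is typically implemented as a transparent module inserted into the network: it records the MSE between the current activation and a fixed target, then passes the input through unchanged. A minimal sketch of such a layer (following the common PyTorch pattern, not necessarily my exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Transparent layer: records the MSE between the activation at this
    depth (e.g. after conv_3) and a fixed target feature map."""
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()  # fixed content features, no gradient
        self.loss = torch.tensor(0.0)
    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x  # pass-through so deeper layers still receive the activation
```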
At this convolution layer, the reconstruction remains stable, showing only subtle variations across different noise inputs.
Content Image
Noise 1
Noise 2
Layers 1–5 retain large color blocks in the image but fail to capture fine stylistic details. Layers 5–10 produce more appropriate texture detail but lose a significant amount of low-level information. Using layers 1–10 balances both and performs well; adding more layers (up to 15) does not help much.
→ Final choice: layers 1–10.
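The style loss at each of these layers compares Gram matrices, which capture channel-wise feature correlations (texture statistics) while discarding spatial layout. A minimal sketch of a style-loss layer placed after each of conv_1 through conv_10 (a common implementation pattern, hedged as an illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    # Normalized Gram matrix: correlations between channel activations.
    b, c, h, w = feat.size()
    f = feat.view(b * c, h * w)
    return f @ f.t() / (b * c * h * w)

class StyleLoss(nn.Module):
    """Matches the Gram matrix of the current activation to that of the
    texture image at the same layer; transparent like ContentLoss."""
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()
        self.loss = torch.tensor(0.0)
    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```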
Texture Image
Layers 1–5
Layers 5–10
Layers 1–10
Layers 1–15
Under the same style but different random initializations, the local details and positions of the textures vary, while the overall distribution and stylistic appearance remain similar.
(1) Noise 1:
Layers 1–5
Layers 5–10
Layers 1–10
Layers 1–15
(2) Noise 2:
Layers 1–5
Layers 5–10
Layers 1–10
Layers 1–15
Taking the content image as input, I use: num_steps=300, style_weight=1e6, content_weight=1, and torch.manual_seed(92).
content_layers_default = ['conv_3']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5', 'conv_6', 'conv_7', 'conv_8', 'conv_9', 'conv_10']
style_weight=1e1, content_weight=1
style_weight=1e3, content_weight=1
style_weight=1e6, content_weight=1
style_weight=1e9, content_weight=1
Texture 1
Texture 2
Content 1
1-1
1-2
Content 2
2-1
2-2
With the same hyperparameters, using the content image as input produces a good result by step 300, showing both content and style clearly.
Content image input. step=300
When the input is noise, only a small amount of content appears by step 300, but as the number of steps increases to 1000 or 3000, more content is gradually recovered. Around step 3000 the result improves, though it still doesn't match the quality of using the content image as input, and it takes roughly 10 times longer.
Noise input. step=300
Noise input. step=1000
Noise input. step=3000
Noise input. step=6000
Furthermore, for noise input, setting style_weight = 1e6 and content_weight = 100 yields decent results by step 1000, though still slightly worse than with the content image as input.
Noise input. step=1000, style_weight = 1e6 and content_weight = 100
Content
Texture
Result
Content
Texture
Result
Content
Texture
Result
Content
Texture
Result
Apply style transfer to each video frame, initializing each frame with a blend of the current content frame and the previous stylized frame. Additionally, enforce a temporal consistency loss (MSE between consecutive stylized frames) to smooth transitions.
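The per-frame procedure above can be sketched as follows. This is a hypothetical outline, not my exact implementation: `stylize_frame(init, content, prev)` stands in for the single-image optimizer (with `temporal_loss(output, prev)` added to its objective when `prev` is available), and `blend` is an assumed mixing coefficient.

```python
import torch
import torch.nn.functional as F

def temporal_loss(current_stylized, prev_stylized, weight=1e2):
    # MSE penalty tying consecutive stylized frames together.
    return weight * F.mse_loss(current_stylized, prev_stylized)

def stylize_video(frames, stylize_frame, blend=0.5):
    """Stylize frames in order, seeding each optimization with a blend of
    the current content frame and the previous stylized output."""
    outputs = []
    prev = None
    for frame in frames:
        # First frame starts from its own content; later frames blend in
        # the previous result to reduce flicker.
        init = frame if prev is None else blend * frame + (1 - blend) * prev
        out = stylize_frame(init, frame, prev)
        outputs.append(out.detach())
        prev = outputs[-1]
    return outputs
```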