Neural Style Transfer

Vivian Cheng, 16-726

Implementation

Overview

We use Neural Style Transfer, which re-renders the content of an input image in a chosen artistic style. The algorithm takes a content image, a style image, and an input image; the input image is optimized to match the two targets in content distance and style distance, respectively. We experiment with initializing the input as random noise and optimizing in content space only, then in style space only, then in both together to perform full neural style transfer. Depending on the layers at which the losses are applied and on the relative loss weights, the results vary considerably, as explored below.

Part 1: Content Reconstruction


Content Reconstruction from content image clone

When the input image is a clone of the content image, the content loss stays at 0, which is expected because the input is already optimal. The conv layer at which the content loss is inserted therefore doesn't matter, and the two outputs below are identical.
Content loss applied at conv 2
Content loss applied at conv 11
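For reference, the content loss used here is essentially the mean squared error between the input's feature maps and the content image's feature maps at the chosen conv layer. Below is a minimal sketch of such a loss module, following the standard Gatys et al. formulation; the class and variable names are illustrative rather than the exact ones in my code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Records MSE between the input's features and a fixed content target."""
    def __init__(self, target):
        super().__init__()
        # Detach so the target is treated as a constant, not part of the graph.
        self.target = target.detach()

    def forward(self, x):
        # The loss is stored as an attribute; the features pass through
        # unchanged so this module can be spliced into the VGG-19 forward pass.
        self.loss = F.mse_loss(x, self.target)
        return x
```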

Content Reconstruction from Random Noise: Applying at Different Layers

Parameters: [content_weight=10, num_steps=300]
We note that applying the content loss at earlier layers lets the output retain a sharper, more accurate reconstruction, whereas at later layers a few artifacts appear in the result. This is because operations like max-pooling discard fine detail as we move deeper into the network, so reconstructions from later layers are grainier.

Content loss applied at conv 4

Content loss applied at conv 5

Content loss applied at conv 7

Two Inputs from Noise

Content Image: Dancing
Parameters: [applied at conv5, content_weight=10, num_steps=300]
Original:

Output A:

Output B:

The synthesized content is the same in both images. However, the synthesized images have a green-ish tint and do not have as much contrast as the original image. This might be because the "style" isn't being enforced, so as long as the content (the dancer) is preserved, the loss stays low.

Part 2: Texture Synthesis

Texture Synthesis at Different Layers

We now optimize with respect to style loss only, to generate the style of the style image.
Parameters used: [style_weight=500k, num_steps=300]
Applied to conv 1 through 5:

Applied to conv 4 through 8:

Applied to conv 7 through 11:

We note that applying the style loss at later layers creates a denser, grainier texture, whereas applying it at earlier layers yields "larger" patterns. In both cases, however, the general color spectrum of the original image is preserved.
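The style loss compares Gram matrices of the feature maps rather than the feature maps themselves, which is why it captures texture and color statistics but not spatial layout. A minimal sketch, again assuming the standard Gatys-style formulation (names are illustrative, not necessarily those in my code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(features):
    """Channel-by-channel correlations of the feature maps, normalized by size."""
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    return flat @ flat.t() / (b * c * h * w)

class StyleLoss(nn.Module):
    """Records MSE between Gram matrices of the input and the style target."""
    def __init__(self, target_features):
        super().__init__()
        self.target = gram_matrix(target_features).detach()

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```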

Two Inputs from Noise

Style Image: Picasso
Parameters: [style_weight=500k, content_weight=1, num_steps=300]
Synthesizing from two different noise initializations yields two very similar outputs with the same texture and color scheme as the original Picasso input, but neither captures the figure depicted in the original. This makes sense, since we are only optimizing with respect to style, so the content does not need to be preserved.
Original Image:

Output A:

Output B:


Part 3: Style Transfer

Implementation Details

For style transfer, we optimize with respect to both style loss and content loss. For the model, we inject style/content loss layers after certain conv layers of VGG-19. We then combine both losses into a total loss and backpropagate it. We call optimizer.step() with an L-BFGS optimizer, which builds an approximation of second-derivative (Hessian) information to guide the optimization. The style and content weights can be adjusted depending on how strongly we want to transfer the style versus retain the original input image content.
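A sketch of the optimization loop described above, assuming content_losses and style_losses are the loss modules inserted into the truncated VGG-19 model (function and variable names are illustrative, not necessarily those in my code):

```python
import torch
import torch.optim as optim

def run_style_transfer(model, content_losses, style_losses, input_img,
                       num_steps=300, style_weight=1e5, content_weight=1):
    # L-BFGS optimizes the pixels of input_img directly.
    optimizer = optim.LBFGS([input_img.requires_grad_()])

    step = [0]
    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)      # keep pixels in a valid range
            optimizer.zero_grad()
            model(input_img)                # populates .loss on each loss module
            content_score = sum(cl.loss for cl in content_losses)
            style_score = sum(sl.loss for sl in style_losses)
            total = content_weight * content_score + style_weight * style_score
            total.backward()
            step[0] += 1
            return total

        optimizer.step(closure)

    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img
```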

2x2 Grid of Results

Style Image 1: Frida Kahlo
Style Image 2: Starry Night
Content Image 1: Wally
Content Image 2: Dancing
Parameters: [style_weight=100k, content_weight=1, num_steps=300]

Input as Random Noise, Content Image

Style Image: The Scream
Content Image: Tubingen

Random Noise (Time: 10.31s):


Parameters: [style_weight=100k, content_weight=1, num_steps=300]
With the input as noise, the output looks great: it picks up the texture of "The Scream" while preserving the structure of the buildings.

Clone (Time: 10.28s):


Parameters: [style_weight=600k, content_weight=100, num_steps=300]
With the input as the content image, I had to increase the style weight; otherwise the output is just the content image. This setup doesn't work as well, since the transferred style is not as apparent. However, the time it takes to generate the output is around the same.
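The two runs differ only in how the input image is initialized; something along these lines (a sketch under my own naming, the exact calls in the assignment code may differ):

```python
import torch

def make_input(content_img, init="noise"):
    """Initialize the optimized image either as random noise or as a content clone."""
    if init == "noise":
        return torch.rand_like(content_img)   # random-noise start in [0, 1]
    return content_img.clone()                # content-clone start
```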

Try on favorite images

Parameters: [style_weight=100k, content_weight=1, num_steps=300]

Example 1

Style Image 1: David Hockney 1

Content Image 1: Guinea Pig

Style Transfer Outputs:

Example 2

Style Image 2: David Hockney 2

Content Image 2: Chateau

Style Transfer Outputs:



Bells & Whistles

Styling Poisson-blended images/Grumpy Cats

Example 1: GAN-Generated Grumpy Cat Stylized with David Hockney's Painting

Original Image:

Output Image:

Example 2: Poisson-Blended GuineaPig2 Stylized with David Hockney's Painting

Original Image:

Output Image: