16-726: Assignment #4 - Neural Style Transfer
Rawal Khirodkar
Part 1: Content Reconstruction.

1. Report the effect of optimizing content loss at different layers. [15 points]
The final content loss at each conv layer of VGG, with fallingwater.png as the content image, is listed below; the optimization is run for 300 steps.
The reconstruction is near perfect when the content loss is placed at low-level layers such as conv1 and conv2, and it deteriorates as higher layers are used. This is because the higher layers of VGG are trained for classification and capture higher-level, more abstract representations of the image, which discard low-level content such as edges and fine detail. A minimal sketch of the content-loss term is given after the loss listing.

Conv1, Loss = 0.000000

Conv2, Loss = 0.000149

Conv3, Loss = 1.231026

Conv4, Loss = 1.080054

Conv5, Loss = 1.035650

Conv6, Loss = 1.181407

Conv7, Loss = 0.818760

Conv8, Loss = 1.189521

Conv9, Loss = 1.240736

Conv10, Loss = 1.211959

Conv11, Loss = 21.018728

Conv12, Loss = 25.126810

Conv13, Loss = 12.542279

Conv14, Loss = 4.814632

Conv15, Loss = 8.259310
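
A minimal sketch of the content-loss term, in the spirit of the standard PyTorch style-transfer setup (module and variable names are illustrative, not the exact code):

```python
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Records the MSE between the current features and the (detached) target
    features of the content image at one chosen VGG conv layer."""
    def __init__(self, target):
        super().__init__()
        # Detach so the target acts as a fixed constant, not part of the graph.
        self.target = target.detach()
        self.loss = 0.0

    def forward(self, x):
        # Transparent layer: record the loss and pass the features through unchanged.
        self.loss = F.mse_loss(x, self.target)
        return x
```

The module is inserted into a truncated VGG right after the chosen conv layer, so a single forward pass records the loss.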

2. Take two random noises as two input images, optimize them only with content loss. Please include your results on the website and compare each other with the content image. [15 points]
The following images were optimized from random noise using only the content loss at the conv4 layer of VGG for 300 steps; a rough sketch of the optimization loop is given below, followed by the results.
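
A rough sketch of this pixel-space optimization, assuming `model` is a truncated VGG-19 with the `ContentLoss` modules above inserted (function and variable names are illustrative):

```python
import torch
import torch.optim as optim

def reconstruct_content(model, content_losses, input_img, num_steps=300):
    """Optimize the pixels of input_img so its features match those recorded
    by the ContentLoss modules for the content image."""
    input_img.requires_grad_(True)
    optimizer = optim.LBFGS([input_img])
    step = [0]

    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)       # keep pixels in a valid range
            optimizer.zero_grad()
            model(input_img)                 # forward pass fills each module's .loss
            loss = sum(cl.loss for cl in content_losses)
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)

    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img

# Illustrative usage: start from random noise of the content image's shape.
# output = reconstruct_content(model, content_losses, torch.randn_like(content_img))
```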

Input Image

Content Image

Output Image

Input Image

Content Image

Output Image

Part 2: Texture Synthesis.

1. Report the effect of optimizing texture loss at different layers. Use one of the configurations; [15 points]
The final style loss for different sets of conv layers of VGG, with frida_kahlo.jpeg as the style image, is listed below; the optimization is run for 300 steps. Using only higher conv layers results in losing the rich colors of the texture. A minimal sketch of the (normalized) Gram matrix and the style-loss term is given after the loss listing.

Conv1,2,3,4,5, Loss = 20.946007

Conv1,2,3,4, Loss = 11.017440

Conv1,2,3, Loss = 5.957823

Conv10,11,12,13, Loss = 28.009657

Conv4,6,8,10, Loss = 39.785168

Conv10,12,14, Loss = 39.177685
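
A minimal sketch of the normalized Gram matrix and the style-loss term (names are illustrative; features are assumed to have shape batch x channels x height x width):

```python
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map, normalized by the number of feature pixels
    so its scale is comparable across layers and image sizes."""
    b, c, h, w = feat.size()
    f = feat.view(b * c, h * w)      # flatten the spatial dimensions
    gram = f @ f.t()                 # channel-to-channel correlations
    return gram / (b * c * h * w)

class StyleLoss(nn.Module):
    """Records the MSE between the Gram matrix of the current features and the
    Gram matrix of the style image's features at one chosen VGG conv layer."""
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()
        self.loss = 0.0

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```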

2. Take two random noises as two input images, optimize them only with style loss. Please include your results on the website and compare these two synthesized textures. [15 points]
The following images were optimized from random noise using only the style loss at the conv1-conv5 layers of VGG for 300 steps; a sketch of how the style-loss layers are attached to VGG is given below, followed by the results.
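
A rough sketch of how StyleLoss modules can be attached after the chosen conv layers of VGG-19, reusing the StyleLoss above (the ImageNet mean/std normalization layer and device handling are omitted for brevity; the names are illustrative):

```python
import torch.nn as nn
import torchvision.models as models

def build_texture_model(style_img,
                        style_layers=("conv_1", "conv_2", "conv_3", "conv_4", "conv_5")):
    """Copy VGG-19 layer by layer and insert a StyleLoss after each chosen conv layer."""
    cnn = models.vgg19(pretrained=True).features.eval()
    model, style_losses = nn.Sequential(), []
    i = 0
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f"conv_{i}"
        elif isinstance(layer, nn.ReLU):
            name = f"relu_{i}"
            layer = nn.ReLU(inplace=False)   # in-place ReLU would corrupt the recorded losses
        else:
            name = f"pool_{i}"               # VGG-19 features contain only Conv/ReLU/MaxPool
        model.add_module(name, layer)
        if name in style_layers:
            # Record the style image's features at this layer as the fixed target.
            target = model(style_img).detach()
            style_loss = StyleLoss(target)
            model.add_module(f"style_loss_{i}", style_loss)
            style_losses.append(style_loss)
    return model, style_losses
```

The texture is then synthesized with the same LBFGS loop as in Part 1, summing `sl.loss` over `style_losses` instead of the content terms.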

Input Image

Style Image

Output Image

Input Image

Style Image

Output Image

Part 3: Style Transfer.

1. Tune the hyper-parameters until you are satisfied. Pay special attention to whether your gram matrix is normalized over feature pixels or not. It will result in different hyper-parameters by an order of 4-5. Please briefly describe your implementation details on the website. [10 points]
The Gram matrix is normalized over the number of feature pixels. The style loss weight is set to 1000000 and the content loss weight to 1. The optimization runs for 300 steps, and all images are resized to 512 x 512. The remaining details follow the standard style transfer algorithm; a sketch of this setup is given below.
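
A sketch of the setup with these hyper-parameters, reusing the loss modules from the earlier sketches (helper names and the device choice are illustrative):

```python
import torch
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Resize every image to 512 x 512 and convert it to a 1 x 3 x 512 x 512 tensor in [0, 1].
loader = T.Compose([T.Resize((512, 512)), T.ToTensor()])

def load_image(path):
    return loader(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

style_weight, content_weight = 1_000_000, 1   # weights used with the normalized Gram matrix
num_steps = 300

# Inside the LBFGS closure, the two terms are combined as:
#   loss = style_weight * sum(sl.loss for sl in style_losses) \
#        + content_weight * sum(cl.loss for cl in content_losses)
```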

2. Please report at least a 2x2 grid of results that are optimized from two content images mixing with two style images accordingly. (Remember to also include content and style images therefore the grid is actually 3x3) [10 points]

Content 1

Content 2

Style 1

Style 1, Content 1

Style 1, Content 2

Style 2

Style 2, Content 1

Style 2, Content 2

3. Take input as random noise and a content image respectively. Compare their results in terms of quality and running time. [10 points]
The output quality when initializing with the content image is better than when initializing with random noise. This makes sense: the content image is a much closer starting point to the desired output, which should itself stay close to the content image. As for the running time, I used 300 steps in both cases, so the two runs take the same time. Here are the results; a small sketch of the two initializations follows.
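
A small sketch of the two initializations with a simple wall-clock timing, assuming `run_style_transfer` is the combined optimization loop (names are illustrative):

```python
import time
import torch

def timed_run(label, run_style_transfer, init_img, **kwargs):
    """Run style transfer from a given initialization and report the wall-clock time."""
    start = time.time()
    output = run_style_transfer(input_img=init_img.clone(), **kwargs)
    print(f"{label}: {time.time() - start:.1f} s")
    return output

# Content-image initialization (better quality in my runs):
# out_a = timed_run("content init", run_style_transfer, content_img, ...)
# Random-noise initialization (same number of steps, so roughly the same runtime):
# out_b = timed_run("noise init", run_style_transfer, torch.randn_like(content_img), ...)
```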

Content

Style

Input

Output

Content

Style

Input

Output

4. Try style transfer on some of your favorite images. [10 points]

Content

Style

Output

Content

Style

Output

Content

Style

Output

Part 4: Bells and Whistles.

1. Stylizing the grumpy cats from the previous homework [2 points]

Content

Style

Output

Content

Style

Output

Content

Style

Output

2. Temporal smoothing for videos [4 points].
I added an L2 loss between the stylized frames at t and t+1 to obtain temporally consistent style transfer. This turned out to be computationally very slow, so I was only able to process 200 frames at 25 fps. A sketch of the temporal term is given below, followed by the content and the stylized video (source: Instagram).
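
A rough sketch of the temporal term (the weight shown is illustrative, and the per-frame style/content scores are assumed to come from the single-image pipeline above):

```python
import torch.nn.functional as F

def temporal_loss(stylized_t, stylized_prev, weight=100.0):
    """L2 penalty between the stylized frames at t and t+1, encouraging the
    stylization to change smoothly across the video."""
    return weight * F.mse_loss(stylized_t, stylized_prev.detach())

# For every frame after the first, the per-frame objective becomes (illustrative):
#   loss = style_weight * style_score + content_weight * content_score \
#        + temporal_loss(current_frame_output, previous_frame_output)
```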

Style

Content

Stylized
Some frames black out due to instability in the computation of the style loss (a division by zero). Increasing the weight of the temporal loss should help mitigate this issue.