16-726: Assignment #4 - Neural Style Transfer
Rawal Khirodkar
Part 1: Content Reconstruction.

1. Report the effect of optimizing content loss at different layers. [15 points]
The final content loss at each conv layer of VGG, with fallingwater.png as the content image, is listed below; the optimization is run for 300 steps.
The reconstruction is near perfect when the content loss is placed at low-level layers such as conv1 and conv2, and it deteriorates as higher layers are used. This is because the higher layers of VGG are trained for classification and capture higher-level, more abstract representations of the image, which discard low-level content such as edges and fine detail. A minimal sketch of the content-loss term is given after the loss listing.

Conv1, Loss = 0.000000

Conv2, Loss = 0.000149

Conv3, Loss = 1.231026

Conv4, Loss = 1.080054

Conv5, Loss = 1.035650

Conv6, Loss = 1.181407

Conv7, Loss = 0.818760

Conv8, Loss = 1.189521

Conv9, Loss = 1.240736

Conv10, Loss = 1.211959

Conv11, Loss = 21.018728

Conv12, Loss = 25.126810

Conv13, Loss = 12.542279

Conv14, Loss = 4.814632

Conv15, Loss = 8.259310
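
A minimal sketch of the content-loss term, in the spirit of the standard PyTorch style-transfer setup (module and variable names are illustrative, not the exact code):

```python
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Records the MSE between the current features and the (detached) target
    features of the content image at one chosen VGG conv layer."""
    def __init__(self, target):
        super().__init__()
        # Detach so the target acts as a fixed constant, not part of the graph.
        self.target = target.detach()
        self.loss = 0.0

    def forward(self, x):
        # Transparent layer: record the loss and pass the features through unchanged.
        self.loss = F.mse_loss(x, self.target)
        return x
```

The module is inserted into a truncated VGG right after the chosen conv layer, so a single forward pass records the loss.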

2. Take two random noises as two input images, optimize them only with content loss. Please include your results on the website and compare each other with the content image. [15 points]
The following images were optimized from random noise using only the content loss at the conv4 layer of VGG for 300 steps; a rough sketch of the optimization loop is given below, followed by the results.
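
A rough sketch of this pixel-space optimization, assuming `model` is a truncated VGG-19 with the `ContentLoss` modules above inserted (function and variable names are illustrative):

```python
import torch
import torch.optim as optim

def reconstruct_content(model, content_losses, input_img, num_steps=300):
    """Optimize the pixels of input_img so its features match those recorded
    by the ContentLoss modules for the content image."""
    input_img.requires_grad_(True)
    optimizer = optim.LBFGS([input_img])
    step = [0]

    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)       # keep pixels in a valid range
            optimizer.zero_grad()
            model(input_img)                 # forward pass fills each module's .loss
            loss = sum(cl.loss for cl in content_losses)
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)

    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img

# Illustrative usage: start from random noise of the content image's shape.
# output = reconstruct_content(model, content_losses, torch.randn_like(content_img))
```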

Input Image

Content Image

Output Image

Input Image

Content Image

Output Image

Part 2: Texture Synthesis.

1. Report the effect of optimizing texture loss at different layers. Use one of the configurations; [15 points]
The final style loss for different sets of conv layers of VGG, with frida_kahlo.jpeg as the style image, is listed below; the optimization is run for 300 steps. Using only higher conv layers results in losing the rich colors of the texture. A minimal sketch of the (normalized) Gram matrix and the style-loss term is given after the loss listing.

Conv1,2,3,4,5, Loss = 20.946007

Conv1,2,3,4, Loss = 11.017440

Conv1,2,3, Loss = 5.957823

Conv10,11,12,13, Loss = 28.009657

Conv4,6,8,10, Loss = 39.785168

Conv10,12,14, Loss = 39.177685
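
A minimal sketch of the normalized Gram matrix and the style-loss term (names are illustrative; features are assumed to have shape batch x channels x height x width):

```python
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map, normalized by the number of feature pixels
    so its scale is comparable across layers and image sizes."""
    b, c, h, w = feat.size()
    f = feat.view(b * c, h * w)      # flatten the spatial dimensions
    gram = f @ f.t()                 # channel-to-channel correlations
    return gram / (b * c * h * w)

class StyleLoss(nn.Module):
    """Records the MSE between the Gram matrix of the current features and the
    Gram matrix of the style image's features at one chosen VGG conv layer."""
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()
        self.loss = 0.0

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```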

2. Take two random noises as two input images, optimize them only with style loss. Please include your results on the website and compare these two synthesized textures. [15 points]
The following images were optimized from random noise using only the style loss at the conv1-conv5 layers of VGG for 300 steps; a sketch of how the style-loss layers are attached to VGG is given below, followed by the results.
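
A rough sketch of how StyleLoss modules can be attached after the chosen conv layers of VGG-19, reusing the StyleLoss above (the ImageNet mean/std normalization layer and device handling are omitted for brevity; the names are illustrative):

```python
import torch.nn as nn
import torchvision.models as models

def build_texture_model(style_img,
                        style_layers=("conv_1", "conv_2", "conv_3", "conv_4", "conv_5")):
    """Copy VGG-19 layer by layer and insert a StyleLoss after each chosen conv layer."""
    cnn = models.vgg19(pretrained=True).features.eval()
    model, style_losses = nn.Sequential(), []
    i = 0
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f"conv_{i}"
        elif isinstance(layer, nn.ReLU):
            name = f"relu_{i}"
            layer = nn.ReLU(inplace=False)   # in-place ReLU would corrupt the recorded losses
        else:
            name = f"pool_{i}"               # VGG-19 features contain only Conv/ReLU/MaxPool
        model.add_module(name, layer)
        if name in style_layers:
            # Record the style image's features at this layer as the fixed target.
            target = model(style_img).detach()
            style_loss = StyleLoss(target)
            model.add_module(f"style_loss_{i}", style_loss)
            style_losses.append(style_loss)
    return model, style_losses
```

The texture is then synthesized with the same LBFGS loop as in Part 1, summing `sl.loss` over `style_losses` instead of the content terms.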

Input Image

Style Image

Output Image

Input Image

Style Image

Output Image

Part 3: Style Transfer.

1. Tune the hyper-parameters until you are satisfied. Pay special attention to whether your gram matrix is normalized over feature pixels or not. It will result in different hyper-parameters by an order of 4-5. Please briefly describe your implementation details on the website. [10 points]
The Gram matrix is normalized over the number of feature pixels. The style loss weight is set to 1000000 and the content loss weight to 1. The optimization runs for 300 steps, and all images are resized to 512 x 512. The remaining details follow the standard style transfer algorithm; a sketch of this setup is given below.
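
A sketch of the setup with these hyper-parameters, reusing the loss modules from the earlier sketches (helper names and the device choice are illustrative):

```python
import torch
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Resize every image to 512 x 512 and convert it to a 1 x 3 x 512 x 512 tensor in [0, 1].
loader = T.Compose([T.Resize((512, 512)), T.ToTensor()])

def load_image(path):
    return loader(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

style_weight, content_weight = 1_000_000, 1   # weights used with the normalized Gram matrix
num_steps = 300

# Inside the LBFGS closure, the two terms are combined as:
#   loss = style_weight * sum(sl.loss for sl in style_losses) \
#        + content_weight * sum(cl.loss for cl in content_losses)
```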

2. Please report at least a 2x2 grid of results that are optimized from two content images mixing with two style images accordingly. (Remember to also include content and style images therefore the grid is actually 3x3) [10 points]

Content 1

Content 2

Style 1

Style 1, Content 1

Style 1, Content 2

Style 2

Style 2, Content 1

Style 2, Content 2

3. Take input as random noise and a content image respectively. Compare their results in terms of quality and running time. [10 points]
The output quality when initializing with the content image is better than when initializing with random noise. This makes sense: the content image is a much closer starting point to the desired output, which should itself stay close to the content image. As for the running time, I used 300 steps in both cases, so the two runs take the same time. Here are the results; a small sketch of the two initializations follows.
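
A small sketch of the two initializations with a simple wall-clock timing, assuming `run_style_transfer` is the combined optimization loop (names are illustrative):

```python
import time
import torch

def timed_run(label, run_style_transfer, init_img, **kwargs):
    """Run style transfer from a given initialization and report the wall-clock time."""
    start = time.time()
    output = run_style_transfer(input_img=init_img.clone(), **kwargs)
    print(f"{label}: {time.time() - start:.1f} s")
    return output

# Content-image initialization (better quality in my runs):
# out_a = timed_run("content init", run_style_transfer, content_img, ...)
# Random-noise initialization (same number of steps, so roughly the same runtime):
# out_b = timed_run("noise init", run_style_transfer, torch.randn_like(content_img), ...)
```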

Content

Style

Input

Output

Content

Style

Input

Output

4. Try style transfer on some of your favorite images. [10 points]

Content

Style

Output

Content

Style

Output

Content

Style

Output

Part 4: Bells and Whistles.

1. Stylizing the grumpy cats from the previous homework [2 points]

Content

Style

Output

Content

Style

Output

Content

Style

Output

2. Temporal smoothing for videos [4 points].
I added an L2 loss between the stylized frames at t and t+1 to obtain temporally consistent style transfer. This turned out to be computationally very slow, so I was only able to process 200 frames at 25 fps. A sketch of the temporal term is given below, followed by the content and the stylized video (source: Instagram).
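
A rough sketch of the temporal term (the weight shown is illustrative, and the per-frame style/content scores are assumed to come from the single-image pipeline above):

```python
import torch.nn.functional as F

def temporal_loss(stylized_t, stylized_prev, weight=100.0):
    """L2 penalty between the stylized frames at t and t+1, encouraging the
    stylization to change smoothly across the video."""
    return weight * F.mse_loss(stylized_t, stylized_prev.detach())

# For every frame after the first, the per-frame objective becomes (illustrative):
#   loss = style_weight * style_score + content_weight * content_score \
#        + temporal_loss(current_frame_output, previous_frame_output)
```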

Style

Content

Stylized
Some frames black out due to instability in the computation of the style loss (a division by zero). Increasing the weight of the temporal loss should help mitigate this issue.