""" Style Loss ~~

The style loss module is implemented similarly to the content loss module. It will act as a transparent layer in a network that computes the style loss of that layer. In order to calculate the style loss, we need to compute the Gram matrix $G_{XL}$. A Gram matrix is the result of multiplying a given matrix by its transpose. In this application the given matrix is a reshaped version of the feature maps $F_{XL}$ of a layer $L$. $F_{XL}$ is reshaped to form $\hat{F}_{XL}$, a $K \times N$ matrix, where $K$ is the number of feature maps at layer $L$ and $N$ is the length of any vectorized feature map $F_{XL}^k$. For example, the first row of $\hat{F}_{XL}$ corresponds to the first vectorized feature map $F_{XL}^1$.

Finally, the Gram matrix must be normalized by dividing each element by the total number of elements in the matrix. This normalization counteracts the fact that $\hat{F}_{XL}$ matrices with a large $N$ dimension yield larger values in the Gram matrix. These larger values would cause the first layers (before pooling layers) to have a larger impact during gradient descent. Style features tend to be in the deeper layers of the network, so this normalization step is crucial. """
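
As a concrete illustration, here is a minimal sketch of the Gram-matrix computation and a style loss module, assuming the feature maps arrive as a 1 x K x H x W tensor (names follow the PyTorch neural transfer tutorial conventions; the details are a sketch, not the exact implementation used):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(input):
    # a = batch size (=1), b = K feature maps, (c, d) = spatial dims, so N = c * d
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)    # reshape F_XL into \hat{F}_XL (K x N)
    G = torch.mm(features, features.t())   # Gram product \hat{F}_XL \hat{F}_XL^T
    # normalize so early layers (large N) do not dominate the gradient descent
    return G.div(a * b * c * d)

class StyleLoss(nn.Module):
    """Transparent layer: stores its loss and returns the input unchanged."""
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input
```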

"""A Sequential module contains an ordered list of child modules. For instance, vgg19.features contains a sequence (Conv2d, ReLU, MaxPool2d, Conv2d, ReLU…) aligned in the right order of depth. We need to add our content loss and style loss layers immediately after the convolution layer they are detecting. To do this we must create a new Sequential module that has content loss and style loss modules correctly inserted. """

"""Finally, we must define a function that performs the neural transfer. For each iteration of the networks, it is fed an updated input and computes new losses. We will run the backward methods of each loss module to dynamicaly compute their gradients. The optimizer requires a “closure” function, which reevaluates the module and returns the loss.

We still have one final constraint to address. The network may try to optimize the input with values that exceed the 0-to-1 tensor range of the image. We can address this by clamping the input values to between 0 and 1 each time the network is run.

"""

Content Loss

Report the effect of optimizing content loss at different layers. [15 points]

The layer we were directed to use, conv_4, seems to offer a good reconstruction. Later layers are hard to optimize for reconstruction. I believe that the deeper the layer, the less convex the optimization problem becomes, allowing for many local minima (or even "infinitely many" local minima). Interestingly, conv_16 is able to take a few optimization steps before blowing up.

Choose your favorite one (specify it on the website). Take two random noises as two input images and optimize them only with content loss. Please include your results on the website and compare them with each other and with the content image. [15 points]

In order to better highlight the differences due to initialization, I chose to include an intermediate result with fewer iterations. However, with different initializations, the optimizations seem to follow a very similar path to the final result.

Style Loss

Report the effect of optimizing texture loss at different layers. Use one of the configurations and specify it on the website. [15 points]

It is hard to compare the different texture images. I think that they all look similar across the different combinations of layers. It seems the important thing is using the Gram matrix, which captures a distribution of features, instead of optimizing the raw feature maps.

Take two random noises as two input images, optimize them only with style loss. Please include your results on the website and compare these two synthesized textures. [15 points]

In this case, the texture images are clearly different, but seem to be sampled from similar distributions. It is interesting to me that the intermediate results again look similar.

Style Transfer

Tune the hyper-parameters until you are satisfied. Pay special attention to whether your Gram matrix is normalized over feature pixels or not; it will result in hyper-parameters that differ by 4-5 orders of magnitude. Please briefly describe your implementation details on the website. [10 points]

Modifying the style weight only slightly affects the final image when starting from the content image, but it is interesting that the weight affects the result much more when starting from random noise.

Please report at least a 2x2 grid of results that are optimized from two content images mixed with two style images accordingly. (Remember to also include the content and style images, so the grid is actually 3x3.) [10 points]

I really like the dancing image; it always had interesting results. Tubingen seemed to have less interesting texture in my experiments here.

Take input as random noise and a content image respectively. Compare their results in terms of quality and running time. [10 points]

Both starting from the content image and starting from random noise generated very interesting and artistic images in this case. In other tests I found that starting from the content image can converge much more quickly.
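
For reference, the two initializations being compared are simply different choices of the starting input image (a small sketch; the names match the snippets above):

```python
import torch

# start from the content image ...
input_img = content_img.clone()
# ... or start from random noise of the same shape
# input_img = torch.randn(content_img.size(), device=content_img.device)
```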

Try style transfer on some of your favorite images. [10 points]

My image was one I took while working with imaging dummies out at a test site recently. We staged the dummies during downtime to have them interact. In this case, the color is closer to Frida than in some other images I tried. It converged, especially when initializing from the content image, to an interesting painting-like image.