Optimization-based style transfer

I implement the following:

  1. Content loss based on MSE
  2. Style loss based on Gram matrix MSE
  3. L-BFGS-based training loop
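
As a rough illustration of the first two items, the losses could look like the sketch below. It assumes PyTorch and a batch size of 1. The function names and the Gram normalization constant are illustrative assumptions, not necessarily what my implementation uses.

```python
import torch
import torch.nn.functional as F


def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (batch, channels, height, width) feature map.

    Assumes batch == 1, so each row of `flat` is one channel.
    """
    b, c, h, w = features.shape
    flat = features.reshape(b * c, h * w)   # flatten the spatial dimensions
    gram = flat @ flat.t()                  # channel-by-channel inner products
    return gram / (b * c * h * w)           # normalization choice is an assumption


def content_loss(features: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between feature maps of the optimized image and the content image."""
    return F.mse_loss(features, target)


def style_loss(features: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between Gram matrices of the optimized image and the style image."""
    return F.mse_loss(gram_matrix(features), gram_matrix(target))
```

Because the Gram matrix sums over all spatial positions, the style loss ignores where a feature occurs and only compares which channels fire together.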

I present results for the following:

  1. Exploration of different conv layers and their effect on content reconstruction
  2. Optimization of two random noise inputs using content loss
  3. Exploration of different conv layers and their effect on style reconstruction
  4. Optimization of two random noise inputs using style loss
  5. Hyperparameter tuning to find the best results
  6. A 2x2 grid of content and style transfers
  7. Optimization starting from random noise versus from the content image
  8. Style transfer using two of my favorite images (a blue heeler puppy as content, and abstract art as style)

Hyperparameters

We use VGG-19 without normalization layers, with style loss applied at conv_1 through conv_5 inclusive and content loss applied at conv_4. We use the L-BFGS optimizer with lr=1.
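
The optimization loop can be sketched as follows. To keep the example runnable without pretrained weights, a small frozen conv layer stands in for VGG-19 and only a content-style MSE term is used. The `extractor`, the image size, and the number of outer steps are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the frozen VGG-19 feature extractor.
extractor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
extractor.requires_grad_(False)

content_image = torch.rand(1, 3, 16, 16)
target = extractor(content_image).detach()

# The pixels of the image are the only parameters being optimized.
image = torch.rand(1, 3, 16, 16, requires_grad=True)
optimizer = torch.optim.LBFGS([image], lr=1)


def closure():
    # L-BFGS may re-evaluate the loss several times per step,
    # so the forward and backward pass live in a closure.
    optimizer.zero_grad()
    loss = F.mse_loss(extractor(image), target)
    loss.backward()
    return loss


initial_loss = closure().item()
for _ in range(10):  # number of outer steps is an arbitrary choice
    optimizer.step(closure)
final_loss = closure().item()
```

In the full setup, the closure would sum the content loss at conv_4 and the style losses at conv_1 through conv_5 before calling `backward()`.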

Effect of content loss at different layers

We find that enforcing the content loss closer to the input yields a more faithful reconstruction of the input. As the content loss is applied closer to the output, the reconstructed image becomes increasingly discolored.
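
One way to run this kind of layer sweep is to truncate the network at the n-th conv layer and compute the loss on its output. The helper below is a minimal sketch: `features_up_to_conv` is a hypothetical name, and the toy network stands in for the VGG-19 feature stack.

```python
import torch
import torch.nn as nn


def features_up_to_conv(model: nn.Sequential, conv_index: int) -> nn.Sequential:
    """Return the prefix of `model` that ends at its conv_index-th Conv2d (1-based)."""
    layers, seen = [], 0
    for layer in model.children():
        layers.append(layer)
        if isinstance(layer, nn.Conv2d):
            seen += 1
            if seen == conv_index:
                return nn.Sequential(*layers)
    raise ValueError(f"model has only {seen} conv layers")


# Toy stand-in for the VGG-19 feature stack (an assumption for illustration).
toy = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 4, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
)

shallow = features_up_to_conv(toy, 1)  # loss close to the input
deep = features_up_to_conv(toy, 3)     # loss closer to the output
```

Features from `deep` have passed through pooling and more nonlinearities, so matching them constrains the pixels less tightly than matching features from `shallow`.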

Content-loss-based reconstruction on conv_6 (Fallingwater image)

We apply content loss at conv_6 on the Fallingwater image.

Content-loss-based reconstruction on conv_6 (Phipps image)

We apply content loss at conv_6 on the Phipps image.

Effect of style loss at different layers

We find that enforcing the style loss closer to the input transfers more of the high-frequency style detail. The stylization looks better overall when the loss is applied at a later layer.
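
For reference, the per-layer style loss can be written as below. Here F^l is the feature map at layer l reshaped to C_l channels by N_l spatial positions, x is the optimized image, a is the style image, and \mathcal{S} is the set of layers where the loss is applied. The normalization constants are an assumption and may differ from the ones used in these experiments.

```latex
G^{l}_{ij}(x) = \frac{1}{C_l N_l} \sum_{k=1}^{N_l} F^{l}_{ik}(x)\, F^{l}_{jk}(x),
\qquad
\mathcal{L}_{\text{style}}(x) = \sum_{l \in \mathcal{S}} \frac{1}{C_l^{2}}
\sum_{i,j} \left( G^{l}_{ij}(x) - G^{l}_{ij}(a) \right)^{2}
```

The sum over k removes all spatial information, so each layer contributes only channel co-occurrence statistics at its own receptive-field scale.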

Style transfer with only style loss

Above, we show noise optimized with only the style loss.

Style mixing

We mix content and style across a 2x2 grid of images. The optimized image is initialized to the content image.

Different initializations with same loss

We compare the two initializations while applying the same content and style losses in two separate experiments. We find no difference in run time, but initializing from the content image gives much better results.
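
The two starting points can be set up as in the sketch below. The random `content_image` is a placeholder for a loaded image, and uniform noise in [0, 1) is an assumed choice of noise distribution.

```python
import torch

torch.manual_seed(0)

# Placeholder for a loaded content image with values in [0, 1].
content_image = torch.rand(1, 3, 16, 16)

# Option A: start from a copy of the content image.
init_from_content = content_image.clone().requires_grad_(True)

# Option B: start from uniform random noise of the same shape.
init_from_noise = torch.rand_like(content_image).requires_grad_(True)
```

Either tensor can then be passed to the optimizer as the only parameter. Cloning matters in option A so that the content target itself is not modified during optimization.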

Own results

I blend two of my own images: a puppy as the content and abstract art as the style.