16-726 Spring 2021 Assignment #4

Neural Style Transfer

Author: Zhe Huang (zhehuang)


Introduction

In this assignment, we explore the deep learning task of neural style transfer (NST), one of the coolest tasks in my opinion. Specifically, we train our NST pipeline under multiple settings to understand the transfer process step by step. We examine both the style loss, which is responsible for imitating the texture of a given style image, and the content loss, which aims to preserve the original image content while transferring it to a brand new style. In the end, we produce a fine result from our NST pipeline and apply it to some interesting images to show its capability.

Part 0. Experiment settings

Unless stated otherwise, all experiments in this assignment use a content loss weight of 1 and a style loss weight of 1, with all other hyperparameters set to the defaults provided in the starter files.

Part 1.1 The effect of content loss

We show the effect of the content loss by training the NST pipeline with content loss only (i.e., regenerating the original content image from a pure noise input). We insert the content loss after different conv layers to examine how effectively it preserves the original image content. Here we report five results, with the content loss placed after conv_1, conv_2, conv_3, conv_4, and conv_5 respectively. For a better analysis, we run some of the experiments for both 100 steps and 300 steps. Visualizations are as follows; the final content loss value is given in the title of each subplot.
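For reference, here is a minimal sketch of what such a content loss module could look like in PyTorch. The class name and structure are my own illustration, not necessarily the starter code; the key idea is an MSE between the current features and fixed target features, inserted as a transparent layer after the chosen conv layer.

    import torch.nn as nn
    import torch.nn.functional as F

    class ContentLoss(nn.Module):
        """Transparent layer that records the MSE between the current
        features and the (fixed) features of the content image."""
        def __init__(self, target):
            super().__init__()
            # Detach so the target acts as a constant, not part of the graph.
            self.target = target.detach()
            self.loss = 0.0

        def forward(self, x):
            self.loss = F.mse_loss(x, self.target)
            return x  # pass features through unchanged

Because the module returns its input unchanged, it can be spliced after any conv layer (conv_1 through conv_5) without altering the forward pass.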

Conclusion: The later the content loss is placed in the network, the harder it is to optimize, so more training steps are needed for the loss to converge.

Part 1.2 Optimizing random noises for content

Here I choose the picture of fallingwater as the content image, put the content loss after conv_4, and train for 300 steps to regenerate the content. Here are the results.

Part 2.1 The effect of style loss

We show the effect of the style loss by training the NST pipeline with style loss only (i.e., imitating the texture of the style image from a pure noise input). Here we try two settings: the default, which gathers the style loss after each conv layer from conv_1 to conv_5, and an alternative that computes the style loss after conv_5 only. Results are as follows; the final style loss value is given in the title of each subplot.
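The style loss compares normalized Gram matrices of the feature maps rather than the raw features. A minimal sketch, again with illustrative names rather than the exact starter code:

    import torch.nn as nn
    import torch.nn.functional as F

    def gram_matrix(feat):
        # feat: (N, C, H, W) feature map from a conv layer.
        n, c, h, w = feat.size()
        f = feat.view(n * c, h * w)
        # Normalize by the number of elements so Gram magnitudes are
        # comparable across layers with different spatial sizes.
        return (f @ f.t()) / (n * c * h * w)

    class StyleLoss(nn.Module):
        def __init__(self, target_feat):
            super().__init__()
            self.target = gram_matrix(target_feat).detach()
            self.loss = 0.0

        def forward(self, x):
            self.loss = F.mse_loss(gram_matrix(x), self.target)
            return x

Because the Gram matrix discards spatial arrangement, matching it captures texture statistics without constraining where things appear in the image.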

Conclusion: Putting the style loss after all conv layers (i.e., conv_1 through conv_5) clearly helps with learning the texture of the style image.

Part 2.2 Optimizing random noises for texture

Based on the finding from Part 2.1, I put the style loss after all conv layers here and in all later experiments. I then train two random noise inputs for 300 steps to regenerate the texture of picasso. Here are the results. Since we only learn the texture, the content input does not matter here.

Part 3.1 Neural style transfer

Now we apply both losses from Parts 1 and 2. I paid attention to normalizing the Gram matrix. For hyperparameters, as described in Part 0, we set both loss weights to 1. We anchor the style losses after each conv layer and put the content loss after conv_4. We train for 300 steps when content images are used as inputs and for 1000 steps when noise images are used as inputs. The two contents and styles we choose are shown below.
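Here is a sketch of the joint optimization loop, assuming ContentLoss/StyleLoss modules like those above have been spliced into a frozen VGG model and collected into the lists content_losses and style_losses. All names are my own illustration; the LBFGS optimizer and the weight values of 1 mirror the settings described in this report, but this is not necessarily the exact starter code.

    import torch
    import torch.optim as optim

    def run_nst(model, input_img, content_losses, style_losses,
                num_steps=300, content_weight=1.0, style_weight=1.0):
        # Optimize the pixels of input_img directly; the network stays frozen.
        input_img.requires_grad_(True)
        optimizer = optim.LBFGS([input_img])
        step = [0]

        while step[0] < num_steps:
            def closure():
                optimizer.zero_grad()
                model(input_img)  # populates .loss on every inserted module
                loss = (content_weight * sum(cl.loss for cl in content_losses)
                        + style_weight * sum(sl.loss for sl in style_losses))
                loss.backward()
                step[0] += 1
                return loss
            optimizer.step(closure)
            with torch.no_grad():
                input_img.clamp_(0, 1)  # keep pixel values in a valid range

        return input_img.detach()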

Part 3.2 NST results

Here we show results generated by different combinations of the content images and style images mentioned above. To speed up training, the corresponding content image is used as the input. Due to necessary cropping, content inputs and style inputs may look slightly different across settings.

Part 3.3 NST from noise inputs

Here we repeat all four training runs from Part 3.1. This time, however, every run starts from a random noise input, and we train for both 300 and 1000 steps to see how well it performs.
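The only change relative to Part 3.2 is how the optimized image is initialized; a minimal sketch, assuming content_img is the preloaded content tensor:

    import torch

    # Part 3.2 setting: start from the content image itself, so the
    # optimizer only has to inject style (fast convergence).
    input_img = content_img.clone()

    # Part 3.3 setting: start from uniform random noise of the same
    # shape, so the content must also be reconstructed from scratch.
    input_img = torch.rand_like(content_img)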

Conclusion: Training from a noise input takes longer, as expected, since the model must also reconstruct the original content; it cannot finish in 300 steps. Moreover, the loss is less stable during training than with a content image input, so the overall quality of NST from noise inputs is worse (or perhaps the hyperparameters are simply suboptimal).

Part 3.4 NST on self-selected images

Here are two interesting results on self-selected images, using the exact NST pipeline from Part 3.2.

The first one transfers CMU's GHC building into a Japanese manga style.

The second one is me pretending to get a portrait of myself from Mr. Da Vinci (in the era of COVID-19, obviously).

Part 4. Bells & whistles

We show an interesting result by transferring a grumpy cat image into a black & white scratch style.