In this assignment, we implemented neural style transfer, which renders the content of one image in the artistic style of another.
We followed the instructions and applied our algorithm to tubingen.jpeg. The results are shown below:
Original:
Conv 1:
Conv 2:
Conv 3:
Conv 4:
We can see that as the conv features we use come from deeper layers, the reconstructed images become blurrier. This is likely because the features become more abstract as the CNN gets deeper, so they constrain fine details less and the reconstruction has more freedom in those details.
We then provide another reconstructed image from a separate random initialization, again using the conv4 features. Honestly I don't see much difference here, which might mean that the random initialization doesn't matter that much.
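To make the content reconstruction objective concrete, here is a minimal sketch of a content loss, assuming it is the mean squared error between the feature map of the generated image and that of the content image at one layer. The function name `content_loss` and the random arrays standing in for conv features are made up for illustration; this is not the actual assignment code.

```python
import numpy as np

def content_loss(generated_feat, target_feat):
    """Mean squared error between two feature maps of identical shape."""
    generated_feat = np.asarray(generated_feat, dtype=np.float64)
    target_feat = np.asarray(target_feat, dtype=np.float64)
    if generated_feat.shape != target_feat.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean((generated_feat - target_feat) ** 2))

# Random arrays standing in for (channels, height, width) conv features.
rng = np.random.default_rng(0)
target = rng.standard_normal((8, 4, 4))
guess = rng.standard_normal((8, 4, 4))

print(content_loss(target, target))  # identical features give zero loss
print(content_loss(guess, target))   # unrelated features give a positive loss
```

Minimizing this loss over the pixels of the generated image only asks the features to match, so any detail the chosen layer does not encode is left unconstrained.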
We followed the instructions and ran our algorithm on Starry Night (one of my favorite paintings!), as shown below. One thing to note is that for the optimizer's history size, I used 10 instead of the default 100 since my GPU is not powerful enough. Therefore, the result is probably inferior to what we would get with the default parameter.
Style image:
Using conv 5 for style only:
Using conv 1 - 5 for style:
We can see that using all conv layers for the style features generates much better patches than using only the last layer.
We then generate another image from a separately sampled noise image, still using conv 1 - 5, and get the result shown below:
We can see that this picture is different from the other one, which means that different initial seeds converge to different results under the style loss.
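For reference, here is a small sketch of a multi-layer style loss, assuming the usual Gram-matrix formulation (channel correlations compared layer by layer and summed). The names `gram_matrix` and `style_loss`, the normalization by C*H*W, and the random feature maps are assumptions for illustration, not the exact code used for the results here.

```python
import numpy as np

def gram_matrix(feat):
    """Channel-by-channel correlations of a (C, H, W) feature map,
    normalized by the number of elements C*H*W."""
    feat = np.asarray(feat, dtype=np.float64)
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def style_loss(generated_feats, style_feats):
    """Sum over layers of the mean squared difference between Gram matrices."""
    total = 0.0
    for g, s in zip(generated_feats, style_feats):
        total += float(np.mean((gram_matrix(g) - gram_matrix(s)) ** 2))
    return total

rng = np.random.default_rng(0)

# Shuffling spatial positions leaves the Gram matrix unchanged (up to
# floating-point error), so the style loss ignores where things are.
feat = rng.standard_normal((3, 4, 4))
perm = rng.permutation(16)
shuffled = feat.reshape(3, 16)[:, perm].reshape(3, 4, 4)
print(np.allclose(gram_matrix(feat), gram_matrix(shuffled)))

# Two hypothetical layers with different channel counts and resolutions.
layers = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
other = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
print(style_loss(layers, layers))  # matching features give zero loss
print(style_loss(layers, other))   # mismatched features give a positive loss
```

Passing a single-element list corresponds to the "conv 5 only" setting, while passing one entry per layer corresponds to the conv 1 - 5 setting.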
I ended up using conv3 for the content reconstruction, and conv 1 - 5 for the style reconstruction. I used style weight = 1000000 and content weight = 1.
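As a sketch of how these two weights enter the objective, assuming the total loss is simply a weighted sum of the content and style terms (the function name `total_loss` and the example loss values are made up for illustration):

```python
CONTENT_WEIGHT = 1.0
STYLE_WEIGHT = 1_000_000.0

def total_loss(content_term, style_term,
               content_weight=CONTENT_WEIGHT, style_weight=STYLE_WEIGHT):
    """Weighted sum of a content loss value and a style loss value."""
    return content_weight * content_term + style_weight * style_term

# Made-up numbers: a small style term still contributes noticeably
# once it is scaled by the large style weight.
print(total_loss(0.5, 2e-6))
```

Raising the style weight relative to the content weight pushes the optimizer toward matching the style statistics at the expense of content fidelity.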
Here I show the results for style: (starry_night, scream) and content: (fallingwater, tubingen).
Content and style images:
Generated results:
Here I show two comparisons of initializing with noise versus initializing with the content image, where the image on the left is the one initialized with noise.
We can clearly see that initializing with the content image gives the algorithm a much better starting point than random noise, so it converges faster and produces better results.
I tried transferring Monet's painting style to a real photo, and got the following result:
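The two initializations can be sketched as follows. The helper `init_image`, its `mode` argument, and the uniform noise in [0, 1] are assumptions for illustration, and the distance is measured in pixel space as a simple stand-in for the feature-space content loss.

```python
import numpy as np

def init_image(content_img, mode="content", seed=0):
    """Return the starting image for optimization.

    mode="content": start from a copy of the content image.
    mode="noise":   start from uniform random noise of the same shape.
    """
    content_img = np.asarray(content_img, dtype=np.float64)
    if mode == "content":
        return content_img.copy()
    if mode == "noise":
        rng = np.random.default_rng(seed)
        return rng.uniform(0.0, 1.0, size=content_img.shape)
    raise ValueError("mode must be 'content' or 'noise'")

# Hypothetical 3-channel image with pixel values in [0, 1].
content = np.random.default_rng(1).uniform(0.0, 1.0, size=(3, 16, 16))

start_content = init_image(content, mode="content")
start_noise = init_image(content, mode="noise")

# Squared pixel distance to the content image at step 0.
print(float(np.mean((start_content - content) ** 2)))  # zero: content is already matched
print(float(np.mean((start_noise - content) ** 2)))    # positive: content must be rebuilt
```

Starting from the content image means the optimizer only has to add style, whereas starting from noise means it has to recover the content as well.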
Imagine an angry grumpy cat getting even angrier... (I really like this one lol)