This assignment explores style transfer using the layers of a VGG-16 network pretrained on ImageNet.
The four reconstructions below use content features from layers conv_4, conv_8, conv_12, and conv_16 of VGG-16 pretrained on ImageNet. Reconstruction quality degrades as the layer gets deeper.
![]() |
![]() |
---|---|
![]() |
![]() |
![]() |
![]() |
---|
![]() |
![]() |
---|
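
The reconstructions above come from optimizing an input image so that its features at one layer match those of the content image. A minimal sketch of that content loss follows, assuming PyTorch and a mean-squared-error criterion (the report does not state either). The function name `content_loss` and the toy tensors are illustrative only.

```python
import torch
import torch.nn.functional as F


def content_loss(generated_features: torch.Tensor,
                 target_features: torch.Tensor) -> torch.Tensor:
    """Mean squared error between two feature maps of identical shape."""
    # The target is a fixed reference, so it is detached from the autograd graph.
    return F.mse_loss(generated_features, target_features.detach())


# Toy check with random "feature maps" shaped (batch, channels, height, width).
target = torch.rand(1, 8, 16, 16)
same = content_loss(target.clone(), target)
different = content_loss(torch.zeros_like(target), target)
```

Identical feature maps give a loss of zero, and any mismatch gives a positive loss, which is what the optimizer drives down during reconstruction.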
When textures are synthesized from deeper layers, the result looks less like the actual style image. Using the shallower layers therefore produces a texture closer to that of the style image.
![]() |
![]() |
---|
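
Texture synthesis of this kind compares Gram matrices of feature maps rather than the feature maps themselves. The sketch below assumes PyTorch and normalizes the Gram matrix by the total number of elements in the feature map, which is one plausible reading of "normalized by the size of the input" and not necessarily the exact normalization used here. The names `gram_matrix` and `style_loss` are illustrative.

```python
import torch
import torch.nn.functional as F


def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a (batch, channels, h, w) feature map."""
    b, c, h, w = features.shape
    flat = features.reshape(b * c, h * w)  # one row per channel
    gram = flat @ flat.t()                 # inner products between channels
    # Assumed normalization: divide by the number of elements in the feature map.
    return gram / (b * c * h * w)


def style_loss(generated_features: torch.Tensor,
               style_features: torch.Tensor) -> torch.Tensor:
    """Mean squared error between the Gram matrices of two feature maps."""
    return F.mse_loss(gram_matrix(generated_features),
                      gram_matrix(style_features).detach())


# Toy check: an all-ones map with 2 channels and 3x3 spatial size.
g = gram_matrix(torch.ones(1, 2, 3, 3))
```

Because the Gram matrix sums over spatial positions, it discards where a feature occurs and keeps only which features co-occur, which is why it captures texture rather than layout.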
To compute the loss, I used conv_4 for the content features and conv_1 through conv_5 for the style features.
The Gram matrix was normalized by the size of the input. The default style weight was 100000
and the content loss weight was 1. I ran 300 iterations of optimization, and the optimizer was
LBFGS.
Run times were similar for the random-noise and content-image inputs.
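
The setup described above can be sketched as follows, assuming PyTorch. To stay self-contained, the sketch uses a small stack of randomly initialized convolutions as a stand-in for the pretrained VGG layers, so the output is not a meaningful stylization. The helper names (`extract`, `run_style_transfer`), the `max_iter=1` setting, and the per-step clamp to [0, 1] are assumptions, not details taken from the report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gram_matrix(f):
    b, c, h, w = f.shape
    flat = f.reshape(b * c, h * w)
    return flat @ flat.t() / (b * c * h * w)  # assumed normalization


def extract(convs, image):
    """Return the output of every conv layer, keyed conv_1, conv_2, ..."""
    feats, x = {}, image
    for i, conv in enumerate(convs, start=1):
        x = conv(x)
        feats[f"conv_{i}"] = x
        x = F.relu(x)
    return feats


def run_style_transfer(convs, content, style, init, steps=300,
                       style_weight=1e5, content_weight=1.0):
    content_layers = ["conv_4"]
    style_layers = [f"conv_{i}" for i in range(1, 6)]
    with torch.no_grad():  # targets are fixed references
        content_targets = extract(convs, content)
        style_targets = {k: gram_matrix(v) for k, v in extract(convs, style).items()}

    image = init.clone().requires_grad_(True)
    # max_iter=1 makes each step() call a single L-BFGS iteration (an assumption).
    optimizer = torch.optim.LBFGS([image], max_iter=1)
    history = []

    def closure():
        optimizer.zero_grad()
        feats = extract(convs, image)
        c_loss = sum(F.mse_loss(feats[k], content_targets[k]) for k in content_layers)
        s_loss = sum(F.mse_loss(gram_matrix(feats[k]), style_targets[k])
                     for k in style_layers)
        loss = content_weight * c_loss + style_weight * s_loss
        loss.backward()
        history.append(loss.item())
        return loss

    for _ in range(steps):
        optimizer.step(closure)
        with torch.no_grad():
            image.clamp_(0, 1)  # keep pixel values in a valid range
    return image.detach(), history


# Toy run: random frozen convs stand in for the pretrained network.
torch.manual_seed(0)
convs = nn.ModuleList(
    [nn.Conv2d(3 if i == 0 else 8, 8, kernel_size=3, padding=1) for i in range(5)]
)
for p in convs.parameters():
    p.requires_grad_(False)
content = torch.rand(1, 3, 32, 32)
style = torch.zeros(1, 3, 32, 32)
style[:, :, ::2, :] = 1.0  # horizontal stripes as a toy "style" image
result, history = run_style_transfer(convs, content, style, init=content, steps=20)
```

Switching between the two inputs compared above only changes the `init` argument: pass `content` to start from the content image, or `torch.rand_like(content)` to start from noise.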
![]() |
![]() |
![]() |
|
---|---|---|---|
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
|
---|---|---|---|
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Content Image | Style Image | Noise input result | Content input result |
---|---|---|---|
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
The frames were extracted from the input video at 30 fps. Each frame was passed through style transfer, and the stylized frames were reassembled into a 30 fps video.
Style image | Input Video |
---|---|
![]() |
![]() |
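
The per-frame video pipeline can be sketched as below. The report does not say which tool was used for reading and writing video, so OpenCV is an assumption here, as are the `mp4v` codec and the function names. `stylize` is a hypothetical callable that takes one frame and returns a stylized frame of the same size (BGR, `uint8`, for the OpenCV path).

```python
from typing import Callable, Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def stylize_frames(frames: Iterable[Frame],
                   stylize: Callable[[Frame], Frame]) -> Iterator[Frame]:
    """Apply style transfer to each frame in order, one at a time."""
    for frame in frames:
        yield stylize(frame)


def stylize_video(src_path: str, dst_path: str,
                  stylize: Callable, fps: float = 30.0) -> None:
    """Read a video, stylize every frame, and write the result at `fps`."""
    import cv2  # imported lazily so stylize_frames has no OpenCV dependency

    capture = cv2.VideoCapture(src_path)
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))

    def read_frames():
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video or read error
                break
            yield frame

    try:
        for styled in stylize_frames(read_frames(), stylize):
            writer.write(styled)
    finally:
        capture.release()
        writer.release()


# Toy check of the frame loop with integers standing in for frames.
doubled = list(stylize_frames([1, 2, 3], lambda f: f * 2))
```

Since each frame is processed on its own, the total run time scales linearly with the number of frames.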