from PIL import Image
import matplotlib.pyplot as plt
I tried the content loss from conv1 to conv5. I found that as I go deeper in the conv layers, the reconstructed image loses more detail and contains more noise. The visualizations are shown below.
im1 = Image.open('results/conv1_content.jpg')
im2 = Image.open('results/conv2_content.jpg')
im3 = Image.open('results/conv3_content.jpg')
im4 = Image.open('results/conv4_content.jpg')
im5 = Image.open('results/conv5_content.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(2,3,1)
plt.title('conv1 content')
plt.imshow(im1)
plt.subplot(2,3,2)
plt.title('conv2 content')
plt.imshow(im2)
plt.subplot(2,3,3)
plt.title('conv3 content')
plt.imshow(im3)
plt.subplot(2,3,4)
plt.title('conv4 content')
plt.imshow(im4)
plt.subplot(2,3,5)
plt.title('conv5 content')
plt.imshow(im5)
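As a reference for the reconstructions above, the content loss at a given conv layer is just the mean squared error between the feature maps of the generated image and the content image. A minimal NumPy sketch (the arrays here stand in for the actual VGG activations, which in the real pipeline come from a forward pass):

```python
import numpy as np

def content_loss(content_feat, gen_feat):
    """MSE between feature maps at one conv layer.

    Both inputs stand in for VGG activations of shape (C, H, W).
    """
    return float(np.mean((gen_feat - content_feat) ** 2))
```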
The visualizations of the outputs from two random noise inputs are shown below. I chose the conv4 layer to optimize the content loss. Comparing the outputs from the two random noise inputs, we can see that their optimization results look quite similar. However, neither is as sharp as the original content image.
im1 = Image.open('results/conv4_content_input1.jpg')
im2 = Image.open('results/conv4_content_input2.jpg')
im3 = Image.open('results/content_img.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(1,3,1)
plt.title('conv4 content input1')
plt.imshow(im1)
plt.subplot(1,3,2)
plt.title('conv4 content input2')
plt.imshow(im2)
plt.subplot(1,3,3)
plt.title('original content image')
plt.imshow(im3)
I tried the style loss from conv1 to conv5, as well as using all conv1-conv5 features together. I found that as I go deeper in the conv layers, the generated style becomes less blurry and contains more detail. When optimizing on all conv1-conv5 layers, the output contains the color from the earlier layers while also maintaining decent texture detail from the later layers. The visualizations are shown below.
im1 = Image.open('results/conv1_style.jpg')
im2 = Image.open('results/conv2_style.jpg')
im3 = Image.open('results/conv3_style.jpg')
im4 = Image.open('results/conv4_style.jpg')
im5 = Image.open('results/conv5_style.jpg')
im6 = Image.open('results/style_all.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(2,3,1)
plt.title('conv1 style')
plt.imshow(im1)
plt.subplot(2,3,2)
plt.title('conv2 style')
plt.imshow(im2)
plt.subplot(2,3,3)
plt.title('conv3 style')
plt.imshow(im3)
plt.subplot(2,3,4)
plt.title('conv4 style')
plt.imshow(im4)
plt.subplot(2,3,5)
plt.title('conv5 style')
plt.imshow(im5)
plt.subplot(2,3,6)
plt.title('all conv style')
plt.imshow(im6)
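For the style reconstructions above, the style loss compares Gram matrices (channel-wise feature correlations) rather than raw feature maps, summed over the chosen conv layers. A minimal NumPy sketch, again with arrays standing in for the VGG activations:

```python
import numpy as np

def gram_matrix(feat):
    """Normalized channel-wise correlation of a (C, H, W) feature map."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(style_feats, gen_feats):
    """Sum of MSE between Gram matrices over the chosen conv layers."""
    return float(sum(np.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
                     for s, g in zip(style_feats, gen_feats)))
```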
The visualizations of the outputs from two random noise inputs are shown below. I used all conv1-conv5 layers to optimize the style loss. Comparing the outputs from the two random noise inputs, we can see that although the layouts of the textures are different, the styles captured are quite similar. Compared to the original style image, I think the optimization outputs capture the style quite well.
im1 = Image.open('results/style_all_input1.jpg')
im2 = Image.open('results/style_all_input2.jpg')
im3 = Image.open('results/style_img.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(1,3,1)
plt.title('all conv style input1')
plt.imshow(im1)
plt.subplot(1,3,2)
plt.title('all conv style input2')
plt.imshow(im2)
plt.subplot(1,3,3)
plt.title('original style image')
plt.imshow(im3)
I used the VGG network pretrained on ImageNet as the feature extractor. I chose the conv4 layer for the content loss and all conv1-conv5 layers for the style loss. The weight for the content loss is 1 and the weight for the style loss is 100000. I used the LBFGS optimizer and optimized the loss for 300 steps.
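Putting the pieces together, the optimization loop can be sketched roughly as below (assuming PyTorch; `content_feat_fn` and `style_layers_fn` are hypothetical callables standing in for the VGG forward pass at the chosen layers):

```python
import torch

def gram(feat):
    # Normalized Gram matrix of a (C, H, W) feature map
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

def transfer(content_feat_fn, style_layers_fn, content_img, style_img,
             steps=300, w_content=1.0, w_style=1e5):
    # Precompute the fixed targets from the content and style images
    target_c = content_feat_fn(content_img).detach()
    target_g = [gram(f).detach() for f in style_layers_fn(style_img)]
    # Initialize from the content image (use torch.randn_like for noise input)
    x = content_img.clone().requires_grad_(True)
    opt = torch.optim.LBFGS([x])
    n_evals = [0]
    while n_evals[0] < steps:  # roughly `steps` loss evaluations
        def closure():
            opt.zero_grad()
            c_loss = torch.nn.functional.mse_loss(content_feat_fn(x), target_c)
            s_loss = sum(torch.nn.functional.mse_loss(gram(f), g)
                         for f, g in zip(style_layers_fn(x), target_g))
            loss = w_content * c_loss + w_style * s_loss
            loss.backward()
            n_evals[0] += 1
            return loss
        opt.step(closure)
    return x.detach()
```

LBFGS re-evaluates the loss multiple times per step, which is why the loop counts closure calls rather than calls to `opt.step`.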
im1 = Image.open('results/content_img.jpg')
im2 = Image.open('results/dance_content_img.jpg')
im3 = Image.open('results/style_img.jpg')
im4 = Image.open('results/night_style_img.jpg')
im5 = Image.open('results/dance_pica.jpg')
im6 = Image.open('results/dance_starry.jpg')
im7 = Image.open('results/water_pica.jpg')
im8 = Image.open('results/water_starry.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(3,3,2)
plt.title('origin content (water)')
plt.imshow(im1)
plt.subplot(3,3,3)
plt.title('origin content (dance)')
plt.imshow(im2)
plt.subplot(3,3,4)
plt.title('origin style')
plt.imshow(im3)
plt.subplot(3,3,5)
plt.title('water + pica')
plt.imshow(im7)
plt.subplot(3,3,6)
plt.title('dance + pica')
plt.imshow(im5)
plt.subplot(3,3,7)
plt.title('origin style')
plt.imshow(im4)
plt.subplot(3,3,8)
plt.title('water + starry')
plt.imshow(im8)
plt.subplot(3,3,9)
plt.title('dance + starry')
plt.imshow(im6)
Below are the visualizations of the style transfer outputs using a random noise input and a content image input. The running times of both setups are approximately the same, about 14.9 seconds each. We can see that the output using random noise as input blends more into the style, while the output using the content image as input better preserves the color of the person in the original content image.
im1 = Image.open('results/random_input_transfer.jpg')
im2 = Image.open('results/content_input_transfer.jpg')
im3 = Image.open('results/dance_content_img.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(1,3,1)
plt.title('random noise input')
plt.imshow(im1)
plt.subplot(1,3,2)
plt.title('content image input')
plt.imshow(im2)
plt.subplot(1,3,3)
plt.title('original content image')
plt.imshow(im3)
im1 = Image.open('results/my_img1_content_img.jpg')
im1 = im1.rotate(-90)
im2 = Image.open('results/my_img1_starry.jpg')
im2 = im2.rotate(-90)
im3 = Image.open('results/my_img2_content_img.jpg')
im4 = Image.open('results/my_img2_starry.jpg')
plt.figure(figsize=(20,10))
plt.axis('off')
plt.subplot(2,2,1)
plt.title('my content image 1')
plt.imshow(im1)
plt.subplot(2,2,2)
plt.title('my content image 2')
plt.imshow(im3)
plt.subplot(2,2,3)
plt.title('transferred image 1')
plt.imshow(im2)
plt.subplot(2,2,4)
plt.title('transferred image 2')
plt.imshow(im4)