16726 Project 5

Zhipeng Bao (zbao)

Overview

This project explores the photo editing problem in computer vision. We start by reconstructing an image in the latent space. We then interpolate between two images in the latent space. Finally, we "draw" our own realistic cats from sketches.

How to run the code

1. bash run_1.sh to reproduce the results on part 1

2. bash run_2.sh to reproduce the results on part 2

3. bash run_3.sh to reproduce the results on part 3

4. bash run_4.sh to reproduce the results on the bells & whistles


Part 0: Sample Images

Looking through the code, I found that the first step is to make sure we can produce images with the pre-trained models. An important function for this is sample_noise, so I first implemented it and checked that the pre-trained models work. Here are some results.

Sampled images by vanilla GAN with z latent input.
Sampled images by StyleGAN with w latent input.
Sampled images by StyleGAN with w+ latent input.

We can see that StyleGAN generally generates more visually realistic images than the vanilla GAN. StyleGAN is one of the state-of-the-art image generation models, and its well-designed architecture lets it generate more diverse and realistic images.

An interesting observation is that images generated from w space are more diverse than those generated from w+ space. Intuitively, w+ space should contain more information and lead to more diverse images. I am not very familiar with the training procedure of StyleGAN, but my guess is that the original model is trained in w space, so when we give different random inputs in w+ space, each layer receives signals from a different direction, which averages out the generated images. However, when we optimize the whole latent input in w+ space for this project, this is similar to starting the optimization from an averaged image, which may make the performance more robust.
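The three latent inputs compared above can be sketched as follows. This is a minimal NumPy illustration, not the submitted implementation: the signature of sample_noise, the n_layers argument, and the identity function standing in for StyleGAN's mapping network are all assumptions made for the example.

```python
import numpy as np

def sample_noise(dim, mode="z", n_layers=8, mapping=None, rng=None):
    """Sample one latent code (hypothetical signature).

    mode "z":  z ~ N(0, I), shape (1, dim)
    mode "w":  w = mapping(z), shape (1, dim)
    mode "w+": one independently sampled w per layer, shape (1, n_layers, dim)
    """
    rng = np.random.default_rng() if rng is None else rng
    if mapping is None:
        # Placeholder for the pre-trained mapping network (assumption).
        mapping = lambda z: z
    if mode == "z":
        return rng.standard_normal((1, dim))
    if mode == "w":
        return mapping(rng.standard_normal((1, dim)))
    if mode == "w+":
        ws = [mapping(rng.standard_normal((1, dim))) for _ in range(n_layers)]
        return np.stack(ws, axis=1)
    raise ValueError(f"unknown mode: {mode}")
```

In the w+ case every layer gets its own random code, which is the situation the paragraph above suspects of producing averaged-looking samples.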


Part 1: Inverting the Generator

1.1 Code

See the submitted code for details.

1.2 Projection Results

Source Image.

Without perceptual loss

Vanilla GAN.
StyleGAN with w latent space.
StyleGAN with w+ latent space.

With perceptual loss (perceptual weight = 1.0)

Vanilla GAN.
StyleGAN with w latent space.
StyleGAN with w+ latent space.

We have the following observations:

1. With or without perceptual loss, we can invert the generator and reconstruct the source image. Since this is a simple task, the perceptual loss is perhaps not needed. However, if we zoom in on the generated images, introducing the perceptual loss (1) improves the visual quality of the vanilla GAN results and (2) adds more detail to the images generated by StyleGAN.

2. StyleGAN generates better results than the vanilla GAN. This is because StyleGAN has a more sophisticated and effective architecture and is one of the SOTA methods for image generation.

3. Both w and w+ space work for StyleGAN, and we cannot find an obvious difference between the two latent spaces. This is again because the inversion problem setting is simple.
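The inversion itself is an optimization over the latent code. The sketch below shows the idea on a toy problem, assuming a linear "generator" so the gradient can be written by hand; the real project uses pre-trained GANs, and the names invert, G, and G_grad, the learning rate, and the step count are all illustrative. The perceptual term is left out here.

```python
import numpy as np

def invert(generator, grad_fn, target, z_init, lr=0.2, steps=2000):
    """Gradient descent on 0.5 * ||G(z) - target||^2 with respect to z."""
    z = np.array(z_init, dtype=float)        # work on a copy of the initial code
    for _ in range(steps):
        residual = generator(z) - target     # pixel-space error
        z -= lr * grad_fn(z, residual)       # step along the negative gradient
    return z

# Toy linear "generator" G(z) = A z, used only so the gradient has a closed form.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 4)) / 4.0
G = lambda z: A @ z
G_grad = lambda z, residual: A.T @ residual  # gradient of 0.5 * ||A z - x||^2

target = G(rng.standard_normal(4))           # an "image" the generator can produce
z_hat = invert(G, G_grad, target, np.zeros(4))
print(np.abs(G(z_hat) - target).max())       # remaining reconstruction error
```

With a real generator the hand-written gradient would be replaced by automatic differentiation, and the loss would add the perceptual term weighted as described above.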


Part 2: Interpolate your Cats

2.1 Code

See the submitted code for details.

2.2 Results

We show the results with and without perceptual loss. For each interpolated pair, we sample 9 cats as intermediate results.
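A minimal sketch of the interpolation step, assuming plain linear interpolation between the two inverted latent codes and 9 evenly spaced points strictly between the endpoints (the function name interpolate and the exact spacing are assumptions):

```python
import numpy as np

def interpolate(z_a, z_b, n_middle=9):
    """Return n_middle latent codes evenly spaced between z_a and z_b."""
    # n_middle + 2 points include both endpoints; drop them to keep the middle ones.
    thetas = np.linspace(0.0, 1.0, n_middle + 2)[1:-1]
    return [(1.0 - t) * z_a + t * z_b for t in thetas]

z_cat1 = np.zeros(4)   # stand-ins for two inverted latent codes
z_cat2 = np.ones(4)
frames = interpolate(z_cat1, z_cat2)
```

Each interpolated code would then be passed through the generator to produce one intermediate cat.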

Original Images

Cat 1.
Cat 2.
Cat 3.
Cat 4.

Perceptual weight = 0

Vanilla GAN reconstructions

Cat 1.
Cat 2.
Cat 3.
Cat 4.

StyleGAN reconstructions in w space

Cat 1.
Cat 2.
Cat 3.
Cat 4.

StyleGAN reconstructions in w+ space

Cat 1.
Cat 2.
Cat 3.
Cat 4.

Vanilla GAN interpolations

Cat 1 to Cat 2.
Cat 3 to Cat 4.

StyleGAN interpolations in w space

Cat 1 to Cat 2.
Cat 3 to Cat 4.

StyleGAN interpolations in w+ space

Cat 1 to Cat 2.
Cat 3 to Cat 4.

Perceptual weight = 1

Vanilla GAN reconstructions

Cat 1.
Cat 2.
Cat 3.
Cat 4.

StyleGAN reconstructions in w space

Cat 1.
Cat 2.
Cat 3.
Cat 4.

StyleGAN reconstructions in w+ space

Cat 1.
Cat 2.
Cat 3.
Cat 4.

Vanilla GAN interpolations

Cat 1 to Cat 2.
Cat 3 to Cat 4.

StyleGAN interpolations in w space

Cat 1 to Cat 2.
Cat 3 to Cat 4.

StyleGAN interpolations in w+ space

Cat 1 to Cat 2.
Cat 3 to Cat 4.

From the listed results, we have the following observations:

1. For the reconstructed images, StyleGAN gives better results than the vanilla GAN. For the interpolation, StyleGAN also gives smoother results.

2. For StyleGAN, there is no big difference with or without perceptual loss, or between w and w+ space. When zooming in on the images, w+ space preserves more details and the perceptual loss leads to a smoother interpolation.


Part 3: Draw your cats

3.1 Code, implementation details and hyperparameters

See the submitted code for details.

This part of the task is fairly challenging. I adopt several loss functions for it:

(1) The simple L2 loss.

(2) The perceptual loss, with conv_1 as the target layer. Inspired by the discussion on Piazza, I use two perceptual terms: |VGG(mask*source) - VGG(mask*target)| and |mask*VGG(source) - mask*VGG(target)|. Both terms are given the same weight (the perceptual weight).

(3) An additional discriminative loss. I found that with the existing losses, the model still cannot generate realistic images. I think the reason is that they constrain the model too strongly, so I add a discriminative loss to balance the constraint: we run the full VGG19 model to extract features and apply an L2 loss between the features of the source and target images.
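The difference between the two perceptual terms is where the mask is applied: before the feature extractor or after it. The sketch below illustrates this with a toy feature function in place of VGG conv_1; toy_features, the single-channel images, and the mean-squared distance are assumptions for illustration, and masking the feature maps directly assumes they keep the input resolution.

```python
import numpy as np

def toy_features(img):
    """Stand-in for a conv_1 feature map: horizontal and vertical differences."""
    dx = img - np.roll(img, 1, axis=-1)
    dy = img - np.roll(img, 1, axis=-2)
    return np.stack([dx, dy], axis=0)        # shape (2, H, W)

def perceptual_mask_input(source, target, mask, features=toy_features):
    """Mask the images first, then compare their features."""
    diff = features(mask * source) - features(mask * target)
    return np.mean(diff ** 2)

def perceptual_mask_features(source, target, mask, features=toy_features):
    """Extract features first, then compare them only inside the mask."""
    diff = mask * (features(source) - features(target))
    return np.mean(diff ** 2)

source = np.zeros((4, 4))
target = np.zeros((4, 4))
target[1, 1] = 1.0                           # images differ at one pixel
mask = np.ones((4, 4))                       # sketch covers the whole image
print(perceptual_mask_input(source, target, mask))
print(perceptual_mask_features(source, target, mask))
```

Both terms vanish when the images agree everywhere the mask is active, so pixels outside the sketch are left unconstrained by these losses.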

The learning rate is set to 0.08 instead of the default 0.1. The perceptual weight is set to 2.5 and the discriminative loss weight to 0.4. For each image, we found that different initializations can produce different drawings, so we ran the model several times with random initializations and picked the best result.
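As a rough illustration of how these weights combine the individual terms, the sketch below sums them into one objective. The two weights are the values reported above; the unit weight on the L2 term and the function name total_loss are assumptions, and the input numbers are made up for the example.

```python
# Weights reported in the text; the unit weight on the L2 term is an assumption.
PERCEPTUAL_WEIGHT = 2.5
DISCRIMINATIVE_WEIGHT = 0.4

def total_loss(l2, perceptual_input_masked, perceptual_feature_masked, discriminative):
    """Weighted sum of the loss terms; both perceptual terms share one weight."""
    perceptual = perceptual_input_masked + perceptual_feature_masked
    return l2 + PERCEPTUAL_WEIGHT * perceptual + DISCRIMINATIVE_WEIGHT * discriminative

# Made-up loss values: 1.0 + 2.5 * (0.2 + 0.1) + 0.4 * 0.5
print(total_loss(1.0, 0.2, 0.1, 0.5))
```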

3.2 Results

a. Sketch 1.
Source Image
Vanilla GAN
StyleGAN w space
StyleGAN w+ space
b. Sketch 2.
Source Image
Vanilla GAN
StyleGAN w space
StyleGAN w+ space
c. Sketch 3.
Source Image
Vanilla GAN
StyleGAN w space
StyleGAN w+ space
d. Sketch 4.
Source Image
Vanilla GAN
StyleGAN w space
StyleGAN w+ space
e. Sketch 5.
Source Image
Vanilla GAN
StyleGAN w space
StyleGAN w+ space

During the experiments, we have the following observations:

1. With the same sketch, a larger discriminative loss weight leads to a more realistic cat image, but the image may look less similar to the sketch. This is because the discriminative loss is designed to balance the constraint in the latent space so that the model generates meaningful images.

2. Sparse inputs and dense inputs need different hyperparameters. The current settings work better for denser sketches. If we want the model to work for sparse sketches, we need a smaller perceptual weight and a smaller discriminative loss weight. From the results, we can also see that sketch 2 is fairly sparse and its result is not as good as the others.

3. The vanilla GAN still performs worse than StyleGAN. This time, however, we can see a difference between w+ space and w space for StyleGAN: images generated from w+ space are more similar to the source sketch. I think this is because w+ space gives the model a better initialization, so the latent input converges to the target faster.


Part 4: Bells and Whistles

4.1 Additional discriminative loss for the sketch drawing

As mentioned in the previous part, I propose an additional discriminative loss to control the drawing.

4.2 Use of style loss

We also tried adding a style loss for Part 3. We use w+ space and again use conv_1 as the style loss layer.
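The text does not say how the style loss is computed. A common formulation, assumed here, compares Gram matrices of feature maps, as in neural style transfer. The sketch below uses NumPy arrays of shape (channels, height, width) in place of conv_1 features; gram_matrix and style_loss are illustrative names.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)       # shape (C, C), normalized by size

def style_loss(features_a, features_b):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(features_a) - gram_matrix(features_b)) ** 2)

feats = np.arange(12, dtype=float).reshape(2, 2, 3)
flipped = feats[:, ::-1, ::-1]               # same values, rearranged spatially
print(style_loss(feats, flipped))
```

Because the Gram matrix sums over spatial positions, it captures texture statistics rather than layout, which fits the goal of matching the texture of the sketch.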

Some results:
Sketch 1
Drawing 1
Sketch 3
Drawing 3
Sketch 5
Drawing 5

Compared with the original results, adding the style loss lets us generate images with a texture similar to that of the sketch.