16726 Project 5 - Zihang Lai

GAN Photo Editing

Project Overview

In this project, we aim to find the best latent vector to generate images that satisfy specific constraints. First, we find the best latent vector to reconstruct the original image (inverting the generator). Second, we find the best latent vector to generate images that resemble sketches (Scribble to Image). We also interpolate between latent vectors to generate smooth transition animations between two natural images.

Part 1: Inverting the Generator

Combinations of the losses

Our final loss $L$ is defined as $L = L_{l1} + \lambda L_{perc}$. Four samples of GAN inversion results using $\lambda = \{0, 0.1, 0.5, 1\}$ as perceptual weights are shown below. When the perceptual loss is not enabled, the reconstruction looks rather blurry. When the perceptual weight is too large, the reconstructed image also becomes blurry and does not exactly align with the input. Therefore, we use $\lambda = 0.1$ as the weight of the perceptual loss.
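A minimal sketch of this combined loss, assuming a frozen VGG-16 feature extractor for the perceptual term (the exact backbone and cutoff layer are illustrative, not a statement of our actual setup):

```python
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG-16 feature extractor for the perceptual term (the choice of
# backbone and cutoff layer is illustrative; input normalization omitted).
vgg_features = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def inversion_loss(generated, target, perc_weight=0.1):
    l1 = F.l1_loss(generated, target)            # pixel-level L1 term
    perc = F.l1_loss(vgg_features(generated),    # feature-level
                     vgg_features(target))       # perceptual term
    return l1 + perc_weight * perc               # L = L_l1 + lambda * L_perc
```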

Different generative models

The images generated from StyleGAN look visually more realistic, suggesting that the StyleGAN generator is better at generating images. Moreover, the image generated from the vanilla GAN is not very similar to the input image. This suggests that the vanilla GAN generator did not capture the full distribution of the training images.

Different latent spaces (z space, W space, and W+ space)

The images generated from all three latent spaces look relatively realistic, suggesting that the StyleGAN generator is good at generating images. However, optimizing in z space seems to yield slightly different colors. Compared to optimizing in W space, optimizing in W+ space seems to produce sharper results that are more similar to the input.
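For reference, a sketch of how the three parameterizations differ, assuming a StyleGAN-like generator with a mapping network (z → w) and a per-layer synthesis network; the attribute names below are illustrative, not the actual API:

```python
import torch

def init_latent(G, space, n_layers=14, z_dim=512):
    z = torch.randn(1, z_dim)
    if space == "z":
        # Optimize z directly; the mapping network stays inside the loop.
        return z.requires_grad_(True)
    w = G.mapping(z)                 # intermediate code, shape (1, z_dim)
    if space == "w":
        # A single W vector shared by every synthesis layer.
        return w.detach().requires_grad_(True)
    # W+ space: an independent copy of w per synthesis layer, giving the
    # optimizer more degrees of freedom (hence the sharper reconstructions).
    return w.detach().unsqueeze(1).repeat(1, n_layers, 1).requires_grad_(True)
```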

Hyperparameters and running speed

Overall, based on the results shown earlier, we choose $\lambda = 0.1$ as the perceptual weight, the StyleGAN generator as the model, and W+ space as the latent space. The average running time is 0.54s per iteration. We optimize for 10k iterations in Part 1 (taking 1.5 hrs) and 1k iterations in Parts 2 and 3 (taking 9 mins).
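The optimization loop itself is straightforward; the sketch below assumes Adam with a fixed learning rate and a frozen generator (neither is stated explicitly above), and takes the loss function as an argument:

```python
import torch

def invert(G, target, latent, loss_fn, n_iters=10000, lr=0.01):
    # G's weights are assumed frozen; only the latent code is updated.
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        generated = G(latent)              # synthesize from the latent code
        loss = loss_fn(generated, target)  # e.g. L1 + perceptual, as above
        loss.backward()                    # gradients flow to the latent only
        optimizer.step()
    return latent.detach()
```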

Part 2: Interpolate your Cats

Interpolation results
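Each animation below is produced by inverting both cat images to latent codes and blending the codes frame by frame; the sketch assumes plain linear (rather than spherical) interpolation:

```python
import torch

@torch.no_grad()
def interpolate(G, w1, w2, n_frames=60):
    frames = []
    for t in torch.linspace(0.0, 1.0, n_frames):
        w = (1 - t) * w1 + t * w2   # linear blend of the two latent codes
        frames.append(G(w))          # decode the blend into an image frame
    return frames
```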

Result 1: The transition between the two cats is relatively natural. Note how the eyes open gradually. However, the background shows some unnatural artefacts and noise during the transition.


Result 2: This animation shows a close-up cat changing into a cat that is farther away. The interpolation is smooth. Note that the transition of the head pose is very natural, with the head rotating to the right and the ears appearing. Even the whiskers of the cat change continuously.


Result 3: This is a failure case. The eyes suddenly become very large at the start, which does not resemble the first image, the second image, or any of their intermediate images. One potential reason is that the first image is not well reconstructed (see image below). Therefore, a path starting from an imperfect latent code also shows unnatural artefacts.


Part 3: Scribble to Image

Implementation details

In this part, we optimize for the best latent code to generate images that resemble hand sketches, rather than to reconstruct a natural image. The loss is low when the generated image matches the input scribble at the constrained pixels. In our implementation, an extra sparsity mask is multiplied with the input scribble so that the output image only needs to match the sketch at ~200 pixels (rather than >2000 pixels). We find this makes the generated image more realistic.
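A minimal sketch of this masked loss, assuming the sparse subset is drawn uniformly at random from the scribbled pixels (the exact subsampling scheme is an assumption):

```python
import torch
import torch.nn.functional as F

def sparse_scribble_loss(generated, scribble, mask, n_keep=200):
    # mask: binary (1, 1, H, W) map of scribbled pixels (>2000 of them)
    idx = mask.flatten().nonzero().squeeze(1)
    keep = idx[torch.randperm(idx.numel())[:n_keep]]   # random ~200-pixel subset
    sparse_mask = torch.zeros_like(mask)
    sparse_mask.view(-1)[keep] = 1.0
    # Penalize mismatch with the sketch only at the sparse pixel subset
    return F.l1_loss(generated * sparse_mask, scribble * sparse_mask)
```

In practice the sparse subset would be sampled once and reused for every optimization step, so the constraint stays fixed throughout the optimization.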

Scribble to Image results

Do we need sparsity?

Here, we probe the effectiveness of the sparsity mask. As shown in the figure below, without the sparsity mask, the generated image can be too similar to the input sketch, especially when the sketch is dense.