16-726 Assignment 5 - GAN Photo Editing

Canbo Ye (canboy)

Overview

In this assignment, we implement several techniques for manipulating images on the manifold of natural images. First, we invert a pre-trained generator to find a latent variable that closely reconstructs a given real image. In the second part of the assignment, we take a hand-drawn sketch and generate an image that fits the sketch. Finally, we also add a style loss as an additional texture constraint.


Part 1: Inverting the Generator

For the first part of the assignment, we solve an optimization problem to reconstruct an image from a particular latent code. Example outputs using different generative models, latent spaces, and combinations of losses are shown below.
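Concretely, given a target image x and a frozen pre-trained generator G, we search for a latent code z minimizing ||G(z) - x||^2 (optionally plus a perceptual term) by gradient descent on z. The sketch below illustrates the idea; the generator interface, optimizer choice, and hyperparameters are illustrative assumptions, not the actual starter-code API.

```python
import torch

def invert(G, target, latent_dim=128, n_steps=1000, lr=0.05):
    # Optimize a latent code z so that G(z) reconstructs `target`.
    # G is assumed to be a frozen pre-trained generator that maps a
    # (1, latent_dim) latent to an image; all names are placeholders.
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), target)  # L2 reconstruction
        loss.backward()
        optimizer.step()
    return z.detach()
```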

VanillaGAN

Using z latent space:

Raw image (left); Reconstructed image without perceptual loss (center); Reconstructed image with perceptual loss (right)

We could see from the results that the reconstructed images look nice and similar to the raw image. The one with perceptual loss does better in details such as the eye area. However, both reconstructions are a little blurry, which may be because the vanilla generator is relatively weak; choosing a better generator should help.
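For reference, the perceptual loss measures distance in a deep feature space rather than pixel space. A minimal sketch of this idea follows, assuming a VGG-19 backbone and an arbitrary layer choice; the actual assignment code may use a different network or layer.

```python
import torch
import torchvision.models as models

class PerceptualLoss(torch.nn.Module):
    # L2 distance between deep VGG features of two images.
    # The backbone and the layer index are illustrative choices,
    # not necessarily the ones used in the assignment starter code.
    def __init__(self, layer=8):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
        self.features = vgg[: layer + 1].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, x, y):
        return torch.nn.functional.mse_loss(self.features(x), self.features(y))
```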

StyleGAN

Using z latent space:

Raw image (left); Reconstructed image without perceptual loss (center); Reconstructed image with perceptual loss (right)

Using w latent space:

Raw image (left); Reconstructed image without perceptual loss (center); Reconstructed image with perceptual loss (right)

Using w+ latent space:

Raw image (left); Reconstructed image without perceptual loss (center); Reconstructed image with perceptual loss (right)

We could see from the results above that all the reconstructed images look nice and similar to the raw image, with only minor differences across latent spaces and losses. For each latent space, the reconstruction with perceptual loss does better in details such as the jaw and eye areas, though the effect is subtle. Among the three latent spaces, w and w+ produce smoother colorization than z. Compared with the earlier VanillaGAN results, the StyleGAN reconstructions have much higher image quality.


Part 2: Interpolate your Cats

Now that we have a technique for inverting cat images, we can do arithmetic with the latent vectors we have just found. One simple example is interpolating between two images via a convex combination of their inverses.
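Concretely, given two inverted codes z1 and z2 from Part 1, each intermediate frame is G((1 - t) z1 + t z2) for t in [0, 1]. A small illustrative helper (names and frame count are assumptions):

```python
import torch

def interpolate(G, z1, z2, n_frames=10):
    # Generate images along the convex combination (1 - t) * z1 + t * z2,
    # where z1 and z2 are latent codes recovered by inversion.
    frames = []
    for t in torch.linspace(0.0, 1.0, n_frames):
        z = (1.0 - t) * z1 + t * z2  # convex combination of the two inverses
        with torch.no_grad():
            frames.append(G(z))
    return frames
```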

Some interpolation results between grumpy cats

Since the reconstruction is better with the perceptual loss, we keep this loss in the remaining parts.

Source image on the left and target image on the right; Interpolation result from VanillaGAN with z latent space

Interpolation results from StyleGAN with z (left), w (center) and w+ (right) latent space

Comparing the results from VanillaGAN and StyleGAN, we could see that the latter has better image quality, which follows from the stronger generator. Among the three latent spaces in the StyleGAN results, the interpolation in z space shows some strange (green) colorization along the way, while those in the w/w+ spaces mostly follow the target image.

For the interpolation itself, all the results adjust the face details and head direction in a pretty smooth manner, but change the surrounding color (from white to brown here) in a coarse and rapid way.


Part 3: Scribble to Image

Next, we would like to constrain our image in some way while having it look realistic. We first develop this method in general and then consider color scribble constraints in particular. We use the w latent space together with masked L2 and perceptual losses.
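A sketch of the masked objective follows, where `mask` is 1 on scribbled pixels and 0 elsewhere, and `perc_loss` is a perceptual loss module such as the VGG sketch from Part 1; the function name and weighting are illustrative, not the starter-code API.

```python
import torch

def masked_loss(recon, scribble, mask, perc_loss, perc_weight=0.01):
    # Masked L2 plus masked perceptual loss between the generated image
    # G(w) and the scribble: only the scribbled pixels are constrained.
    # The perceptual weight is an illustrative, untuned value.
    l2 = torch.nn.functional.mse_loss(recon * mask, scribble * mask)
    perc = perc_loss(recon * mask, scribble * mask)
    return l2 + perc_weight * perc
```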

Some scribble to image results

Raw scribbles and their results


We could see that some generated images are very realistic (1-1, 3-1, 3-2) while others are more blurry or strange. We find that sparser sketches tend to perform better than denser ones, because they impose fewer color constraints on the generated image. Some generated images are blurry (1-2, 2-2, 2-3), which might need more iterations of optimization. Another option is to regularize the latent code so that it stays close to the original latent distribution, as sketched below.
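For instance, a simple regularizer (a hypothetical addition, not something tried here) penalizes how far the optimized code drifts from its initialization:

```python
import torch

def latent_l2_penalty(w, w0, weight=1e-3):
    # Hypothetical regularizer: keep the optimized code w near its initial
    # value w0 so that G(w) stays on the learned manifold. The weight is
    # an illustrative, untuned value.
    return weight * torch.sum((w - w0) ** 2)
```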

Bells & Whistles

I tried to add a style loss as an additional texture constraint.

Raw scribble (left); result without style loss (center); result with style loss (right)

We could see that the result with style loss does better in texture details and overall colorization, making it more realistic.
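The style loss here is the standard Gram-matrix loss from Gatys et al., computed on deep features (e.g. from the VGG backbone in the Part 1 sketch); the normalization and layer choice below are assumptions, not the exact starter-code settings.

```python
import torch

def gram_matrix(feat):
    # Gram matrix of a (B, C, H, W) feature map, normalized by its size.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feat_x, feat_y):
    # L2 distance between Gram matrices of two feature maps; the features
    # would come from a fixed backbone such as VGG-19.
    return torch.nn.functional.mse_loss(gram_matrix(feat_x), gram_matrix(feat_y))
```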