GAN Photo Editing

Implementation

Overview

This repo contains code that manipulates images on the manifold of natural images. First, we invert a pre-trained generator to find a latent variable that closely reconstructs a given real image. Second, we take a hand-drawn sketch and generate an image that fits the sketch.

Part 1: Inverting the Generator


We first solve an optimization problem to reconstruct the image from a particular latent code. Natural images lie on a low-dimensional manifold, and we treat the output manifold of a trained generator as an approximation of the natural image manifold. This yields a nonconvex optimization problem over the latent code.
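The inversion loop can be sketched as follows. The generator here is a tiny toy network standing in for the pretrained model (the real one is loaded from a checkpoint), and only a pixel L2 loss is used; the names are illustrative, not the repo's actual API.

```python
import torch

# Toy stand-in for a pretrained, frozen generator (hypothetical; the real
# model is a trained GAN loaded from a checkpoint).
torch.manual_seed(0)
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh())
for p in G.parameters():
    p.requires_grad_(False)

target = G(torch.randn(1, 8))               # pretend this is the real image
z = torch.randn(1, 8, requires_grad=True)   # latent code to optimize
opt = torch.optim.Adam([z], lr=0.1)

for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(G(z), target)  # pixel (L2) loss
    loss.backward()
    opt.step()

print(f"final reconstruction loss: {loss.item():.6f}")
```

Because the generator is frozen, gradients flow only into `z`; the same loop applies unchanged when optimizing in StyleGAN's w or w+ space.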

Parameters

Lp loss weight: 10
Perceptual loss weight: 0.01
L2 delta weight: 0.05

Comparing Losses


Comparing the losses, we see that perceptual and Lp loss on their own perform poorly, creating grainy, unrealistic cats with warped facial features. Perceptual + Lp loss yields a sharper, more detailed output, especially when combined with the delta L2 norm (which keeps the output closer to the original image).
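The combined objective with the weights listed above can be sketched as below. The perceptual term is shown with a fixed random conv layer standing in for the frozen feature extractor used in practice; the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Frozen stand-in for a perceptual feature extractor (in practice, a
# pretrained network's intermediate features).
torch.manual_seed(0)
percep = torch.nn.Conv2d(3, 8, 3, padding=1)
for p in percep.parameters():
    p.requires_grad_(False)

LP_W, PERC_W, DELTA_W = 10.0, 0.01, 0.05  # weights from the section above

def inversion_loss(gen_img, real_img, z, z0):
    lp = F.mse_loss(gen_img, real_img)                    # pixel (Lp) loss
    perc = F.mse_loss(percep(gen_img), percep(real_img))  # perceptual loss
    delta = (z - z0).pow(2).mean()                        # L2 norm of delta
    return LP_W * lp + PERC_W * perc + DELTA_W * delta

x = torch.rand(1, 3, 16, 16)
y = torch.rand(1, 3, 16, 16)
z, z0 = torch.randn(1, 8), torch.randn(1, 8)
print(inversion_loss(x, y, z, z0).item())
```

The delta term anchors the optimized latent to its initialization `z0`, which is what keeps the reconstruction from drifting off the manifold.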

Comparing GAN type, Latent space mode

Results

Overall, StyleGAN performs better than VanillaGAN because Adaptive Instance Normalization (AdaIN) aligns the mean and variance of content features with style features. In addition to pixel loss and perceptual loss, adding the L2 norm of delta constrains how far the optimized latent code strays from its initialization, which makes the output images less grainy and more realistic-looking. Finally, mapping from z to the extended w+ latent space yields the clearest images and most realistic reconstructions. Thus, the last column (StyleGAN, w+ latent space, Lp + perceptual + L2 loss) gives the best reconstruction output.

Runtime

It takes around 17 ms per iteration on average on an NVIDIA GeForce RTX 3090 GPU.

Part 2: Interpolating Cats


Implementation

For each pair of images we'd like to interpolate, we solve for their latent codes z1 and z2, then feed a convex combination of them (with some weight alpha) through the generator to produce the interpolated image.
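A minimal sketch of the interpolation, assuming a toy generator in place of the trained model and latent codes z1, z2 recovered as in Part 1:

```python
import torch

# Toy generator standing in for the trained model (hypothetical).
torch.manual_seed(0)
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Sigmoid())
z1, z2 = torch.randn(1, 8), torch.randn(1, 8)  # latents from Part 1's inversion

frames = []
for alpha in torch.linspace(0.0, 1.0, steps=10):
    z = (1 - alpha) * z1 + alpha * z2   # convex combination of latent codes
    with torch.no_grad():
        frames.append(G(z))             # one frame of the output GIF
```

At alpha = 0 and alpha = 1 the frames reproduce the two reconstructed endpoints exactly, so the GIF starts and ends on the inverted images.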

Example 1:

Success case; the face of the cat interpolates naturally and smoothly between two images, and is relatively high resolution. The fur color also transitions nicely (brown to black).

Example 2:

Success case; the eyes, nose and mouth roughly stay in the same position throughout. Even the whiskers transition continuously.

Example 3:

Failure case; at the end of the GIF the face morphs unrealistically and the eyes are not aligned during the morph.

Part 3: Scribble to Image

Implementation

Given a user's color sketch, we would like the GAN to fill in the details. We again solve a constrained optimization problem, requiring pixels of the generated image in the sketch's foreground to equal the sketch. Concretely, we use the Hadamard (elementwise) product with a mask to make sure the generator output G(z) matches the sketch S in the masked region.
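The masked constraint can be written as a loss term, as sketched below; the tensors are toy stand-ins and the names are illustrative.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: G(z) output, user sketch S, and binary mask M
# (1 where the user actually drew, 0 elsewhere).
torch.manual_seed(0)
gen_img = torch.rand(1, 3, 16, 16)                  # G(z), generator output
sketch  = torch.rand(1, 3, 16, 16)                  # S, the color scribble
mask    = (torch.rand(1, 1, 16, 16) > 0.7).float()  # M, sketched region

def sketch_loss(gen_img, sketch, mask):
    # Hadamard product restricts the L2 penalty to the sketched region;
    # unmasked pixels are left entirely to the generator.
    return F.mse_loss(mask * gen_img, mask * sketch)

print(sketch_loss(gen_img, sketch, mask).item())
```

Minimizing this over the latent code drives G(z) toward the sketch only where the user drew, leaving the rest of the image free.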


We then optimize for the best latent code (using latent w+, same loss weights as previous section) that would generate an image similar to the sketch.
When using colors very close to the original image colors (#1, #2), the outputs are slightly more realistic and sharper. With a sparse sketch, where most of the region is unfilled, the generator has more freedom to "fill in" the missing regions; for example, #7 ends up with a red background at the bottom.

Failure case: density

It seems that if the sketch is too dense, the output can look very unrealistic because the optimizer cannot find a good latent code. In the example below, I layered many different colors on the eyes and filled in the sketch entirely; the generated output ends up grainy and unrealistic.