GAN Photo Editing

Sudeep Dasari

Andrew ID: sdasari

Overview

In this project, we explore using the GAN architectures we learned in class for photo editing. While these methods are somewhat compute-intensive, they let users do awesome things - like naturally interpolating between cat images and turning sketches into full photos - without much human effort.

Part 1: Inverting the Generator

In order to manipulate an image in interesting ways, the GAN generator \(G\) must first be inverted: we need a latent code \(z\) such that \(I = G(z)\), at least approximately. In other words, we must find the latent code that generates the input image \(I\).

This can be done by solving the following optimization problem: \( z^* = \arg\min_z \| G(z) - I \|_p + \lambda_p \| V(G(z)) - V(I) \|_p \), where \(V\) denotes VGG19 features. I set \(p=2\) and use only features from the conv_1 and conv_2 layers to compute \(V\). I tune \(\lambda_p\) and experiment with different Vanilla GAN/StyleGAN generators to find the combination with the best inversion ability.
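As a rough illustration of the objective above, here is a minimal NumPy sketch. It is not the project code: the generator and the VGG19 features are replaced by toy linear maps (`A` and `W`, both hypothetical) so that the gradient can be written by hand, and the norms are squared to keep the objective smooth. In the real setup, \(G\) and \(V\) would be the actual networks and the gradient would come from autodiff.

```python
import numpy as np

def invert_linear(A, W, target, lam=1.0, steps=2000, seed=0):
    """Gradient descent over z on ||A z - I||^2 + lam * ||W (A z) - W I||^2.

    A: (pixels, latent) toy linear "generator", G(z) = A @ z   (assumption)
    W: (features, pixels) toy linear "features", V(x) = W @ x  (assumption)
    target: flattened image I to reconstruct
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(A.shape[1])  # random initial latent

    # The objective is quadratic with Hessian 2 * A^T (Id + lam * W^T W) A.
    # A step size of 1 / (largest eigenvalue) guarantees the loss never increases.
    M = np.eye(A.shape[0]) + lam * W.T @ W
    hessian = 2.0 * A.T @ M @ A
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()

    losses = []
    for _ in range(steps):
        r = A @ z - target  # pixel residual G(z) - I
        f = W @ r           # feature residual V(G(z)) - V(I), valid because V is linear here
        losses.append(float(r @ r + lam * (f @ f)))
        grad = 2.0 * A.T @ (r + lam * (W.T @ f))
        z = z - lr * grad
    return z, losses

# Toy usage: the target is generated from a known latent, so a good inverse exists.
rng = np.random.default_rng(1)
A = rng.standard_normal((16, 4))
W = rng.standard_normal((8, 16))
z_true = rng.standard_normal(4)
target = A @ z_true
z_star, losses = invert_linear(A, W, target, lam=0.5)
```

Setting `lam=0` in this sketch recovers the pure pixel loss, which mirrors the \(\lambda_p = 0\) setting compared in this part.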

As you can see in the table below, increasing \(\lambda_p\) results in better generations, since the perceptual loss helps the optimization focus on the finer-grained details that our eyes care about. In contrast, using only an \(l_p\) loss results in much blurrier images that miss important details like the pink flower.

Furthermore, the best generator was StyleGAN when optimizing the w/w+ latents. This network had enough capacity to fully model the images, but the z latent proved more difficult to optimize. Note that w+ gives the best generation (less blurry hand, etc.), but at the risk of greater overfitting, since it adds extra degrees of freedom to the optimization.

Original Image	Model / Latent	\(\lambda_p = 0\)	perc_wgt=0

Vanilla GAN / z

StyleGAN / z

StyleGAN / w

StyleGAN / w+

Part 2: Interpolate your Cats

We can now perform image interpolation as follows: take two cat images and find their latents via inversion, linearly interpolate between these embeddings, \(x_\lambda = (1 - \lambda)x_1 + \lambda x_2,\ \lambda \in [0,1]\), then animate a gif as \(\lambda\) varies from 0 to 1. Note that I use the StyleGAN w+ model (shown above) for this process. The results are shown below:

Image 1	Image 2	Interpolation

As you can see, we can successfully interpolate between cat images! Notably, the intermediate images are valid cats that are "in between" the sampled images. This is a real advantage, since most non-GAN interpolation requires hacky image warping that leaves strange artifacts in the intermediate frames. Thus, GANs provide a much more natural and low-effort way to interpolate images.
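The interpolation step in this part can be sketched in a few lines of NumPy. The function name, the frame count, and the latent shape `(12, 512)` are illustrative assumptions, not values from the project. Each returned latent would then be decoded by the generator and written out as one gif frame.

```python
import numpy as np

def interpolate_latents(x1, x2, num_frames=30):
    """Return latents x_lam = (1 - lam) * x1 + lam * x2 for lam evenly spaced in [0, 1]."""
    lams = np.linspace(0.0, 1.0, num_frames)
    # Reshape lam to (num_frames, 1, ..., 1) so it broadcasts over the latent dimensions.
    lams = lams.reshape(-1, *([1] * x1.ndim))
    return (1.0 - lams) * x1 + lams * x2

# Toy usage with random stand-ins for two inverted latents (shape is hypothetical).
rng = np.random.default_rng(0)
x1 = rng.standard_normal((12, 512))
x2 = rng.standard_normal((12, 512))
frames = interpolate_latents(x1, x2, num_frames=5)
```

The first and last frames reproduce the two inverted latents, so the gif starts at Image 1 and ends at Image 2.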

Part 3: Scribble to Image

We can convert scribbles to images using a constrained optimization procedure. Specifically, we minimize the \(l_2\) loss between the pixels drawn in the scribble and the corresponding pixels of the generated image \(G(z)\). All other pixels are unconstrained. This can be implemented efficiently with a binary mask. We then optimize \( z \) to find an image that satisfies as many constraints as possible. To get the best results, I also added a proximal loss \( \| z - z_0 \|_2 \), which encourages the optimization not to drift too far from the originally sampled latent \(z_0\). This makes the final generations look more realistic, particularly when the original sample is "good." Again, I used the StyleGAN w+ latent model to generate the results below:

Mask	Start Image: \(G(z_0)\)	Final Image: \(G(z)\)

Sketch 1

Sketch 2

Sketch 3

Sketch 4

Sketch 5

I find that the sparsest and densest sketches can work, but only if the initialization is fortunate. Otherwise, combining a bad initialization with an under- or over-constrained optimization problem gives very un-cat-like results. This is why sketches #2 and #5 fail. In contrast, I believe I got lucky with the initialization for sketch #1, which is why it gives good results.

On the other hand, dense sketches that aren't fully filled in result in reliable generation performance. In particular, sketches #3 and #4 are already somewhat cat-like, and as a result they often produce good final images.
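For reference, the masked objective from this part can be sketched as follows. This only evaluates the loss for a given generated image. In practice it would be minimized over the latent with autodiff. The weight `prox_weight` on the proximal term and all array shapes are assumptions made for illustration.

```python
import numpy as np

def scribble_loss(generated, sketch, mask, z, z0, prox_weight=0.1):
    """Masked squared l2 loss on scribbled pixels plus a proximal term on the latent.

    generated, sketch: (H, W, 3) images; mask: (H, W, 1) binary, 1 where the user drew.
    prox_weight is a hypothetical weighting of the proximal term.
    """
    diff = mask * (generated - sketch)  # unconstrained pixels contribute zero
    recon = np.sum(diff ** 2)           # squared l2 over the constrained pixels only
    prox = np.linalg.norm(z - z0)       # ||z - z0||_2 keeps z near the initial sample
    return recon + prox_weight * prox

# Toy usage: a single scribbled pixel on a 4x4 image.
generated = np.zeros((4, 4, 3))
sketch = np.full((4, 4, 3), 2.0)
mask = np.zeros((4, 4, 1))
mask[0, 0, 0] = 1.0
z0 = np.zeros(2)
loss_at_start = scribble_loss(generated, sketch, mask, z0, z0)
loss_after_drift = scribble_loss(generated, sketch, mask, z0 + np.array([3.0, 4.0]), z0)
```

Because the mask zeroes out every pixel the user did not draw, only the scribbled pixel contributes to the reconstruction term in this toy example.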
