Assignment #5 - GAN Photo Editing

Overview

In this assignment, we experiment with several GAN-based image editing techniques. First, we invert a pretrained generator for a vanilla GAN, as well as for StyleGAN, optimizing the latent variable so that the generated image matches a target. We try different combinations of loss functions (lp and perceptual) and latent spaces (z, w, and w+) to find the one that best reconstructs the target. Once we discover the combination that leads to the best results, we can smoothly interpolate between two cats and show the result as a gif. Finally, given basic sketches of cats, we adapt the optimization problem to solve for the latent vector that minimizes the difference between the generator output and the provided sketch.

Inverting The Generator

Let's take a look at some image reconstruction experiments. Here is the original image we wish to generate:

Using just l2 loss and the vanilla GAN, the output is decent but not great. This is expected, since the perceptual loss component generally leads to much higher-quality image synthesis.
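The projection step can be sketched as a simple optimization loop over the latent vector. This is a minimal illustration, assuming a generic differentiable generator `G`; the function name and toy linear "generator" below are my own stand-ins, not the assignment's actual model:

```python
import torch

def invert(G, target, z_dim, iters=500, lr=0.05):
    # Optimize a latent vector z so that G(z) reconstructs the target
    # under an L2 (pixel-wise MSE) loss only.
    z = torch.zeros(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), target)
        loss.backward()
        opt.step()
    return z.detach()

# Toy demo: a random linear map stands in for the pretrained generator.
torch.manual_seed(0)
W = torch.randn(8, 16)
G = lambda z: z @ W
target = G(torch.randn(1, 8))
z_hat = invert(G, target, z_dim=8)
err = torch.nn.functional.mse_loss(G(z_hat), target).item()
```

The same loop applies unchanged when `G` is the vanilla GAN or StyleGAN; only the latent shape and the loss change.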

A small improvement comes from adding the perceptual loss from the pretrained VGG network, especially around the nose, mouth, and eyes. However, the result is still pixelated and low quality.
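The combined objective can be written as a weighted sum of the pixel-wise loss and a feature-space distance. A sketch, where `feat` is assumed to be a frozen feature extractor (e.g. an intermediate layer of the pretrained VGG network); the function and weight name are illustrative:

```python
import torch

def combined_loss(gen_img, target, feat, w_percep=1.0):
    # L2 term: pixel-wise reconstruction error.
    l2 = torch.nn.functional.mse_loss(gen_img, target)
    # Perceptual term: distance between feature activations of a frozen
    # network (stands in for VGG features here).
    percep = torch.nn.functional.mse_loss(feat(gen_img), feat(target))
    return l2 + w_percep * percep

# Quick sanity check with an identity "feature extractor".
x = torch.randn(1, 3, 8, 8)
zero_loss = combined_loss(x, x, lambda t: t).item()
pos_loss = combined_loss(x, x + 1.0, lambda t: t).item()
```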

Pivoting from the vanilla GAN to StyleGAN (z latent space with perceptual loss), we see an enormous improvement in the quality of the generated image, which is expected since the StyleGAN architecture is better suited for synthesizing high-resolution images.

Even further improvements can be made when switching from the z latent space to the w latent space with StyleGAN:

Using the w+ latent space also led to a strong result, though the coloration seemed slightly better in the previous w space image:

Overall, the combination of StyleGAN with l2 loss + perceptual loss (with a weight of 1, same as in the previous assignment), using the w latent space, led to the best results. Each projection was optimized for 5000 iterations on an EC2 g4dn.xlarge instance's GPU (T4); StyleGAN with l2 + perceptual loss in the w space takes 161 seconds.
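The three latent spaces compared above differ only in where the optimization variable lives. A shape-level sketch (the 512-dim codes and 14-layer count are illustrative of a typical StyleGAN configuration, not necessarily this assignment's exact model):

```python
import torch

z = torch.randn(1, 512)   # z space: input to StyleGAN's mapping network
# In the real model, w = mapping(z); here a random vector stands in.
w = torch.randn(1, 512)   # w space: output of the mapping network

# w+ space: one independent w code per synthesis layer, initialized by
# broadcasting a single w and then optimized per-layer.
n_layers = 14
w_plus = w.unsqueeze(1).repeat(1, n_layers, 1)
```

Optimizing in w+ gives the most degrees of freedom, which is why it can match fine detail, while w keeps all layers tied to a single code.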

Interpolating Cats

Let's now take a look at some interpolations between cats in the form of gifs. First, using the z latent space with StyleGAN.

Here is the first source image:

And here is the target image:

And the resulting interpolation:

Next, here is the same interpolation, but using the w latent space, and in slow motion.

And again the same interpolation, but using w+ at a moderate speed. I find this one effective as well, although I still believe the w-space interpolation above is smoother. However, that may be due to the larger number of frames in the previous example.
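The interpolations above amount to linearly blending the two projected latent codes and decoding each blend. A minimal sketch (the function name is mine; in practice each frame's code is passed through the generator and the outputs are assembled into a gif):

```python
import torch

def interpolate(w0, w1, n_frames=30):
    # Linear interpolation between two latent codes; more frames give a
    # slower, smoother gif.
    ts = torch.linspace(0.0, 1.0, n_frames)
    return [torch.lerp(w0, w1, t) for t in ts]

frames = interpolate(torch.zeros(2), torch.ones(2), n_frames=5)
```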

Scribble to Image

Let's now take a look at some examples of scribble to image, beginning with one of the provided sketches:

And below is the resulting image. As we will see, a delicate balance must be struck between sparse and dense sketches in order to get good results. This sketch is certainly on the denser side, but it is also quite good, which makes the task of generating an image within the sketch constraint not too difficult.
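One way to express the sketch constraint is a masked reconstruction loss, where only the drawn pixels contribute to the objective and unsketched regions are left unconstrained. This is a sketch of the idea; the function name and mask convention (1 on stroke pixels, 0 elsewhere) are my own:

```python
import torch

def sketch_loss(gen_img, sketch, mask):
    # Penalize deviation from the sketch only where strokes exist;
    # normalize by the number of masked pixels.
    return ((mask * (gen_img - sketch)) ** 2).sum() / mask.sum()

# Sanity check: differences outside the mask are ignored.
sketch = torch.zeros(1, 3, 4, 4)
mask = torch.zeros(1, 1, 4, 4)
mask[..., 0, 0] = 1.0
gen = sketch.clone()
gen[..., 3, 3] = 5.0          # change an unmasked pixel only
loss_outside = sketch_loss(gen, sketch, mask).item()
gen[..., 0, 0] = 1.0          # now change the masked pixel too
loss_inside = sketch_loss(gen, sketch, mask).item()
```

A sparse mask leaves the generator free to hallucinate the rest of the cat, which explains the behavior seen in the sparser sketches below.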

Now here is a sketch I drew on sketchio (admittedly, I am no artist):

And the result is below. The result was interesting here: since I only sketched the cat's head and not its neck, ears, or anything else, the generator has no constraint on what surrounds the cat, so it encapsulated the cat in a costume!

Here is another sketch I drew on sketchio, this time focusing on more common colors and a sparser drawing.

As shown below, the colors of the cat's iris, the skin under its eyes, and its mouth match the colors of the drawing extremely well.