Overview
This repo contains code that manipulates images on the manifold
of natural images. First, we invert a pre-trained generator
to find a latent variable that closely reconstructs a given
real image. In the second part, we take a hand-drawn sketch
and generate an image that fits the sketch.
Part 1: Inverting the Generator
We first solve an optimization problem to reconstruct the
image from a particular latent code.
Natural images lie on a low-dimensional manifold; we treat the
output manifold of a trained generator as close to the
natural image manifold.
Thus, we set up a nonconvex optimization problem over the latent code,
minimizing a weighted combination of pixel (Lp), perceptual, and delta-L2 losses.
Parameters
Lp loss weight: 10
Perceptual loss weight: 0.01
L2 delta weight: 0.05
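The inversion step above can be sketched as a simple latent-optimization loop. This is an assumption-laden sketch, not the exact implementation: `generator` and `perceptual_loss` are stand-ins (the latter for a VGG-style feature-space distance), and the default weights mirror the parameters listed above.

```python
import torch

def invert(generator, perceptual_loss, target, z_init,
           lp_weight=10.0, perc_weight=0.01, delta_weight=0.05,
           num_iters=1000, lr=0.01):
    """Optimize a latent code so that the generator reconstructs `target`.

    generator:       maps a latent code to an image tensor (assumed)
    perceptual_loss: feature-space distance, e.g. VGG-based (assumed helper)
    target:          the real image to reconstruct
    z_init:          starting latent code, e.g. sampled from N(0, I)
    """
    # Optimize an offset `delta` from the initial code so its L2 norm
    # directly measures how far we stray from the initialization.
    delta = torch.zeros_like(z_init, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(num_iters):
        opt.zero_grad()
        recon = generator(z_init + delta)
        loss = (lp_weight * (recon - target).abs().mean()       # Lp pixel loss (L1 here)
                + perc_weight * perceptual_loss(recon, target)  # perceptual loss
                + delta_weight * delta.norm(2))                 # L2 norm of delta
        loss.backward()
        opt.step()
    return z_init + delta
```

The L2-norm-of-delta term is what keeps the optimized code near its initialization, matching the regularization discussed in the results below.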
Comparing Losses
Comparing the losses, we see that perceptual and Lp loss
on their own perform poorly, producing grainy, unrealistic images of cats
with warped facial features.
Perceptual + Lp loss yields a sharper output, especially
when combined with the L2 norm of delta (which keeps the output closer to the original image).
Comparing GAN Type and Latent Space
Results
Overall, StyleGAN performs better than VanillaGAN because
Adaptive Instance Normalization (AdaIN) aligns the mean and variance of
content features with those of the style features.
In addition to pixel loss and perceptual loss, adding the L2 norm of delta helps
constrain how far the optimized latent strays from its initialization,
which makes the output images less grainy and more realistic-looking.
Finally, mapping from z to the extended w+ latent space yields the clearest
images and the most realistic reconstructions.
Thus, the last column (StyleGAN, w+ latent space, Lp + perceptual + L2 loss) gives
the best reconstruction output.
Runtime
Each iteration takes around 17 ms on average on an NVIDIA GeForce RTX 3090 GPU.
Part 2: Interpolating Cats
Implementation
For each pair of images we'd like to interpolate, we solve
for their latent codes z1 and z2, then feed
a convex combination of them (with some weight alpha)
through the generator to produce the interpolated output image.
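The interpolation step above can be sketched as follows; `generator` and the inverted codes `z1`, `z2` (from the procedure in Part 1) are assumed inputs.

```python
import torch

def interpolate(generator, z1, z2, num_steps=30):
    """Generate frames by feeding convex combinations of two latent codes
    through the generator; alpha sweeps from 0 (giving z1) to 1 (giving z2)."""
    frames = []
    for alpha in torch.linspace(0.0, 1.0, num_steps):
        z = (1 - alpha) * z1 + alpha * z2  # convex combination of the codes
        frames.append(generator(z))
    return frames
```

Stitching the returned frames together (e.g. into a GIF) gives the morphing animations shown below.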
Example 1:
Success case; the face of the cat interpolates
naturally and smoothly between two images, and is relatively
high resolution.
The fur color also transitions nicely (brown to black).
Example 2:
Success case; the eyes, nose and mouth roughly stay in the
same position throughout.
Even the whiskers transition continuously.
Example 3:
Failure case; at the end of the GIF the face morphs
unrealistically and the eyes are not aligned during
the morph.
Part 3: Scribble to Image
Implementation
Given a user's color sketch, we would like the GAN to fill in the details.
We again solve a constrained optimization problem, requiring the pixels
of the generated image inside the sketched foreground to match the sketch.
We use the Hadamard (elementwise) product to compare the generator
output G(z) and the sketch S only in the masked region.
We then optimize for the best latent code (using latent w+, same loss weights as previous section)
that would generate an image similar to the sketch.
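The masked constraint can be written as an elementwise penalty that plugs into the same optimization loop as Part 1. This is a sketch under stated assumptions: `mask` is 1 on sketched pixels and 0 elsewhere, `generated` is G(z), `sketch` is S, and `delta` is the latent offset being regularized.

```python
import torch

def sketch_loss(generated, sketch, mask, delta, delta_weight=0.05):
    """Penalize disagreement between the generated image and the sketch
    only inside the masked (sketched) region, via the Hadamard product:
    || M * G(z) - M * S ||_1, plus the L2-norm-of-delta regularizer."""
    masked_mismatch = (mask * generated - mask * sketch).abs().mean()
    return masked_mismatch + delta_weight * delta.norm(2)
```

Because the loss is zero wherever the mask is zero, the generator is free to hallucinate plausible content in the unsketched regions.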
When using colors that are very close to the original image colors (#1, #2),
the outputs are slightly more realistic and high-resolution.
With a sparse sketch where most of the region is
unfilled, the generator has more freedom to "fill in"
the missing regions; hence example #7 ends up with a red background at
the bottom.
Failure case: density
It seems that if the sketch is too dense, the output can look
very unrealistic because the optimizer cannot find a good latent code.
In the example below, I layered many different colors onto the eyes
and filled in the sketch entirely. The generated output ends
up looking grainy and unrealistic.