This project performs image interpolation and sketch-to-image generation by optimizing in the latent space of a pretrained GAN. We achieve this by inverting the generator to recover latent vectors, sampling noise in the latent space, interpolating between latents to generate an "in-between" image, and finally constraining the optimization with a mask generated from a given sketch.
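The core of the pipeline is a projection step that optimizes a latent code until the generated image matches a target photo. Below is a minimal sketch of that loop, assuming a hypothetical generator (latent -> image) and a criterion callable implementing the combined loss described in the next paragraph; the latent dimensionality, optimizer choice, and step count are placeholders rather than the exact settings used here.

    import torch

    def project(generator, target, criterion, num_steps=1000, latent_dim=128, lr=0.01):
        # Start from a random latent and optimize it so the generated image
        # matches the target under the combined loss (see the next paragraph).
        z = torch.randn(1, latent_dim, device=target.device, requires_grad=True)
        optimizer = torch.optim.LBFGS([z], lr=lr)  # Adam also works here

        def closure():
            optimizer.zero_grad()
            loss = criterion(generator(z), target, z)
            loss.backward()
            return loss

        for _ in range(num_steps):
            optimizer.step(closure)
        return z.detach()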
I implemented the loss function as a combination of perceptual loss (the content loss implemented in homework 4), an L1 loss between the image and the target, and an L2 regularization loss set to the same weight as the L1 loss.
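A hedged sketch of this combined objective is below. The perceptual_loss function stands in for the VGG content loss from homework 4, the weights are placeholders, and I read the L2 term as a regularizer on the latent code; adjust accordingly if it is meant as an image-space L2.

    import torch.nn.functional as F

    def criterion(image, target, latent, w_perc=1.0, w_l1=10.0, w_reg=10.0):
        loss = w_perc * perceptual_loss(image, target)   # VGG content loss from HW4 (placeholder)
        loss = loss + w_l1 * F.l1_loss(image, target)    # pixel-level L1 to the target
        loss = loss + w_reg * latent.pow(2).mean()       # L2 regularization, same weight as L1
        return loss

In practice this criterion is what gets passed into the projection loop sketched above.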
Switching between the vanilla GAN and StyleGAN shows that the results look much better with StyleGAN. However, optimizing with StyleGAN also takes longer due to its higher model complexity.
[Images: vanilla GAN reconstruction; StyleGAN reconstruction]
Switching between the z, w, and w+ latent spaces shows that w and w+ improve significantly over the z space, with w+ producing colors closest to the original target; w+ is my preferred setting.
[Images: z-space result; w-space result; w+-space result]
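For reference, this is roughly how I think about sampling in the three spaces. It is a sketch against a hypothetical StyleGAN wrapper exposing mapping (z -> w) and synthesis (per-layer styles -> image), not the actual API, and the layer count is a placeholder that depends on the output resolution.

    import torch

    n_layers = 14                      # number of style inputs; depends on output resolution
    z = torch.randn(1, 512)            # z space: raw Gaussian noise

    w = mapping(z)                     # w space: one style vector shared by all layers
    styles_w = w.unsqueeze(1).repeat(1, n_layers, 1)
    image_w = synthesis(styles_w)

    # w+ space: one independent style vector per layer, each optimized separately
    w_plus = styles_w.detach().clone().requires_grad_(True)
    image_wplus = synthesis(w_plus)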
Toggling the losses shows that enabling all three gives the best result. With only perceptual loss, the colors are off, and with perceptual + L1 loss only, the shape is quite distorted.
[Images: perceptual only; perceptual + L1; all losses]
Some interpolation results are shown below:
[Images: two interpolation examples between pairs of cat photos]
The interpolated cats look quite good and realistic, and the interpolation is smooth thanks to the high number of steps between the two images.
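The interpolation itself is a simple linear blend between the two projected latents, decoded frame by frame. A minimal sketch is below, assuming z1 and z2 are the latents recovered by the projection loop above and generator maps latents to images; the frame count is a placeholder.

    import torch

    def interpolate(generator, z1, z2, num_frames=60):
        # Linearly blend the two projected latents and decode each step into a frame.
        frames = []
        with torch.no_grad():
            for t in torch.linspace(0.0, 1.0, num_frames):
                z_t = (1 - t) * z1 + t * z2
                frames.append(generator(z_t))
        return frames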
I experimented with different kinds of sketches, and the results are shown below.
With a sparse sketch, the reconstruction of the face shape works well. However, the eye color in the sketch negatively affects the realism of the final result.
[Images: sparse sketch -> generated result]
With a dense sketch, there isn't much room for the generator to infer detail. The final result matches the original sketch exactly in shape, with only fine details filled in by the generator.
[Images: dense sketch -> generated result]
With a sketch of a differently colored cat (a cream cat in this case), the generator produces a good shape but still generates a grey cat face in the regions not specified by the mask. This is presumably because the training data consists mostly of black-and-white cats.
[Images: cream-cat sketch -> generated result]
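The sketch constraint used in these experiments can be written as a masked version of the earlier objective: the loss is evaluated only on pixels covered by the sketch mask, leaving the generator free to fill in everything else. This is a hedged sketch with the same placeholder perceptual_loss and weights as before.

    import torch.nn.functional as F

    def sketch_criterion(image, sketch, mask, latent, w_perc=1.0, w_l1=10.0, w_reg=10.0):
        # Compare only the pixels covered by the sketch mask; everything outside
        # the mask is left for the generator to fill in freely.
        masked_image = image * mask
        masked_sketch = sketch * mask
        loss = w_perc * perceptual_loss(masked_image, masked_sketch)   # placeholder VGG loss
        loss = loss + w_l1 * F.l1_loss(masked_image, masked_sketch)
        loss = loss + w_reg * latent.pow(2).mean()
        return loss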
I discussed high-level concepts with Emma Liu and Joyce Zhang while working on this homework.