In this assignment, we implement several techniques that manipulate images on the manifold of natural images. In the first part, we invert a GAN, computing the latent vector of an image by optimization. In the second part, we interpolate between two images by interpolating their corresponding latent vectors. In the last part, we convert sketches of cats into cat images.
In this part, we define the loss as \( L = |Im - G(z)|_{p_1} + \lambda \sum_i|V_i(Im) - V_i(G(z))|_{p_2}\), where the first term is the \( p_1 = 1 \) (L1) loss between the generated image and the target image, and the second term is the perceptual loss: the \( p_2 = 2 \) (L2) difference between the conv_1 output feature maps of a pretrained VGG19 network for the generated and target images. We try different values of \( \lambda \) to study the importance of the perceptual loss.
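The projection step described above can be sketched as a small optimization loop. This is a minimal illustration, not the actual assignment code: `G` stands for any generator mapping \( z \) to an image, `feat` for a frozen feature extractor (e.g. the conv_1 slice of a pretrained VGG19), and `z_dim` and the hyperparameters are assumptions chosen for demonstration.

```python
import torch
import torch.nn.functional as F

def project(G, feat, im, z_dim=128, lamb=0.1, steps=200, lr=0.05):
    """Recover a latent z such that G(z) matches the target image `im`."""
    z = torch.randn(1, z_dim, requires_grad=True)  # random initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        out = G(z)
        pix = (out - im).abs().mean()            # p1 = 1 pixel-wise term
        perc = F.mse_loss(feat(out), feat(im))   # p2 = 2 perceptual term
        loss = pix + lamb * perc
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

The same loop applies to the w and w+ spaces by optimizing those vectors instead of z.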
We try generators with different latent spaces. Specifically, we tried a vanilla GAN model with its z space, and a StyleGAN model with both the w and w+ spaces. The results are shown below.
We can see from the results above that StyleGAN clearly generates better details, and the perceptual loss helps the network produce a better overall appearance of the cat. Looking at the StyleGAN w-space result with perceptual loss, the cat's left ear is generated noticeably better. Although it is not exactly the same as in the original image, this suggests that the perceptual loss pushes the generated image to look more like a cat.
In this part, we first find the latent vectors of two images via GAN inversion, then interpolate between the latent vectors, and finally use the pretrained GAN to generate images that ideally read as a smooth interpolation between the two originals. The results are shown below.
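The interpolation step itself can be sketched as a simple linear blend of the two recovered latents. This helper is hypothetical (for StyleGAN, the same blend would be applied to w or w+ vectors):

```python
import torch

def interpolate_latents(w1, w2, n_steps=8):
    """Linearly interpolate between two latent vectors of the same shape."""
    alphas = torch.linspace(0.0, 1.0, n_steps)
    return [(1 - a) * w1 + a * w2 for a in alphas]
```

Feeding each blended latent through the pretrained generator, e.g. `[G(w) for w in interpolate_latents(w1, w2)]`, yields the frames of the interpolation.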
We can see from the results above that the interpolation works reasonably well for StyleGAN: the transition between the two cats in the left and right images appears seamless. There are also some failed interpolation results from the vanilla GAN which I do not show here; one likely reason is that the latent vector recovered by inversion is not always stable.
For the last part of the assignment, we constrain the generated image by providing a sketch of the cat. The optimization for the latent vector now becomes \( z^* = \arg \min_z ||M * G(z) - M * S||^2 \), where \( M \) is the mask and \( S \) is the sketch. This loss minimizes the color difference between the generated image and the sketch over the pixels in the masked area. Below are the results.
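The masked objective above can be sketched as follows. The names and shapes are illustrative assumptions: `gen_img` plays the role of \( G(z) \) with shape (N, C, H, W), and the mask has shape (N, 1, H, W) so it broadcasts over the channel dimension.

```python
import torch

def sketch_loss(gen_img, sketch, mask):
    """||M * G(z) - M * S||^2: squared color error over the masked strokes."""
    return ((mask * gen_img - mask * sketch) ** 2).sum()
```

One then minimizes this loss over z with a gradient-based optimizer, in the same way as in the projection objective earlier; pixels outside the mask contribute nothing, so the generator is free to fill them in.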
We can see from the results above that most generated cat images are not of great quality: they are blurry, and the cat does not always look like a cat. However, for some generated images, where the corresponding sketch has only a few strokes, the generated images have better quality and look more like cats. This suggests that the amount of constraint strongly affects the quality of the generated image: with fewer constraints, the image generally looks better, and when the constrained area is large, only reasonable sketch colors yield good generated images.
In this part we run the experiments on high-resolution grumpy cat images. Below are the results of projection and interpolation on these higher-resolution images.