Project 5

Jiaheng Hu (jiahengh)

Overview

In this assignment, we implemented a few techniques that manipulate images on the manifold of natural images. In the first part, we invert a pre-trained generator to find a latent variable that closely reconstructs a given real image. In the second part, we interpolate between the latent codes of two images. In the third part, we take a hand-drawn sketch and generate an image that fits the sketch.
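To make the inversion step concrete, here is a minimal sketch. It is not the project code: it assumes a toy linear "generator" (a hypothetical weight matrix) in place of the pre-trained network, and it uses plain gradient descent on a pixel-wise squared error to recover the latent. The names `generate` and `invert` are made up for illustration.

```python
def generate(weights, z):
    """Toy linear 'generator': each output pixel is a weighted sum of z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def invert(weights, target, steps=500, lr=0.05):
    """Find z that minimises sum((generate(z) - target)^2) by gradient descent."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        out = generate(weights, z)
        residual = [o - t for o, t in zip(out, target)]
        # d/dz_j of sum_i residual_i^2 is 2 * sum_i residual_i * W[i][j]
        grad = [
            2.0 * sum(residual[i] * weights[i][j] for i in range(len(weights)))
            for j in range(len(z))
        ]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z


# Hypothetical 3-pixel "image" produced from a known 2-d latent.
toy_weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
true_z = [0.5, -1.0]
target_image = generate(toy_weights, true_z)
recovered_z = invert(toy_weights, target_image)
print(recovered_z)
```

With a real generator the loss is non-convex, so the recovered latent depends on the initialisation and the optimiser; the toy problem above is convex and only shows the structure of the loop.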

Part 1: Inverting the Generator

We use the same target image for all of the settings below.

1.1 Perceptual loss weight: 0 (left) versus 0.5 (right)

The perceptual loss seems to help prune out some of the artifacts, which makes the image look more realistic.
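The comparison above varies a single weight on the perceptual term. A minimal sketch of such a weighted loss is given below, assuming the total is pixel loss plus weight times perceptual loss. The `toy_features` function is a hypothetical stand-in (differences between neighbouring pixels) for the network features a real perceptual loss would compare; it is not what the project uses.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(image):
    """Stand-in for deep features: differences between neighbouring pixels."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]


def reconstruction_loss(generated, target, perc_weight):
    """Pixel loss plus a weighted 'perceptual' loss on the toy features."""
    pixel = mse(generated, target)
    perceptual = mse(toy_features(generated), toy_features(target))
    return pixel + perc_weight * perceptual


generated = [0.0, 1.0, 0.0]  # a single-pixel spike, i.e. an "artifact"
target = [0.0, 0.0, 0.0]
loss_without = reconstruction_loss(generated, target, perc_weight=0.0)
loss_with = reconstruction_loss(generated, target, perc_weight=0.5)
print(loss_without, loss_with)
```

Setting the weight to 0 recovers the pure pixel loss, which matches the left-hand setting in the comparison.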

1.2 Vanilla (left) versus StyleGAN (right)

The results look similar, although the StyleGAN result looks a little better, probably because the model has more capacity.

1.3 Latent variable: z (left) versus w (middle) versus w+ (right)

I believe the w+ space gives the model more freedom to manipulate the output, and therefore the corresponding image resembles the target the most among all images. In summary, I believe stylegan_w+_perceptual=0.5 gives the best image, but its runtime is also the longest.
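One way to see where the extra freedom comes from: a w code is a single vector shared by every synthesis layer, while a w+ code keeps a separate copy per layer that can be optimised independently. The sketch below only illustrates that bookkeeping; the layer count and vector size are hypothetical, and `expand_to_w_plus` is an illustrative helper, not a StyleGAN API.

```python
def expand_to_w_plus(w, num_layers):
    """Start a w+ code from w: one independent copy of w for each layer."""
    return [list(w) for _ in range(num_layers)]


def num_free_parameters(code):
    """Count scalars in a flat vector (w) or a list of vectors (w+)."""
    if code and isinstance(code[0], list):
        return sum(len(layer) for layer in code)
    return len(code)


w = [0.1, 0.2, 0.3]                            # hypothetical 3-d w vector
w_plus = expand_to_w_plus(w, num_layers=4)     # hypothetical 4-layer generator

# Each layer can now move on its own during optimisation.
w_plus[0][0] = 9.0
print(num_free_parameters(w), num_free_parameters(w_plus))
```

The optimiser has several times more variables to fit the target with in w+, which is consistent with both the closer reconstruction and the longer runtime noted above.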

Part 2: Interpolate your Cats

We followed the instructions and ran our algorithm. Two sets of results are shown below:

I think the interpolation looks pretty good. Nothing I can really complain about.
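For reference, the interpolation itself can be sketched as a linear blend between the two inverted latent codes, with each blended code then passed through the generator. The snippet below assumes plain linear interpolation and uses hypothetical 2-d latents; the generator call is omitted.

```python
def lerp(a, b, t):
    """Linear interpolation between latent codes a and b at t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(a, b, num_frames):
    """Evenly spaced latent codes from a (first frame) to b (last frame)."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    return [lerp(a, b, i / (num_frames - 1)) for i in range(num_frames)]


latent_a = [0.0, 0.0]   # hypothetical latent of the first cat
latent_b = [1.0, 2.0]   # hypothetical latent of the second cat
path = interpolation_path(latent_a, latent_b, num_frames=5)
for code in path:
    print(code)
```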

Part 3: Scribble to Image

First, I tested with my own drawing and got the following result (data, mask, z, w+):

Since my own drawing is so bad, I decided to also test with the sketch from the project page.

It is quite interesting that the w+ space reconstruction completely failed to generate a realistic cat. I think this shows the trade-off of operating in w+ space: we gain more control over the image, but at the same time we sacrifice realism. Also, since w+ worked well on my first drawing, I think larger scribbles help when using w+.
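The role of the mask can be sketched as follows: only pixels covered by the scribble contribute to the loss, so everything outside the mask is left for the generator to fill in. This is a minimal sketch under the assumption of a masked squared error averaged over the drawn pixels; the exact loss in the project may differ, and `masked_loss` is an illustrative name.

```python
def masked_loss(generated, sketch, mask):
    """Squared error averaged over pixels where mask is 1 (scribbled pixels)."""
    active = sum(mask)
    if active == 0:
        return 0.0  # an empty scribble places no constraint on the image
    total = sum(m * (g - s) ** 2 for g, s, m in zip(generated, sketch, mask))
    return total / active


generated = [0.2, 0.9, 0.5, 0.1]   # hypothetical generated pixels
sketch = [0.0, 1.0, 0.0, 0.0]      # hypothetical scribble colours
mask = [0, 1, 0, 0]                # only the second pixel was drawn on
print(masked_loss(generated, sketch, mask))
```

With a small mask only a few pixels constrain the latent, so a flexible space such as w+ can satisfy them with an unrealistic image; a larger scribble constrains more of the output.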

Bells & Whistles

1. Run on an image of higher resolution

We ran our projection algorithm on a 256×256 image of Obama, using the pretrained weights released with the StyleGAN paper, and obtained the following results:

Target:

Projection with w:

Projection with w+:

The w projection looks realistic, but the facial expression doesn't resemble the original picture. The w+ projection has (perhaps) better details, but it is not realistic at all. I think this again demonstrates the trade-off between w+ and w, and indicates that the perceptual loss alone is not enough when we are trying to reconstruct a facial expression.