Using just L2 loss and the vanilla GAN, we see that the output is decent, but not great. This is expected, since adding a perceptual loss component generally leads to much higher-quality image synthesis results.
A small improvement is made when adding in the perceptual loss from the pretrained VGG network, especially around the nose, mouth, and eyes. However, the result is still pixelated and low quality.
Pivoting from the vanilla GAN to StyleGAN (z latent space with perceptual loss), we see an enormous improvement in the quality of the generated image, which is expected since the StyleGAN architecture is better suited for synthesizing high-resolution images.
Even further improvements can be made when switching from the z latent space to the w latent space with StyleGAN:
Using the w+ latent space also led to a strong result, though the coloration seemed slightly better in the previous w space image:
Overall, it appeared that the combination of StyleGAN with L2 loss + perceptual loss (with a weight of 1, the same as in the previous assignment) in the w latent space led to the best results. Each of the projections was optimized for 5000 iterations on the GPU (T4) of an EC2 g4dn.xlarge instance; the StyleGAN projection with L2 loss + perceptual loss in w space takes 161 seconds.
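To make the objective concrete, here is a minimal sketch of how an L2 term and a perceptual term might be combined with a weight of 1. This is not the assignment code: `combined_loss`, `l2_loss`, and `toy_features` are hypothetical names, images are flattened to plain lists, and `toy_features` is only a stand-in for the pretrained VGG activations that the real perceptual loss would compare.

```python
def l2_loss(a, b):
    """Mean squared error between two equally sized flat sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(pixels):
    """Hypothetical stand-in for VGG features: differences between neighbours.

    A real perceptual loss would compare activations of a pretrained network.
    """
    return [right - left for left, right in zip(pixels, pixels[1:])]


def combined_loss(generated, target, feature_fn, perc_weight=1.0):
    """Pixel-space L2 plus a weighted L2 in feature space."""
    pixel_term = l2_loss(generated, target)
    perc_term = l2_loss(feature_fn(generated), feature_fn(target))
    return pixel_term + perc_weight * perc_term


# Example: a two-pixel "image" compared against an all-zero target.
loss = combined_loss([0.0, 2.0], [0.0, 0.0], toy_features, perc_weight=1.0)
```

In an actual projection, this scalar would be minimized with respect to the latent code (z, w, or w+) by gradient descent for the stated number of iterations.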
Here is the first source image:
And here is the target image:
And the resulting interpolation:
Next, here is the same interpolation, but using the w latent space, and in slow motion.
And again the same interpolation, but using w+ at a moderate speed. I find this one to be effective as well, although I still believe the w space interpolation above proceeds more smoothly. However, that may be due to the increased number of frames in the previous example.
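For reference, the interpolations above can be thought of as a straight-line walk between the two projected latent codes, with each intermediate code passed through the generator to render one frame. The sketch below assumes simple linear interpolation and represents latent codes as flat lists; `lerp` and `interpolate_latents` are illustrative names, not functions from the assignment code.

```python
def lerp(a, b, t):
    """Linear blend of two latent codes: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_latents(source, target, num_frames):
    """Return num_frames latent codes evenly spaced from source to target."""
    if num_frames < 2:
        raise ValueError("need at least two frames to interpolate")
    return [
        lerp(source, target, i / (num_frames - 1))
        for i in range(num_frames)
    ]


# More frames between the same endpoints gives smaller steps per frame,
# which is one way to produce a "slow motion" version.
frames = interpolate_latents([0.0, 2.0], [1.0, 0.0], num_frames=5)
```

Under this assumption, the same code applies to z, w, and w+; only the shape of the latent code changes.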
Let's now take a look at some examples of scribble to image, beginning with one of the provided sketches:
And below is the resulting image. As we will see, there is a delicate balance that must be struck between sparse and dense sketches in order to get good results. This sketch is certainly on the denser side, but it is quite well drawn, which makes the task of generating an image within the provided sketch constraint not too difficult.
Now here is a sketch I drew on sketchio (admittedly, I am no artist):
And the result is below. It was interesting: since I only sketched the cat's head and not its neck, ears, or anything else, the generator had no constraint on what surrounds the cat, so it encapsulated the cat in a costume!
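One way to see why the unsketched regions are left free is to write the sketch constraint as a masked loss, where only pixels covered by a stroke contribute. The following is a rough sketch under that assumption, not necessarily the exact loss used here; `masked_l2_loss` is a hypothetical name and images are flattened to plain lists.

```python
def masked_l2_loss(generated, sketch, mask):
    """Mean squared error over the pixels where mask is non-zero.

    Pixels outside the mask are ignored, so the generator can fill
    them with anything without changing the loss.
    """
    total = 0.0
    count = 0
    for g, s, m in zip(generated, sketch, mask):
        if m:
            total += (g - s) ** 2
            count += 1
    # An empty mask imposes no constraint at all.
    return total / count if count else 0.0


# The middle pixel is not covered by the sketch, so its value is irrelevant.
loss_a = masked_l2_loss([1.0, 5.0, 3.0], [1.0, 0.0, 1.0], [1, 0, 1])
loss_b = masked_l2_loss([1.0, -9.0, 3.0], [1.0, 0.0, 1.0], [1, 0, 1])
```

With a formulation like this, a sparse sketch constrains very few pixels, while a dense one constrains many, which is consistent with the balance described above.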
Here is another sketch I drew on sketchio, this time focusing on more common colors and a sparser drawing.
As shown below, the colors of the cat's irises, the skin under its eyes, and its mouth match the colors of the drawing extremely well.