16726 Project 3 - Zihang Lai

When Cats meet GANs

Project Overview

In this project, we reproduce two generative models, DCGAN and CycleGAN. Both use a generator to produce images and a discriminator to push the generated images toward realism. DCGAN generates photorealistic images unconditionally from a noise vector; here we use it to generate images of cats. CycleGAN is a conditional GAN that translates images between two domains; here we convert between two kinds of cats (Grumpy and Russian Blue).

In [2]:
import matplotlib.pyplot as plt
import imageio
import numpy as np
import glob

Part 1: DCGAN

How to compute padding for the discriminator:

We know that the output dimension $O = (I - K + 2P) / S +1$, where

  • $O$ is output dimension
  • $I$ is input dimension
  • $K$ is kernel size
  • $P$ is padding size
  • $S$ is stride

Substituting $K=4$, $S=2$, and $O=I/2$ gives $I/2 = (I - 4 + 2P)/2 + 1$, which simplifies to $I = I - 2 + 2P$, so $P=1$.
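As a quick sanity check on the result above, the output-size formula can be evaluated directly. The helper name `conv_output_size` and the input sizes are illustrative assumptions, not taken from the project code; the floor division mirrors how convolution layers usually round the result.

```python
def conv_output_size(i: int, k: int, p: int, s: int) -> int:
    """Output size O = floor((I - K + 2P) / S) + 1 for one spatial dimension."""
    return (i - k + 2 * p) // s + 1


# Illustrative input sizes (assumed); each should be halved when K=4, S=2, P=1.
for size in (64, 32, 16, 8):
    assert conv_output_size(size, k=4, p=1, s=2) == size // 2
```

With $P=0$ the same layer would map 64 to 31 rather than 32, which is why the padding is needed.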

D and G loss curve

The loss curves of D and G for basic and deluxe data augmentation are shown below. Deluxe data augmentation yields a lower D loss and a higher G loss, suggesting a better balance between D and G. A good curve is stable, with losses that are neither very large nor very small. If G has zero loss (that is, D classifies all fake images as real), then training G is meaningless (in fact, there is no gradient for G). On the other hand, if both D and G have very large losses, D is classifying all images as fake, which also provides no meaningful training signal for the generator.

In [41]:
fig = plt.figure(figsize=(14,10))
plt.subplot(221); plt.imshow(plt.imread('figures/g_basic.png')); plt.title('g_basic')
plt.subplot(222); plt.imshow(plt.imread('figures/g_deluxe.png')); plt.title('g_deluxe')
plt.subplot(223); plt.imshow(plt.imread('figures/d_basic.png')); plt.title('d_basic')
plt.subplot(224); plt.imshow(plt.imread('figures/d_deluxe.png')); plt.title('d_deluxe')
for ax in fig.axes:
    ax.axis("off")

Quality of samples

Below are two samples, from iterations 0.2k and 26k. The late sample is clearly more realistic: although it still shows artefacts, the images are recognizably photos of grumpy cats. The early samples are mostly noise and all look alike.

In [45]:
fig = plt.figure(figsize=(14,10))
plt.subplot(121); plt.imshow(plt.imread('figures/sample-early.png')); plt.title('Early sample (iter.=0.2k)')
plt.subplot(122); plt.imshow(plt.imread('figures/sample-late.png')); plt.title('Late sample (iter.=26k)')
Out[45]:
Text(0.5, 1.0, 'Late sample (iter.=26k)')

Part 2: CycleGAN

Below are sample images from iteration 600 generated by CycleGAN (without and with cycle consistency loss).

In [66]:
fig = plt.figure(figsize=(14,10))
plt.subplot(121); plt.imshow(plt.imread('figures/naive/sample-000600-Y-X.png')); plt.title('Grumpy -> Russian Blue (iter.=600, no cycle consistency)')
plt.subplot(122); plt.imshow(plt.imread('figures/naive/sample-000600-X-Y.png')); plt.title('Grumpy <- Russian Blue (iter.=600, no cycle consistency)')
plt.show()
fig = plt.figure(figsize=(14,10))
plt.subplot(121); plt.imshow(plt.imread('figures/cycle/sample-000600-Y-X.png')); plt.title('Grumpy -> Russian Blue (iter.=600, with cycle consistency)')
plt.subplot(122); plt.imshow(plt.imread('figures/cycle/sample-000600-X-Y.png')); plt.title('Grumpy <- Russian Blue (iter.=600, with cycle consistency)')
Out[66]:
Text(0.5, 1.0, 'Grumpy <- Russian Blue (iter.=600, with cycle consistency)')

Below are sample images from iteration 10k generated by CycleGAN (without cycle consistency loss).

In [56]:
fig = plt.figure(figsize=(14,10))
plt.subplot(121); plt.imshow(plt.imread('figures/CycleGAN_Y-X.png')); plt.title('Grumpy -> Russian Blue (iter.=10k)')
plt.subplot(122); plt.imshow(plt.imread('figures/CycleGAN_X-Y.png')); plt.title('Grumpy <- Russian Blue (iter.=10k)')
Out[56]:
Text(0.5, 1.0, 'Grumpy <- Russian Blue (iter.=10k)')

Below are sample images from iteration 10k generated by CycleGAN (with cycle consistency loss).

There seems to be no large difference between images generated with and without the cycle consistency loss. On visual inspection, the model trained with cycle consistency loss yields slightly better results. However, coherent alignment between input and output is achieved even without the cycle consistency loss: the shape, orientation, etc. of the input cat roughly match those of the output. One conjecture is that preserving this alignment is simply easier for the network than generating a completely different cat.

In [55]:
fig = plt.figure(figsize=(14,10))
plt.subplot(121); plt.imshow(plt.imread('figures/CycleCons_Y-X.png')); plt.title('Grumpy -> Russian Blue (iter.=10k - w/ cycle cons.)')
plt.subplot(122); plt.imshow(plt.imread('figures/CycleCons_X-Y.png')); plt.title('Grumpy <- Russian Blue (iter.=10k - w/ cycle cons.)')
Out[55]:
Text(0.5, 1.0, 'Grumpy <- Russian Blue (iter.=10k - w/ cycle cons.)')
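For reference, the cycle consistency term compared above penalizes the difference between an image and its reconstruction after a round trip through both generators. The following is a minimal NumPy sketch, assuming an L1 reconstruction penalty; `g_xy` and `g_yx` are hypothetical stand-ins for the two generators, and the weight `lam` is an illustrative parameter, not a value from this project.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx, lam=1.0):
    """L1 cycle loss: lam * (mean|G_YX(G_XY(x)) - x| + mean|G_XY(G_YX(y)) - y|)."""
    x_rec = g_yx(g_xy(x))  # X -> Y -> X round trip
    y_rec = g_xy(g_yx(y))  # Y -> X -> Y round trip
    return lam * (np.mean(np.abs(x_rec - x)) + np.mean(np.abs(y_rec - y)))


# Toy "generators" (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.random((2, 3, 8, 8))
y = rng.random((2, 3, 8, 8))

# Exact inverses reconstruct the inputs, so the loss is (numerically) zero.
perfect = cycle_consistency_loss(x, y, lambda a: a + 0.5, lambda b: b - 0.5)
# A mismatched pair leaves a constant offset of 0.5 on each round trip.
lossy = cycle_consistency_loss(x, y, lambda a: a + 0.5, lambda b: b)
```

In a real training loop this term would be added to the adversarial losses of both generators; the toy functions here only show how the round trip is scored.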

Part 3: Bells & Whistles

1. Generate a GIF video (visualizing training process)

The GIF below shows CycleGAN training from iteration 1000 to 20000. Image quality clearly improves as training progresses.
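One way such a GIF could be assembled is sketched below using `imageio`, which the notebook already imports. The helper names, the glob pattern passed in, and the output path are assumptions for illustration; the actual sample paths used for this run are not shown in the report.

```python
import glob
import os
import re


def sorted_frames(pattern: str) -> list:
    """Return files matching `pattern`, ordered by the first number in the file name."""
    def iteration(path: str) -> int:
        match = re.search(r"(\d+)", os.path.basename(path))
        return int(match.group(1)) if match else -1
    return sorted(glob.glob(pattern), key=iteration)


def write_gif(pattern: str, out_path: str) -> int:
    """Read every matching frame and save them as one animated GIF; returns frame count."""
    import imageio  # imported lazily so the sorting helper works without it

    frames = [imageio.imread(p) for p in sorted_frames(pattern)]
    imageio.mimsave(out_path, frames)
    return len(frames)
```

Sorting by the iteration number rather than by raw file name keeps the frames in training order even when the numbers are not zero-padded.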

2. Another generative model (VAE)

The Variational Autoencoder (VAE) model is reproduced in ./vae.py. A VAE encodes an image into a latent distribution, which aims to model the distribution from which the image data is generated. More specifically, the encoder takes an image as input and computes a latent code, which is then used to compute the mean and variance of the latent distribution. A vector is sampled from this distribution and passed to the decoder, which generates an image from it. The distance between the generated image and the input image is used as the training loss. To train the encoder, the reparameterization trick is used so that gradients can be backpropagated through the sampling step.
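The reparameterization trick mentioned above can be sketched in a few lines of NumPy. This is an illustration only: the function name and the choice to parameterize the variance by its logarithm (`logvar`) are assumptions, and the actual code in ./vae.py is not shown here.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), where sigma = exp(0.5 * logvar)."""
    eps = rng.standard_normal(mu.shape)  # the only source of randomness
    return mu + np.exp(0.5 * logvar) * eps


rng = np.random.default_rng(0)
mu = np.array([[0.0, 1.0, -2.0]])
logvar = np.full_like(mu, -40.0)  # near-zero variance, so z should collapse to mu
z = reparameterize(mu, logvar, rng)
```

Because the noise `eps` is drawn independently of `mu` and `logvar`, the sample is a deterministic, differentiable function of the encoder outputs, which is what lets gradients reach the encoder in an autodiff framework.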

In [62]:
fig = plt.figure(figsize=(10,6))
plt.subplot(211); plt.imshow(plt.imread('figures/vae_inputs.png')); plt.title('VAE Input')
plt.subplot(212); plt.imshow(plt.imread('figures/vae_prediction.png')); plt.title('VAE Prediction')
Out[62]:
Text(0.5, 1.0, 'VAE Prediction')

As shown above, the VAE model generates blurrier images than the GAN. One possible reason is that the sampling process forces continuity in the latent space. Such continuity means some latent vectors lie in intermediate regions between multiple possible latent distributions in the dataset (i.e. a point in the latent space could have been sampled from several distributions), which causes blurriness in the output.