Assignment #3 - When Cats meet GANs

Overview

In this assignment, we began to gain experience with deep learning and GANs. The first part consisted of implementing a Deep Convolutional GAN to generate grumpy cats given samples of random noise, and the second consisted of implementing the CycleGAN architecture to convert Grumpy cats to Russian blue cats and vice versa.

For the DCGAN, we implemented data augmentation (a regularization technique to keep the GAN from overfitting on this small dataset), a discriminator, a generator, and the training procedure. The training procedure consists of drawing random samples from the noise distribution, generating fake images from the noise, computing the discriminator loss, and then repeating the same steps for the generator loss. As a result, the network learns a representation of the grumpy cat, and given different samples of noise, it generates different realistic images of the cat.
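The two loss computations in the training procedure can be sketched as follows. This is a minimal sketch, assuming the least squares GAN objective (the loss named later in the CycleGAN discussion). The function names and the plain lists of discriminator scores are hypothetical stand-ins for real network outputs, and some implementations scale the discriminator terms by 1/2.

```python
def mean(values):
    """Average of a non-empty list of floats."""
    return sum(values) / len(values)


def discriminator_loss(real_scores, fake_scores):
    """Least squares D loss: push D(real) toward 1 and D(fake) toward 0."""
    real_term = mean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = mean([s ** 2 for s in fake_scores])
    return real_term + fake_term


def generator_loss(fake_scores):
    """Least squares G loss: push D(G(z)) toward 1, i.e. fool the discriminator."""
    return mean([(s - 1.0) ** 2 for s in fake_scores])


# Hypothetical discriminator outputs for one batch (not real training data).
real_scores = [0.9, 0.8, 1.0]   # D(x) on real images
fake_scores = [0.1, 0.3, 0.2]   # D(G(z)) on generated images

d_loss = discriminator_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
print("discriminator loss:", d_loss)
print("generator loss:", g_loss)
```

In this toy batch the discriminator separates real from fake well, so its loss is small while the generator loss is large, which is the situation expected early in training.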

For CycleGAN, we use the same discriminator as before, but now we create two generators. Instead of generating an image from noise, each generator encodes its input into a latent space and then decodes those features into an output that lies in the domain of the other generator's input. We then augment the generator loss function with the cycle-consistency loss: given the input to generator 1, generator 2 should be able to take generator 1's output and reconstruct the original input to generator 1.
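The cycle-consistency idea can be illustrated with a small sketch. It assumes the reconstruction error is measured as a mean absolute (L1) difference, which is a common choice but is not stated in this report. The toy generators below are hypothetical functions on flat lists of pixel values, not the actual networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, gen_1, gen_2):
    """Translate x with gen_1, map it back with gen_2, and compare to x."""
    translated = gen_1(x)                 # domain X -> domain Y
    reconstructed = gen_2(translated)     # domain Y -> back to domain X
    return l1_distance(x, reconstructed)


# Hypothetical toy "generators" standing in for the encoder-decoder networks.
def toy_gen_1(image):
    return [p + 0.5 for p in image]


def toy_gen_2(image):
    return [p - 0.5 for p in image]       # exact inverse of toy_gen_1


def lossy_gen_2(image):
    return [0.0 for _ in image]           # ignores its input entirely


x = [0.25, 0.5, 0.75]
perfect = cycle_consistency_loss(x, toy_gen_1, toy_gen_2)
lossy = cycle_consistency_loss(x, toy_gen_1, lossy_gen_2)
print("loss with an inverse generator:", perfect)
print("loss with a generator that discards its input:", lossy)
```

The loss is zero only when the second generator recovers the original input, so adding it to the generator objective penalizes translations that throw away the content of the source image.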

DCGAN

Padding

Losses

Here are the discriminator and generator training losses with basic data augmentation:
And here are the losses with the deluxe data augmentation:

If the GAN is training properly, we should expect to see the discriminator loss initially decrease while the generator still has not learned how to generate realistic images. As the generator loss decreases and it learns to create images, the discriminator is fooled, so its loss will start to increase. Then as the generator reaches an optimum, the discriminator begins to learn how to classify the generator's images, and while the discriminator loss goes down, the generator loss then begins to increase. This tandem behavior is a good indicator that the GAN is training.

Results

With the deluxe data augmentation setting, here is what the generated samples look like at iteration 200:

And at iteration 10,000:

Early in training, we can see a very murky fingerprint of the grumpy cat - the generator has not yet learned the cat representation. Around iteration 3,000, the generated cats begin to come into view and look more realistic, but are still very low quality. And above, we can see that they are much clearer by iteration 10,000. For comparison, below are the real images given at that iteration. Clearly, the generator has come a long way - the center of each image (the cat's face) is of decent quality, although the edges of the images appear to be of lower quality.

CycleGAN

Now that we have examined DCGAN, we can move on to analyzing the differences in using the CycleGAN architecture.

Early Samples

These are the CycleGAN outputs from both generators early in training (iteration 600) without cycle-consistency loss.

And here are the analogous samples from early in training (iteration 600) with cycle-consistency loss.

Later Samples

Now that we have seen the samples from early iterations, we can take a look at the generated samples near 10,000 iterations. Here is the result for no cycle-consistency loss:

And here is the result with cycle-consistency loss:

There is a difference between the results with and without the cycle-consistency loss. For grumpy cat -> Russian blue, the difference is harder to discern because several of the outputs are blurry. But for Russian blue -> grumpy cat, you can see if you zoom in that the grumpy cats generated with cycle-consistency loss are much more realistic-looking. Consider the cat at the top right corner. Without cycle-consistency loss, it is blurry and does not appear to have pupils. With cycle-consistency loss, the pupils are clear and the image is less blurry. The same applies to many of the images.

The difference arises because the cycle-consistency loss (weighted by the lambda cycle parameter) constrains the generators so that an image translated into the new domain can be translated back into its original domain. If the lambda cycle parameter is too large, the generator loss is dictated primarily by the cycle-consistency term, and the least squares generator loss term has little effect, causing the generators to just copy the pixels over from the original images. Therefore, lambda needs to be tuned so that the cycle-consistency loss does not overpower the traditional least squares generator loss; I found empirically that a small value for lambda works best. With that value tuned, adding the cycle-consistency loss performs better than omitting it, in that it generates higher quality and more realistic images.
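To make the role of lambda concrete, the sketch below shows how the weighted cycle term comes to dominate the total generator loss as lambda grows. It assumes the total loss is the least squares term plus lambda times the cycle-consistency term. The function names and the loss values are hypothetical and chosen only for illustration; they are not measurements from these experiments.

```python
def total_generator_loss(adversarial_loss, cycle_loss, lambda_cycle):
    """Combined objective: adversarial term plus weighted cycle term."""
    return adversarial_loss + lambda_cycle * cycle_loss


def cycle_share(adversarial_loss, cycle_loss, lambda_cycle):
    """Fraction of the total loss contributed by the weighted cycle term."""
    weighted = lambda_cycle * cycle_loss
    return weighted / (adversarial_loss + weighted)


# Hypothetical per-batch loss values (illustrative only).
adversarial_loss = 0.5
cycle_loss = 0.25

for lambda_cycle in (0.0, 1.0, 10.0, 100.0):
    total = total_generator_loss(adversarial_loss, cycle_loss, lambda_cycle)
    share = cycle_share(adversarial_loss, cycle_loss, lambda_cycle)
    print(f"lambda={lambda_cycle:6.1f}  total={total:7.2f}  cycle share={share:.2f}")
```

With a large lambda nearly all of the gradient signal comes from the reconstruction term, which is consistent with the pixel-copying behavior described above.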