When Cats meet GANs

Overview

In this project, we implemented two types of GANs. The first is a Deep Convolutional GAN (DCGAN) that generates grumpy cat faces from random noise. The second is a CycleGAN that translates between Grumpy and Russian Blue cats.

Part 1: DCGAN

Data Augmentation

To increase the variety of the training data and reduce overfitting, the data augmentation randomly crops and horizontally flips each image.
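The crop-and-flip scheme can be sketched in plain numpy as follows; the crop size and the 0.5 flip probability are illustrative assumptions, not necessarily the values used in this project.

```python
import numpy as np

def augment(img, crop_size, rng=np.random.default_rng()):
    """Randomly crop an H x W x C image to crop_size x crop_size,
    then flip it horizontally with probability 0.5.
    crop_size and the flip probability are illustrative choices."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    out = img[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    return out
```

In practice the same effect is usually obtained with a framework's built-in transforms (e.g. random crop plus random horizontal flip) applied on the fly each epoch, so the network rarely sees the exact same pixels twice.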

Discriminator

Padding is calculated from the equation $\frac{W}{2}=\frac{W-K+2P}{S}+1$, where $W$ is the input width, $K$ the kernel size, $P$ the padding, and $S$ the stride. Plugging in the values, we get a padding of 1. The same padding applies to the generator's deconvolution layers.
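Solving the equation above for $P$ when the layer halves the spatial size ($S=2$) gives $P=(K-2)/2$, independent of $W$; with the common DCGAN kernel size $K=4$ (an assumption about this project's layers) that yields $P=1$. A minimal sketch of the calculation:

```python
def halving_padding(kernel_size, stride=2):
    """Solve W/2 = (W - K + 2P)/S + 1 for P when the conv layer
    halves the spatial size (S = 2). Rearranging:
        W - 2 = W - K + 2P  =>  P = (K - 2) / 2,
    which is independent of the input width W."""
    assert stride == 2, "this derivation assumes stride 2"
    pad = (kernel_size - 2) / 2
    assert pad == int(pad), "kernel size must be even for integer padding"
    return int(pad)
```

For example, a 4x4 kernel with stride 2 needs padding 1, matching the value quoted above.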

Training

basic_loss This is the training loss for DCGAN without data augmentation. deluxe_loss This is the training loss for DCGAN with data augmentation. For a GAN to converge, the discriminator loss should not become too small; otherwise the generator will fail to produce any output that fools the discriminator. The loss should also decrease smoothly: oscillation suggests that training is unstable and that problems such as mode collapse may have occurred.
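As a concrete reference for reading these curves, here is a minimal sketch of the two objectives, assuming the least-squares GAN formulation (a common choice for DCGAN exercises; the project may instead use the cross-entropy loss, for which the same diagnostics apply). A discriminator loss driven all the way to zero is exactly the failure mode described above.

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """Least-squares GAN objectives (an assumed formulation).
    d_real / d_fake are the discriminator's scores on a batch of
    real images and on a batch of generated images, respectively.
    The discriminator pushes real scores toward 1 and fake scores
    toward 0; the generator pushes fake scores toward 1."""
    d_loss = 0.5 * np.mean((d_real - 1) ** 2) + 0.5 * np.mean(d_fake ** 2)
    g_loss = np.mean((d_fake - 1) ** 2)
    return d_loss, g_loss
```

If the discriminator perfectly separates real from fake (`d_real = 1`, `d_fake = 0`), `d_loss` hits 0 while `g_loss` saturates at 1, and the generator stops receiving a useful gradient.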

Result

Basic (Without data augmentation)

basic400 Output at epoch 400 basic1400 Output at epoch 1400 basic6400 Output at epoch 6400

Deluxe (With data augmentation)

deluxe600 Output at epoch 600 deluxe1600 Output at epoch 1600 deluxe6400 Output at epoch 6400
For both methods, the output shows only color patches at early epochs (around 400). Around epoch 1400, the output starts to show the outline of the cat's face. At epoch 6400, the basic method still has problems with the cat's eyes and the white-colored regions, while the deluxe method looks very promising, with only minor defects.

Part 2: CycleGAN

Training

no_consistXY no_consistYX Training results without cycle consistency at epoch 600. consistXY consistYX Training results with cycle consistency at epoch 600.
From the above results it is hard to see the benefit of cycle consistency: at epoch 600 the outputs are still very blurry. naive_cycle_loss cycle_loss Comparing the losses of the two methods shows that cycle consistency yields a smaller loss. 10000-X-Y 10000-Y-X The results at epoch 10000 are much clearer, with few artifacts. It is also clear that the generated face follows certain features of the original image, such as face shape and orientation.
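The cycle-consistency term that distinguishes the two runs can be sketched as an L1 penalty on the round trip X → Y → X; the weight `lam = 10` below is an illustrative assumption (the value used in the original CycleGAN paper), not necessarily the one used here.

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed, lam=10.0):
    """L1 cycle-consistency term lam * || G_YtoX(G_XtoY(x)) - x ||_1,
    averaged over all elements of the batch. x_reconstructed is the
    image after the full X -> Y -> X round trip; lam = 10 is an
    assumed weight. The symmetric Y -> X -> Y term is analogous."""
    return lam * np.mean(np.abs(x_reconstructed - x))
```

This term is added to the adversarial losses of both generators, penalizing translations that cannot be undone and thereby encouraging the output to preserve structure (face shape, orientation) from the input, as observed at epoch 10000.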