Assignment #3: When Cats meet GANs

Tarasha Khurana (Andrew ID: tkhurana)

DCGAN

The formula used for deriving the value of padding, for a kernel size ($k$) of 4 and stride ($s$) of 2, such that a convolutional layer halves the output width, is

$w' = \frac{w - k + 2p}{s} + 1$

Putting $w' = w/2$, the value of padding is $p = 1$. This is used for all the conv layers except the last one, where the value of padding is computed to be 0.
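As a quick sanity check, the conv output-width formula can be evaluated directly. This is a sketch, not the assignment code; the widths 64 and 4 are illustrative values consistent with the derivation above ($k = 4$, $s = 2$):

```python
def conv_out_width(w, k=4, s=2, p=1):
    # Output width of a conv layer: w' = (w - k + 2p) / s + 1
    return (w - k + 2 * p) // s + 1

# With p = 1, each conv layer halves the spatial width:
print(conv_out_width(64))  # 32
print(conv_out_width(32))  # 16

# A final layer with p = 0 collapses a 4x4 map down to 1x1:
print(conv_out_width(4, p=0))  # 1
```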

For the transpose convolution layer, the formula used for deriving the value of padding, for a kernel size ($k$) of 4 and stride ($s$) of 2, such that the layer doubles the output width, is

$w' = (w - 1) \cdot s - 2p + k$

Putting $w' = 2w$, the value of padding is $p = 1$. This is used for all the transpose conv layers except the first one, where the value of padding is computed to be 0.
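The transpose-conv formula can be checked the same way. Again a sketch with illustrative widths; the case $p = 0$ corresponds to the first layer upsampling a 1x1 input to 4x4:

```python
def tconv_out_width(w, k=4, s=2, p=1):
    # Output width of a transpose conv layer: w' = (w - 1) * s - 2p + k
    return (w - 1) * s - 2 * p + k

# With p = 1, each transpose conv layer doubles the spatial width:
print(tconv_out_width(16))  # 32

# The first layer uses p = 0 to expand a 1x1 map to 4x4:
print(tconv_out_width(1, p=0))  # 4
```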

Training Loss

Discriminator; Augmentation: deluxe
Generator; Augmentation: deluxe
Discriminator; Augmentation: basic
Generator; Augmentation: basic

The training loss curves for the discriminator and generator decrease over time, with jumps at the start of every epoch. This is in accordance with how GAN training loss curves should look. With the deluxe data augmentation, convergence is better and the jumps are smaller.
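For reference, the opposing losses that produce these curves can be sketched with numpy. This is an illustration of the standard binary-cross-entropy GAN objective with hypothetical discriminator probabilities, not the actual training code (the assignment may use a different variant, e.g. least-squares losses):

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy on sigmoid probabilities.
    eps = 1e-12
    return -np.mean(target * np.log(pred + eps)
                    + (1 - target) * np.log(1 - pred + eps))

# Hypothetical discriminator probabilities for a batch of 3 images:
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real images
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on generated images

# Discriminator wants real -> 1 and fake -> 0:
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))
# Generator (non-saturating form) wants D(G(z)) -> 1:
g_loss = bce(d_fake, np.ones(3))
print(d_loss, g_loss)
```

With a confident discriminator, as here, the generator loss dominates; as training progresses the two losses pull against each other, which is what gives the curves their characteristic jitter.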

Image Samples

Iteration: 200
Iteration: 2600

The generated images from the fixed noise vectors improve as training progresses. At iteration 200, at the very start of training, the outputs still resemble the noise vectors; by iteration 2600 (200 epochs), the generator learns to generate the center pixels of the image well. As training progresses further, the corner pixels are also learnt properly.

CycleGAN

Without cycle consistency loss:

sample-000600-X-Y.png
sample-000600-Y-X.png

With cycle consistency loss:

sample-000600-X-Y.png
sample-000600-Y-X.png
sample-010000-X-Y.png
sample-010000-Y-X.png

First off, both models were run with deluxe data augmentation. The model with cycle consistency loss performs better than the model without it (compare the outputs at 600 iterations). Generally, the pose of the cats is learnt better with the use of the cycle consistency loss. Specifically, the YtoX generator is not able to learn much in 600 iterations without the cycle consistency loss, possibly because there are far fewer samples in the grumpifyAprocessed class than in grumpifyBprocessed; the cycle consistency loss makes its training faster.
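The cycle consistency term penalizes the difference between an image and its round-trip reconstruction through both generators. A minimal numpy sketch, where the "cycled" image and the weighting `lam` are placeholders standing in for `G_YtoX(G_XtoY(x))` and the assignment's actual hyperparameter:

```python
import numpy as np

def cycle_consistency_loss(x, x_cycled, lam=10.0):
    # L1 penalty between an image x and its round-trip reconstruction.
    # lam is a hypothetical weight on the cycle term.
    return lam * np.mean(np.abs(x - x_cycled))

rng = np.random.default_rng(0)
x = rng.random((3, 64, 64))                          # stand-in image from domain X
x_cycled = x + 0.01 * rng.standard_normal(x.shape)   # stand-in for G_YtoX(G_XtoY(x))
print(cycle_consistency_loss(x, x_cycled))
```

Because the loss is zero only when the round trip reproduces the input exactly, it gives the data-poor YtoX generator a supervised-style signal on every X image, which is consistent with the faster training observed above.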

For the model trained with cycle consistency loss, the outputs at 600 iterations and at 10000 iterations are very different. Specifically, the YtoX generator starts showing positive results at iteration 10000. For the XtoY generator, the background/corner details are learnt towards the 10000-iteration mark, whereas most of the cat face is learnt by 600 iterations itself.

Since there is less data for the YtoX generator, it takes longer to train and learns the cat face by 10000 iterations. Had it been trained for longer, it would likely learn the background/corner pixels too.