16-726 21spring Assignment #3

When Cats meet GANs

Author: Zhe Huang (zhehuang)


Introduction

In this assignment, we explore training Generative Adversarial Networks (GANs). Specifically, we train two different types of GANs on two different tasks: one generates photorealistic cat images, and the other performs style transfer. Through this assignment, we systematically learn how to use GANs to tackle real-world problems.

Part 1: Deep Convolutional GAN

1.1 Padding size

For calculating the padding size, we follow the equation: $$ D_{out} = \lfloor \frac{D_{in} + 2 \cdot padding\_size - kernel\_size}{stride\_size} + 1 \rfloor, $$ where $D$ denotes a spatial dimension (e.g. height or width) of the input and output tensors.

Thus, for conv1 through conv4, we want each output dimension to be $1/2$ of the input size, with $kernel\_size = 4$ and $stride\_size = 2$. Solving the equation above gives $padding\_size = 1$.

For conv5, we set the padding size to zero.
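The padding choices above can be sanity-checked numerically. The snippet below is a minimal sketch: the layer input sizes (64x64 inputs halving down to 4x4, and conv5 collapsing 4x4 to 1x1) are assumptions not stated explicitly in the report.

```python
import math

def conv_out_size(d_in, kernel_size, stride, padding):
    # D_out = floor((D_in + 2*padding - kernel_size) / stride + 1)
    return math.floor((d_in + 2 * padding - kernel_size) / stride + 1)

# conv1..conv4: kernel_size=4, stride=2, padding=1 halves each spatial dim
assert conv_out_size(64, 4, 2, 1) == 32
assert conv_out_size(32, 4, 2, 1) == 16
assert conv_out_size(16, 4, 2, 1) == 8
assert conv_out_size(8, 4, 2, 1) == 4

# conv5: a 4x4 kernel with padding 0 maps the 4x4 map to a single value
# (this holds for either stride 1 or stride 2)
assert conv_out_size(4, 4, 1, 0) == 1
assert conv_out_size(4, 4, 2, 0) == 1
print("all padding checks pass")
```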

1.2 Experiments

In this section, all of our DCGANs are trained with conv_dim=64. We show screenshots of the discriminator and generator training losses under both --data_aug=basic and --data_aug=deluxe. In both settings, the smoothed training losses decrease over the course of training, indicating that the networks are learning as expected.
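For concreteness, a PyTorch sketch of a DCGAN discriminator with conv_dim=64 and the padding scheme derived in Section 1.1 is given below. This is a hedged reconstruction, not the assignment's exact model: the input resolution (64x64 RGB), use of BatchNorm, and activation choices are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """DCGAN-style discriminator sketch (assumed 64x64 RGB inputs)."""

    def __init__(self, conv_dim=64):
        super().__init__()

        def block(c_in, c_out, k=4, s=2, p=1, norm=True):
            # conv (+ optional BatchNorm) + ReLU; k=4, s=2, p=1 halves H and W
            layers = [nn.Conv2d(c_in, c_out, k, s, p)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.ReLU())
            return nn.Sequential(*layers)

        self.conv1 = block(3, conv_dim, norm=False)     # 64 -> 32
        self.conv2 = block(conv_dim, conv_dim * 2)      # 32 -> 16
        self.conv3 = block(conv_dim * 2, conv_dim * 4)  # 16 -> 8
        self.conv4 = block(conv_dim * 4, conv_dim * 8)  # 8  -> 4
        # conv5: 4x4 kernel, no padding, collapses 4x4 to a 1x1 logit
        self.conv5 = nn.Conv2d(conv_dim * 8, 1, 4, 1, 0)

    def forward(self, x):
        x = self.conv4(self.conv3(self.conv2(self.conv1(x))))
        return self.conv5(x)  # raw logits, shape (N, 1, 1, 1)

D = Discriminator(conv_dim=64)
out = D(torch.randn(2, 3, 64, 64))
print(out.shape)  # each image maps to a single real/fake logit
```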

For output samples from the DCGAN trained with deluxe data augmentation, we pick one sample every 200 iterations, from iteration 200 to 1200. The generated results clearly improve over time: the pixel-level noise decreases dramatically, and cat-related features become clearer and more dominant.

Part 2: CycleGAN

2.1 Shorter runs

In this section, following the assignment requirements, we run CycleGAN for 600 iterations both with and without the cycle consistency loss, and show samples under both settings at 400 and 600 iterations. Here are our results without the cycle consistency loss.

Here are our results with the cycle consistency loss.

2.2 Longer runs

After checking that our shorter runs work, we move on to longer training runs of 10K iterations. We show the final training results at 10K iterations, both with and without the cycle consistency loss. First, here are the results that do not use the cycle consistency loss.

Here are our final results with cycle consistency loss.

As can easily be seen above, the cycle consistency loss contributes substantially to final image quality; in particular, Y->X style transfer improves drastically thanks to it. The reason is presumably that the cycle consistency loss forces the generated images to stay within the two domains involved in training, whereas without it the generator may accidentally learn to map one style to some random, unseen style that is irrelevant to our task.
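The cycle consistency term described above can be sketched as follows. This is a minimal illustration assuming an L1 reconstruction penalty and generator names G_XtoY / G_YtoX (as in the CycleGAN paper); the weighting factor lam is an assumed hyperparameter, not a value from our runs.

```python
import torch

def cycle_consistency_loss(real_X, real_Y, G_XtoY, G_YtoX, lam=10.0):
    # X -> Y -> X should reconstruct the original X, and symmetrically for Y;
    # the L1 distance between input and reconstruction is penalized.
    rec_X = G_YtoX(G_XtoY(real_X))
    rec_Y = G_XtoY(G_YtoX(real_Y))
    return lam * (torch.mean(torch.abs(rec_X - real_X)) +
                  torch.mean(torch.abs(rec_Y - real_Y)))

# Sanity check: identity "generators" reconstruct perfectly, so the loss is 0.
ident = lambda t: t
x = torch.randn(2, 3, 32, 32)
y = torch.randn(2, 3, 32, 32)
print(cycle_consistency_loss(x, y, ident, ident).item())  # 0.0
```

In training, this term is added to the usual adversarial losses of both generators, which is what constrains translated images to remain invertible between the two domains.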