16-726 Learning-Based Image Synthesis

Project 3: When Cats meet GANs

Chang Shi


Overview

This assignment provides hands-on experience coding and training GANs. It includes two parts: in the first part, we implement a specific type of GAN designed to process images, called a Deep Convolutional GAN (DCGAN). We train the DCGAN to generate grumpy cats from samples of random noise. In the second part, we implement a more complex GAN architecture called CycleGAN for the task of image-to-image translation. We train the CycleGAN to translate between two types of cats (Grumpy and Russian Blue).

Part 1: Deep Convolutional GAN

The spatial output size of a convolutional layer is given by:

$$\frac{input\_size+2P-K}{S} + 1 = output\_size$$

With kernel size K = 4 and stride S = 2, the padding should be P = 1 for the first four conv layers, and P = 0 for the last conv layer.
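The padding choice can be checked numerically. The sketch below assumes a 64×64 input (the assignment's image resolution may differ); each P = 1 layer halves the resolution, and the final P = 0 layer collapses the 4×4 map to 1×1:

```python
def conv_output_size(input_size, K=4, S=2, P=1):
    """Spatial output size of a conv layer: (input + 2P - K) // S + 1."""
    return (input_size + 2 * P - K) // S + 1

# Four K=4, S=2, P=1 convs halve the resolution each time...
size = 64
for _ in range(4):
    size = conv_output_size(size, K=4, S=2, P=1)  # 64 -> 32 -> 16 -> 8 -> 4
# ...and a final K=4, S=2, P=0 conv reduces the 4x4 map to 1x1.
size = conv_output_size(size, K=4, S=2, P=0)      # 4 -> 1
print(size)  # 1
```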

Loss





Discriminator training loss, data_aug=basic
Generator training loss, data_aug=basic
Discriminator training loss, data_aug=deluxe
Generator training loss, data_aug=deluxe

Briefly explain what the curves should look like if the GAN manages to train. If the GAN manages to train, both the discriminator and generator losses should decrease to a certain value and then oscillate around their "equilibrium" values. This is because in a GAN, the generator and discriminator compete against each other in a minimax game: improving one leads to a higher loss for the other. Eventually, when the model finds an optimum, both losses converge and then oscillate around their "equilibrium" values.
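The competing objectives behind these curves can be sketched as follows. This is a minimal sketch assuming the least-squares GAN formulation (a common choice for DCGAN training; the exact loss used in the assignment may differ):

```python
import torch

def d_loss_fn(D_real, D_fake):
    # Least-squares discriminator loss: push D(real) -> 1 and D(fake) -> 0.
    return torch.mean((D_real - 1) ** 2) + torch.mean(D_fake ** 2)

def g_loss_fn(D_fake):
    # Generator loss: push D(fake) -> 1, i.e. fool the discriminator.
    return torch.mean((D_fake - 1) ** 2)

# A perfect discriminator (D(real)=1, D(fake)=0) has zero loss, while the
# generator's loss is then maximal -- improving one side hurts the other.
print(d_loss_fn(torch.ones(4), torch.zeros(4)))  # tensor(0.)
print(g_loss_fn(torch.zeros(4)))                 # tensor(1.)
```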

Samples from early and later in training



Sample from early in training, iteration 200
Sample from later in training, iteration 1800
Sample from later in training, iteration 13000

As expected, the samples from early in training are essentially random noise, and they gradually start to resemble the cats in the training set. During training, the samples improve by first capturing the correct outlines and colors, then refining details such as the eyes and noses.
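To make checkpoints comparable like this, samples are typically drawn from the same fixed noise batch at every iteration, so visible changes come from the generator rather than from new noise. A minimal sketch with an illustrative DCGAN-style generator (the layer sizes here are assumptions, not the exact assignment architecture):

```python
import torch
import torch.nn as nn

# Illustrative DCGAN-style generator: transposed convs upsample a noise
# vector to an image, with tanh squashing outputs to [-1, 1].
G = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),  # 1 -> 4
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 4 -> 8
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 8 -> 16
    nn.Tanh(),
)

# Sample the SAME noise batch at every checkpoint for comparable grids.
fixed_noise = torch.randn(16, 100, 1, 1)
with torch.no_grad():
    samples = G(fixed_noise)
print(samples.shape)  # torch.Size([16, 3, 16, 16])
```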

Part 2: CycleGAN

Samples from iteration 400





Without cycle-consistency loss, sample-000600-X-Y.png
Without cycle-consistency loss, sample-000600-Y-X.png
With cycle-consistency loss, sample-000600-X-Y.png
With cycle-consistency loss, sample-000600-Y-X.png

Samples from iteration 10000





Without cycle-consistency loss, sample-010000-X-Y.png
Without cycle-consistency loss, sample-010000-Y-X.png
With cycle-consistency loss, sample-010000-X-Y.png
With cycle-consistency loss, sample-010000-Y-X.png

In general, the results with the cycle-consistency loss are slightly better than those without it. This observation matches the intuition that both the X-to-Y and Y-to-X generators should be trained so that their generated images can be successfully cycled back to the original domain, remaining valid through the full generator loop. Thus, adding the cycle-consistency loss helps generate more realistic images.
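The cycle-consistency term described above can be sketched as an L1 penalty on the round trip through both generators (the weight `lam=10` here is a common choice, not necessarily the assignment's setting):

```python
import torch

def cycle_consistency_loss(real, cycled, lam=10.0):
    # L1 penalty between an image and its round trip through both
    # generators, e.g. cycled = G_YtoX(G_XtoY(real_X)). lam weights this
    # term relative to the adversarial losses in the total generator loss.
    return lam * torch.mean(torch.abs(real - cycled))

# The loss is zero only when the round trip reproduces the input exactly.
x = torch.randn(2, 3, 32, 32)
print(cycle_consistency_loss(x, x))  # tensor(0.)
```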

Bells & Whistles

DCGAN results on Pokemon dataset

The Pokemon images contain more detail than the cat images, so the model requires more training to produce comparable results.


GAN results, Fire type of Pokemon