16-726 Learning-Based Image Synthesis, 2021 Spring

Project 3: When Cats meet GANs

Teddy Zhang (wentaiz)

Overview

Deep learning based image synthesis has become increasingly popular over the past decade. In particular, Generative Adversarial Networks (GANs) have shown an impressive capability to learn and synthesize new images based on given priors. In this project, the main task is to implement two major GAN frameworks for image synthesis, DCGAN and CycleGAN. The implemented models are verified by generating new cat images.

Deep Convolutional GAN

In this part, we learn the distribution of a set of grumpy cat images so that new cat images can be sampled from the trained generator.

The DCGAN implemented in this project consists of two parts: a DC discriminator and a DC generator. The structures of the given discriminator and generator are shown in the figure below:
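As a concrete reference alongside the figure, here is a minimal PyTorch sketch of a DCGAN-style generator and discriminator for 64x64 images. The channel counts, kernel sizes, and normalization choices are illustrative assumptions, not necessarily the exact configuration used in this project.

```python
import torch
import torch.nn as nn

class DCGenerator(nn.Module):
    """Maps a noise vector z of shape (B, noise_dim, 1, 1) to a 64x64 RGB image."""
    def __init__(self, noise_dim=100, conv_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            # 1x1 -> 4x4
            nn.ConvTranspose2d(noise_dim, conv_dim * 8, 4, 1, 0),
            nn.BatchNorm2d(conv_dim * 8), nn.ReLU(),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(conv_dim * 8, conv_dim * 4, 4, 2, 1),
            nn.BatchNorm2d(conv_dim * 4), nn.ReLU(),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(conv_dim * 4, conv_dim * 2, 4, 2, 1),
            nn.BatchNorm2d(conv_dim * 2), nn.ReLU(),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1),
            nn.BatchNorm2d(conv_dim), nn.ReLU(),
            # 32x32 -> 64x64; tanh keeps outputs in [-1, 1]
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class DCDiscriminator(nn.Module):
    """Maps a 64x64 RGB image to a single real/fake score."""
    def __init__(self, conv_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.ReLU(),      # 64 -> 32
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1),
            nn.BatchNorm2d(conv_dim * 2), nn.ReLU(),         # 32 -> 16
            nn.Conv2d(conv_dim * 2, conv_dim * 4, 4, 2, 1),
            nn.BatchNorm2d(conv_dim * 4), nn.ReLU(),         # 16 -> 8
            nn.Conv2d(conv_dim * 4, conv_dim * 8, 4, 2, 1),
            nn.BatchNorm2d(conv_dim * 8), nn.ReLU(),         # 8 -> 4
            nn.Conv2d(conv_dim * 8, 1, 4, 1, 0),             # 4 -> 1
        )

    def forward(self, x):
        return self.net(x).squeeze()
```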

Some details of my implementation are:

We trained the above model for 5000 epochs. Here are the training loss curves of this model with two different data augmentation methods (Basic vs Deluxe):

Fig.1 Loss curve for the discriminator. Green: Basic, Gray: Deluxe.
Fig.2 Loss curve for the generator. Green: Basic, Gray: Deluxe.
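The deluxe pipeline is not spelled out on this page, so the sketch below is a plausible reconstruction using torchvision; the resize factor, crop size, and flip probability are assumptions.

```python
import torchvision.transforms as transforms

# Hypothetical "deluxe" augmentation: upscale slightly, then take random
# crops and horizontal flips so the discriminator sees more variety.
deluxe_transform = transforms.Compose([
    transforms.Resize(int(64 * 1.1)),
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map to [-1, 1]
])

# "Basic": deterministic resize and normalization only.
basic_transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```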

We can see from the plots above that the two curves in each plot follow roughly the same trend. The discriminator loss generally keeps decreasing with oscillations, while the generator loss rises within the first 400 iterations and then gradually decreases. The model with deluxe data augmentation yields a higher loss for the discriminator and a lower loss for the generator.
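For reference, here is a minimal sketch of how the two plotted losses can be computed in one training step, assuming the least-squares GAN objective (an assumption; this page does not state which objective was used).

```python
import torch

def gan_losses(D, G, real_images, noise):
    """Least-squares GAN losses for one step (assumed objective).

    D: discriminator, G: generator, real_images: a batch of real data,
    noise: a batch of latent vectors.
    """
    # Discriminator: push D(real) toward 1 and D(fake) toward 0.
    fake_images = G(noise).detach()  # block gradients into G
    d_loss = 0.5 * ((D(real_images) - 1) ** 2).mean() \
           + 0.5 * (D(fake_images) ** 2).mean()

    # Generator: push D(G(z)) toward 1, i.e. fool the discriminator.
    g_loss = 0.5 * ((D(G(noise)) - 1) ** 2).mean()
    return d_loss, g_loss
```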

Generated samples from both models after 5000 epochs are also shown in the figure below:

Fig.3a Basic
Fig.3b Deluxe

We can tell that the generated results are visually more refined when the deluxe data augmentation is applied. To better understand the learning process, we also plot the samples generated by the deluxe model at different stages of training:

Fig.4a 400 iterations
Fig.4b 800 iterations
Fig.4c 2000 iterations
Fig.4d 30000 iterations

From Fig.4a-d, we can see that the quality of the generated images keeps improving. Within the first 800 iterations, the network learned the basic color distribution of the cat. The shape of the face and eyes was captured by around 2000 iterations. Finally, it took a much longer training process for the network to learn the fine facial textures.
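Progress grids like Fig.4 are usually produced by decoding a fixed batch of noise vectors at every checkpoint, so that differences across iterations reflect the generator rather than the noise. A short sketch (the helper name and sample count are my own):

```python
import torch
import torchvision.utils as vutils

# Fixed noise reused at every checkpoint so samples stay comparable.
fixed_noise = torch.randn(16, 100, 1, 1)

def save_samples(G, iteration):
    G.eval()
    with torch.no_grad():
        fake = G(fixed_noise)  # (16, 3, 64, 64), values in [-1, 1]
    vutils.save_image(fake, f"sample-{iteration:06d}.png",
                      nrow=4, normalize=True, value_range=(-1, 1))
    G.train()
```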

CycleGAN

In this part, we train a model to perform image-to-image translation between two domains. In the experiments, we use two photo collections: grumpy cats and Russian Blue cats.

The CycleGAN implemented in this project consists of four parts: a generator from domain X to domain Y, a generator from Y to X, a discriminator for X, and a discriminator for Y. The discriminators share the DC discriminator architecture described above. The structure of both generators is shown in the figure below:
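As a concrete reference, here is a minimal sketch of a CycleGAN-style generator with a downsampling encoder, a residual-block bottleneck, and an upsampling decoder; the number of residual blocks, channel widths, and normalization layers are assumptions.

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block that preserves spatial size and channel count."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.conv(x)

class CycleGenerator(nn.Module):
    """Image-to-image generator: encode, transform, decode."""
    def __init__(self, conv_dim=64, n_res_blocks=3):
        super().__init__()
        self.net = nn.Sequential(
            # Encoder: 64x64 -> 16x16
            nn.Conv2d(3, conv_dim, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(),
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim * 2), nn.ReLU(),
            # Transformation: residual blocks at 16x16
            *[ResnetBlock(conv_dim * 2) for _ in range(n_res_blocks)],
            # Decoder: 16x16 -> 64x64
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(),
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)
```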

Some details of my implementation are:

We trained the above model with and without the cycle-consistency loss for 600 iterations. Here are comparisons between the generated results of the two models:

Fig.5a w/o cycle loss, X to Y
Fig.5b w/ cycle loss, X to Y
Fig.5c w/o cycle loss, Y to X
Fig.5d w/ cycle loss, Y to X

Then we continued training both models up to 10000 iterations. The final generated samples are shown below:

Fig.6a w/o cycle loss, X to Y
Fig.6b w/ cycle loss, X to Y
Fig.6c w/o cycle loss, Y to X
Fig.6d w/ cycle loss, Y to X

We can tell from Fig.5 that the cats generated from X to Y align better in pose when the cycle-consistency loss is applied. The reason is that the consistency loss encourages a one-to-one mapping between the two domains.
In Fig.6, there is no significant difference between the X-to-Y results of the models with and without cycle loss. However, the model with the consistency loss generates much better results for Y-to-X synthesis. A potential reason is that the Russian Blue collection has larger variation than the grumpy cats and is therefore harder to learn.
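For completeness, the cycle-consistency term penalizes the round trips X -> Y -> X and Y -> X -> Y, which is what ties each output to its specific input. A minimal sketch, assuming an L1 penalty with a weighting factor lam (both assumptions):

```python
def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lam=10.0):
    """L1 cycle-consistency loss for both directions.

    G_XtoY / G_YtoX are the two generators; lam is an assumed weight.
    """
    rec_X = G_YtoX(G_XtoY(real_X))  # X -> Y -> X
    rec_Y = G_XtoY(G_YtoX(real_Y))  # Y -> X -> Y
    return lam * ((rec_X - real_X).abs().mean()
                  + (rec_Y - real_Y).abs().mean())
```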


Bells & Whistles: DCGAN model on Pokemon collection

Here are the results when I used all the Pokemon images as training data.

Fig.7a
Fig.7b

Bells & Whistles: CycleGAN model on Pokemon collection

Here are the results when I applied image-to-image translation between fire-type and water-type Pokemon.

Fig.8a
Fig.8b

Bells & Whistles: GIF videos and New Pokemon

DCGAN:

CycleGAN:
New Pokemon:

DCGAN with water type

DCGAN with all types


Acknowledgement