When Cats Meet GANs

11-747 Learning-based Image Synthesis Manuel Rodriguez Ladron de Guevara


Figure: Latent space exploration, DCGAN, 256px

Overview

This assignment explores GANs on grumpy cats (and Pokemon). First, we learn to generate grumpy cats from noise by implementing a DCGAN model with an L2 loss (LSGAN). In the second part, we implement CycleGAN to learn a mapping between two unpaired sets of images, in our cats context, sets of grumpy and Russian Blue cats. In addition, we explore datasets of different sizes and implement several tricks that improve the quality of our results.

Figure: Grumpy cat generation with DCGAN, 256 pixels
Figure: Training losses, DCGAN with Spectral Norm
Figure: Training losses, DCGAN without Spectral Norm

DCGAN with L2 Loss

Deep Convolutional GAN (DCGAN) was introduced by Radford et al. It uses a convolutional neural network in the discriminator and transposed convolutions in the generator. Specifically, we use the following architecture:

Figure: DCGAN Generator
Figure: DCGAN Discriminator
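For reference, here is a minimal PyTorch sketch of the two networks for 64x64 outputs; the channel widths and layer counts are assumptions, and the figures above define the actual architecture used.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector to a 64x64 RGB image via transposed convolutions."""
    def __init__(self, z_dim=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),    # 4x4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1), nn.BatchNorm2d(ngf), nn.ReLU(True),          # 32x32
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1), nn.Tanh(),                                         # 64x64
        )
    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Downsamples a 64x64 image to a single realness score with strided convolutions."""
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True),                                      # 32x32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),       # 16x16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),   # 8x8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, True),   # 4x4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0),                                                           # 1x1 score
        )
    def forward(self, x):
        return self.net(x).view(-1)
```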

We train our DCGAN architecture using the L2 loss, which helps stabilize training compared to the original GAN loss. We implement the training following the pseudocode below:

Figure: GAN Training Loop Pseudocode
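A minimal sketch of one training iteration with the L2 (LSGAN) objective, assuming the networks above and pre-built Adam optimizers; with this loss the real and fake targets are simply 1 and 0.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, g_opt, d_opt, real, z_dim=100):
    """One LSGAN iteration: least-squares loss instead of the original cross-entropy."""
    b, device = real.size(0), real.device

    # --- Discriminator: push D(real) -> 1 and D(fake) -> 0 ---
    z = torch.randn(b, z_dim, device=device)
    fake = G(z).detach()                      # stop gradients flowing into G
    d_loss = 0.5 * (F.mse_loss(D(real), torch.ones(b, device=device)) +
                    F.mse_loss(D(fake), torch.zeros(b, device=device)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Generator: push D(G(z)) -> 1 ---
    z = torch.randn(b, z_dim, device=device)
    g_loss = F.mse_loss(D(G(z)), torch.ones(b, device=device))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```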

We run the training using a basic transformation, i.e., resizing and normalization, and a deluxe transformation, which includes resizing, random cropping, random horizontal flipping, and normalization. We contrast these baseline results with some additions such as spectral normalization, a patch discriminator, and differentiable data augmentation.
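A sketch of how the two pipelines might look with torchvision; the intermediate resize size and the normalization statistics are assumptions.

```python
from torchvision import transforms

basic = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map to [-1, 1] for the Tanh output
])

deluxe = transforms.Compose([
    transforms.Resize((70, 70)),          # resize slightly larger, then crop back down
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```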

DCGAN Results

The next two image grids show the DCGAN trained with the bare minimum of settings: just 500 epochs, batch size 16, learning rate 0.0003, with the basic and deluxe data augmentations.
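For reference, a sketch of the corresponding setup, assuming the Generator, Discriminator, and deluxe transform sketched earlier; the Adam betas follow the DCGAN paper and are an assumption here, as is the dataset path.

```python
import torch
from torchvision import datasets

G, D = Generator(), Discriminator()          # classes from the sketch above
g_opt = torch.optim.Adam(G.parameters(), lr=3e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(D.parameters(), lr=3e-4, betas=(0.5, 0.999))

# "data/grumpy" is a placeholder path; `deluxe` is the transform sketched earlier.
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("data/grumpy", transform=deluxe),
    batch_size=16, shuffle=True)
```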

Baseline 1: Basic Augmentation

Baseline basic augmentation, 1000 iterations
Baseline basic augmentation, 6400 iterations
Figure: Training loss after 6400 iterations

Baseline 2: Deluxe Augmentation

Baseline deluxe augmentation, 1000 iterations
Baseline deluxe augmentation, 6400 iterations
Figure: Training loss after 6400 iterations

At this early stage, basic augmentation shows better results than deluxe augmentation; however, the discriminator and generator losses seem to diverge with basic augmentation, while with deluxe augmentation they seem to approach each other. To ensure some learning signal for both G and D during training, the generator loss should stay roughly between 0.5 and 2 (sometimes higher than 2, depending on the loss function). The D loss should not reach 0; otherwise there is no feedback for the generator to improve.

B&W: DCGAN with the Pokemon dataset

DCGAN Pokemon, iteration 102600
DCGAN Pokemon, iteration 78800

The Pokemon dataset is much harder to train on than the cats, due to the diversity of Pokemon shapes and colors. I achieved decent results after a lot of exploration and training runs, using the tricks detailed below.

Figure: Interpolations

After a lot of effort, I stabilized the training and controlled the quality of the outputs with the following hyperparameters: batch size 64, generator learning rate 0.0001, discriminator learning rate 0.0004, one-sided label smoothing uniformly sampled between 0.8 and 0.9, spectral normalization in the convolutional layers, deluxe augmentation, and a two-stage differentiable augmentation: for the first 60000 iterations I used only cutout, and for the remaining 80000 iterations I used cutout and translation.
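Two of these tricks are easy to show in code. Below is a sketch of the one-sided label smoothing and the two-stage differentiable augmentation schedule; the DiffAugment call in the comments follows the interface of the official implementation by Zhao et al., and the iteration threshold mirrors the numbers above.

```python
import torch

def smooth_real_labels(batch_size, device):
    """One-sided label smoothing: real targets drawn uniformly from [0.8, 0.9]."""
    return 0.8 + 0.1 * torch.rand(batch_size, device=device)

def diffaug_policy(iteration):
    """Two-stage schedule: cutout only at first, then cutout + translation."""
    return 'cutout' if iteration < 60000 else 'cutout,translation'

# Inside the training loop (DiffAugment from the official repo by Zhao et al.):
#   policy = diffaug_policy(it)
#   d_real = D(DiffAugment(real, policy=policy))
#   d_fake = D(DiffAugment(G(z), policy=policy))
#   d_loss = 0.5 * (F.mse_loss(d_real, smooth_real_labels(b, device)) +
#                   F.mse_loss(d_fake, torch.zeros(b, device=device)))
```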
DCGAN Pokemon, second half of training
Pokemon evolution!

Figure: Interpolations
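The interpolations are produced by decoding points on a straight line between two latent codes. A minimal sketch, assuming a trained generator G like the one sketched earlier:

```python
import torch

@torch.no_grad()
def interpolate(G, z0, z1, steps=8):
    """Linearly interpolate between two latent codes and decode each point."""
    alphas = torch.linspace(0, 1, steps, device=z0.device)
    zs = torch.stack([(1 - a) * z0 + a * z1 for a in alphas])
    return G(zs)  # (steps, 3, H, W): a row of images morphing from z0 to z1

z0, z1 = torch.randn(100), torch.randn(100)
row = interpolate(G, z0, z1)
```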

Figure: Samples

DCGAN Pokemon, iteration 41800
DCGAN Pokemon, iteration 89000

B&W: DCGAN with Cats HR (256x256px)

Once I got the Pokemon dataset to work, I added a couple of layers to the generator and discriminator to handle the new size, cats at 256x256. I kept the same parameters as for the Pokemon dataset. One big difference I noticed when increasing the resolution is the loss fluctuation. This is where spectral normalization comes to the rescue; there is quite a difference between using it and not using it!
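In PyTorch, spectral normalization is a one-line wrapper around each convolution, rescaling its weights by their largest singular value. Below is a sketch of a 256x256 discriminator built this way; the channel widths are assumptions.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def conv_sn(in_ch, out_ch, k=4, s=2, p=1):
    """Strided convolution with spectral normalization on its weights."""
    return spectral_norm(nn.Conv2d(in_ch, out_ch, k, s, p))

disc256 = nn.Sequential(                        # input: 3 x 256 x 256
    conv_sn(3, 64),    nn.LeakyReLU(0.2, True), # 128x128
    conv_sn(64, 128),  nn.LeakyReLU(0.2, True), # 64x64
    conv_sn(128, 256), nn.LeakyReLU(0.2, True), # 32x32
    conv_sn(256, 512), nn.LeakyReLU(0.2, True), # 16x16
    conv_sn(512, 512), nn.LeakyReLU(0.2, True), # 8x8
    conv_sn(512, 512), nn.LeakyReLU(0.2, True), # 4x4
    spectral_norm(nn.Conv2d(512, 1, 4, 1, 0)),  # 1x1 realness score
)
```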

Without Spectral Norm

Figure: Generated Grumpy Cats at 256 pixels, iteration 39600

Figure: Training loss without Spectral Normalization

With Spectral Norm

Figure: Generated Grumpy Cats at 256 pixels, iteration 39600

Figure: Training loss with Spectral Normalization


CycleGAN

CycleGAN was introduced by Zhu et al. as a more flexible approach to image-to-image translation, building on the famous pix2pix work by Isola et al. While pix2pix needs paired inputs, that is, each image has to be paired with its corresponding translation for the model to generalize to unseen images, CycleGAN does not need such labelled data (paired images). It achieves translation between two sets using a cycle-consistency loss, which encourages the translation from A to B and back to A to be as close as possible to the original input A. Instead of taking noise z as input, the generator takes images from one of the sets, encodes them through a series of convolutional layers and residual blocks, and decodes them using standard transposed convolutions. The discriminator can be a regular DCGAN discriminator. However, we show how the PatchGAN discriminator, initially introduced in the pix2pix model, generates much better results than the standard DCGAN discriminator.

Figure: CycleGAN Generator
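A minimal sketch of that encoder / residual / decoder structure; the channel widths and number of residual blocks are assumptions, and the figure above defines the actual architecture.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs with a skip connection, keeping resolution."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch),
        )
    def forward(self, x):
        return x + self.body(x)

class CycleGenerator(nn.Module):
    """Image-to-image generator: conv encoder -> residual blocks -> transposed-conv decoder."""
    def __init__(self, ch=64, n_blocks=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.InstanceNorm2d(ch), nn.ReLU(True),               # downsample
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.InstanceNorm2d(ch * 2), nn.ReLU(True),      # downsample
            *[ResBlock(ch * 2) for _ in range(n_blocks)],                                  # transform
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.InstanceNorm2d(ch), nn.ReLU(True), # upsample
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),                                 # back to RGB
        )
    def forward(self, x):
        return self.net(x)
```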

We train our CycleGAN using the same L2 loss, which helps stabilize the training relative to the original GAN loss. We implement the training following the pseudocode below:

Figure: GAN Training Loop Pseudocode

As before, we run the training using the basic transformation (resizing and normalization) and the deluxe transformation (adding random cropping and horizontal flipping), and we contrast these baselines with additions such as the cycle-consistency loss and the PatchGAN discriminator.
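A sketch of the cycle-consistency term added on top of the LSGAN losses; the L1 reconstruction penalty and the weight of 10 follow the original CycleGAN paper and are assumptions here.

```python
import torch.nn.functional as F

def cycle_loss(G_XtoY, G_YtoX, x, y, lambda_cycle=10.0):
    """Reconstruction penalty: X -> Y -> X and Y -> X -> Y should return the inputs."""
    x_rec = G_YtoX(G_XtoY(x))   # forward cycle
    y_rec = G_XtoY(G_YtoX(y))   # backward cycle
    return lambda_cycle * (F.l1_loss(x_rec, x) + F.l1_loss(y_rec, y))
```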

Baseline: Basic Augmentation, No Cycle-Consistency Loss

We show some results on both generators (X to Y and Y to X) using the cats dataset:

Figure: Iteration 600, Blue to Grumpy
Figure: Iteration 10000, Blue to Grumpy
Figure: Iteration 600, Grumpy to Blue
Figure: Iteration 10000, Grumpy to Blue

We see that the translation from Blue to Grumpy is better. This is due to the difference in size between the two datasets: while Grumpy has 205 images, Blue has only 76, which naturally makes generation harder in the latter case.

Baseline 2: Basic Augmentation, Cycle-Consistency Loss

Figure: Iteration 10000, Blue to Grumpy, using cycle-consistency loss
Figure: Iteration 10000, Grumpy to Blue, using cycle-consistency loss

The major improvement we can see here is in the Grumpy to Blue direction. While the generation of Grumpy cats does not improve significantly, the cycle-consistency loss greatly improves the generation of Blue cats. However, the results are still far from desired. To alleviate this, let's look at the results of the deluxe data augmentation.

Baseline 3: Deluxe Augmentation, No Cycle-Consistency Loss

Figure: Iteration 10000, Blue to Grumpy
Figure: Iteration 10000, Grumpy to Blue

We see results similar to Baseline 2: the grumpy cats do not improve greatly, but the data augmentation really pays off for the blue cats.

Baseline 4: Deluxe Augmentation, Cycle-Consistency Loss

Figure: Iteration 10000, Blue to Grumpy, using cycle-consistency loss
Figure: Iteration 10000, Grumpy to Blue, using cycle-consistency loss

In this last baseline, we see how the cycle-consistency loss greatly improves the generation of the grumpy cats, making this the best baseline so far. On the other hand, sadly, the Blue cats not only fail to improve over Baseline 3, they get worse.

B&W: Deluxe Augmentation, No Cycle-Consistency Loss, PatchGAN discriminator

Figure: Iteration 10000, Blue to Grumpy
Figure: Iteration 10000, Grumpy to Blue

The PatchGAN discriminator makes the difference! We finally see good results for the grumpy cats and the best Blue cats generated so far. The PatchGAN discriminator removed some yellow artifacts we previously saw around the eyes of the Blue cats.
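Unlike the DCGAN discriminator, which compresses the image into a single realness score, PatchGAN stays fully convolutional and outputs a grid of scores, one per overlapping patch, so it judges local texture rather than global structure. A minimal sketch with assumed widths:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: one realness score per image patch."""
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.InstanceNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.InstanceNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf * 4, 1, 4, 1, 1),   # no global pooling: the output stays spatial
        )
    def forward(self, x):
        return self.net(x)   # e.g. (B, 1, 7, 7) score map for a 64x64 input

scores = PatchDiscriminator()(torch.randn(1, 3, 64, 64))
# The LSGAN loss is then taken against a target map of ones/zeros of the same shape.
```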

B&W: Deluxe Augmentation, Cycle-Consistency Loss, PatchGAN discriminator

Figure: Iteration 10000, Blue to Grumpy, using cycle-consistency loss
Figure: Iteration 10000, Grumpy to Blue, using cycle-consistency loss

The cycle loss again does not add much for either the grumpy or the blue cats. Arguably, the grumpy cats have similar quality, and the Blue cats are definitely worse.