11-747 Learning-based Image Synthesis
Manuel Rodriguez Ladron de Guevara
This assignment explores GANs on grumpy cats (and Pokémon). First, we learn to generate grumpy cats from noise by
implementing a DCGAN model with an L2 loss (LSGAN). In the second part, we implement CycleGAN to learn a mapping between
two unpaired sets of images: in our cats context, sets of grumpy and Russian Blue cats. In addition, we explore
datasets of different sizes and implement several tricks that improve the quality of our results.
Deep Convolutional GAN (DCGAN) was introduced by Radford et al.
and uses a convolutional neural network in the discriminator and transposed convolutions in the generator.
Specifically, we use the following architecture:
DCGAN Generator
DCGAN Discriminator
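For reference, here is a minimal PyTorch sketch of such a generator/discriminator pair. The layer widths, kernel sizes, and the 64x64 output resolution are illustrative assumptions, not the exact assignment specification:

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Maps a noise vector to a 64x64 RGB image via transposed convolutions."""
    def __init__(self, noise_dim=100, conv_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            # noise_dim x 1 x 1 -> (conv_dim*8) x 4 x 4
            nn.ConvTranspose2d(noise_dim, conv_dim * 8, 4, 1, 0), nn.BatchNorm2d(conv_dim * 8), nn.ReLU(True),
            # each following layer doubles spatial resolution: 4 -> 8 -> 16 -> 32 -> 64
            nn.ConvTranspose2d(conv_dim * 8, conv_dim * 4, 4, 2, 1), nn.BatchNorm2d(conv_dim * 4), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim * 4, conv_dim * 2, 4, 2, 1), nn.BatchNorm2d(conv_dim * 2), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1), nn.BatchNorm2d(conv_dim), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class DCGANDiscriminator(nn.Module):
    """Mirrors the generator with strided convolutions; outputs one score per image."""
    def __init__(self, conv_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            # 64 -> 32 -> 16 -> 8 -> 4, then a final conv collapses 4x4 to a single score
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1), nn.BatchNorm2d(conv_dim * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 2, conv_dim * 4, 4, 2, 1), nn.BatchNorm2d(conv_dim * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 4, conv_dim * 8, 4, 2, 1), nn.BatchNorm2d(conv_dim * 8), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 8, 1, 4, 1, 0),
        )

    def forward(self, x):
        return self.net(x).view(-1)
```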
We train our DCGAN architecture with an L2 loss, which helps stabilize training compared to the original GAN loss. We implement the training following the pseudocode below:
GAN Training Loop Pseudocode
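A condensed sketch of one training iteration with the least-squares (L2) objective, assuming `G`, `D`, and their optimizers are already built as above:

```python
import torch

def train_step(G, D, d_optimizer, g_optimizer, real_images, noise_dim=100):
    """One LSGAN iteration: least-squares losses replace the original log losses."""
    batch_size = real_images.size(0)
    device = real_images.device

    # --- Discriminator update: push D(real) -> 1 and D(fake) -> 0 ---
    d_optimizer.zero_grad()
    z = torch.randn(batch_size, noise_dim, device=device)
    fake_images = G(z).detach()  # stop gradients into G during the D step
    d_loss = ((D(real_images) - 1) ** 2).mean() + (D(fake_images) ** 2).mean()
    d_loss.backward()
    d_optimizer.step()

    # --- Generator update: push D(G(z)) -> 1 ---
    g_optimizer.zero_grad()
    z = torch.randn(batch_size, noise_dim, device=device)
    g_loss = ((D(G(z)) - 1) ** 2).mean()
    g_loss.backward()
    g_optimizer.step()
    return d_loss.item(), g_loss.item()
```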
We run the training using a basic transformation (i.e., resizing and normalization) and a deluxe transformation, which adds random cropping and random horizontal flipping. We contrast these baseline results with additions such as spectral normalization, a patch discriminator, and differentiable data augmentation.
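The two pipelines could look like the following torchvision sketch (the exact sizes and normalization constants are assumptions):

```python
from torchvision import transforms

# Basic: resize + normalize to [-1, 1] to match the generator's tanh output.
basic = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

# Deluxe: resize slightly larger, then random crop and random horizontal flip.
deluxe = transforms.Compose([
    transforms.Resize((70, 70)),
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
```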
The next two images show the DCGAN trained with the bare minimum of parameters: trained for just 500 epochs, batch size 16, learning rate 0.0003, with the basic and deluxe data augmentations.
At this early stage, basic augmentation shows better results than deluxe augmentation; however, the discriminator and generator losses appear to diverge with basic augmentation, while with deluxe augmentation they move closer together. To keep some learning signal for both G and D during training, the generator loss should stay roughly between 0.5 and 2 (sometimes higher, depending on the loss function), and the discriminator loss should not reach 0, otherwise there is no feedback for the generator to improve.
The Pokémon dataset is much harder to train on than the cats, due to the diversity of Pokémon shapes and colors. I achieved decent results after a lot of exploration and training runs. Tricks implemented:
Interpolations
Samples
Once I got the Pokémon dataset to work, I added a couple of layers to the generator and discriminator to handle the new size, cats at 256x256. I kept the same parameters as for the Pokémon dataset. One big difference I noticed when increasing the resolution is the loss fluctuation. This is where Spectral Normalization comes to the rescue; there is quite a difference between using it and not using it!
Generated Grumpy Cats at 256 pixels, iteration 39600 (with vs. without Spectral Normalization)
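Spectral Normalization constrains the discriminator by rescaling each weight matrix by its largest singular value, which keeps its Lipschitz constant in check and damps the loss fluctuation. In PyTorch this is a one-line wrapper around each layer; a sketch, applied to the discriminator sketch above (the `add_spectral_norm` helper is my own, not a library function):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def add_spectral_norm(module):
    """Recursively wrap every conv layer so its weight is divided by its
    largest singular value (estimated by power iteration) at each forward pass."""
    for name, child in module.named_children():
        if isinstance(child, (nn.Conv2d, nn.ConvTranspose2d)):
            setattr(module, name, spectral_norm(child))
        else:
            add_spectral_norm(child)
    return module

D = add_spectral_norm(DCGANDiscriminator())  # discriminator sketch from above
```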
CycleGAN was introduced by Zhu et al. as a more flexible approach to image-to-image translation, building on the famous pix2pix work by Isola et al. While pix2pix needs paired inputs, i.e., each image has to be paired with its corresponding translation for the model to generalize to unseen images, CycleGAN does not need paired images. It achieves translation between two sets using a cycle consistency loss, which encourages the translation from A to B and back to A to be as close as possible to the original input A. The generator, instead of taking noise z as input, takes images from one of the sets, encodes them through a series of convolutional layers and residual blocks, and decodes with standard transposed convolutions. The discriminator can be a regular DCGAN discriminator. However, we show how the PatchGAN discriminator, originally introduced in the pix2pix model, generates much better results than the standard DCGAN discriminator.
CycleGAN Generator
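A minimal sketch of that encoder/residual/decoder generator (channel counts, the normalization choice, and the number of residual blocks are assumptions):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv block whose output is added back to its input."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim), nn.ReLU(True),
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.conv(x)

class CycleGANGenerator(nn.Module):
    """Image-to-image generator: downsample, transform with residual blocks, upsample."""
    def __init__(self, conv_dim=64, n_res_blocks=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.InstanceNorm2d(conv_dim), nn.ReLU(True),
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1), nn.InstanceNorm2d(conv_dim * 2), nn.ReLU(True),
        )
        self.transformer = nn.Sequential(*[ResidualBlock(conv_dim * 2) for _ in range(n_res_blocks)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1), nn.InstanceNorm2d(conv_dim), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.transformer(self.encoder(x)))
```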
We train CycleGAN with the same L2 (LSGAN) loss, again following the training pseudocode below:
CycleGAN Training Loop Pseudocode
We run the training with the basic and deluxe transformations described above, and contrast baselines with and without the cycle consistency loss, as well as with a PatchGAN discriminator.
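The cycle consistency term itself is just an L1 penalty on the round trip. Here is a sketch of the combined generator objective, where `G_XtoY`, `G_YtoX`, and `lambda_cycle` are assumed names rather than the assignment's exact API:

```python
def generator_cycle_loss(G_XtoY, G_YtoX, D_X, D_Y, images_X, images_Y, lambda_cycle=10.0):
    """LSGAN generator losses plus the X->Y->X and Y->X->Y reconstruction terms."""
    fake_Y = G_XtoY(images_X)
    fake_X = G_YtoX(images_Y)

    # Least-squares adversarial terms: fool each discriminator.
    adv = ((D_Y(fake_Y) - 1) ** 2).mean() + ((D_X(fake_X) - 1) ** 2).mean()

    # Cycle consistency: translating there and back should reproduce the input.
    cycle = (G_YtoX(fake_Y) - images_X).abs().mean() + (G_XtoY(fake_X) - images_Y).abs().mean()

    return adv + lambda_cycle * cycle
```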
We show some results from both generators (X to Y and Y to X) using the cats dataset:
Iteration 600 Blue to Grumpy
Iteration 10000 Blue to Grumpy
Iteration 600 Grumpy to Blue
Iteration 10000 Grumpy to Blue
We see that the translation from Blue to Grumpy is better. This is due to the difference in size between the two datasets: while Grumpy has 205 images, Blue has only 76, which naturally makes generation harder in the latter case.
Iteration 10000 Blue to Grumpy, using Cycle consistency loss
Iteration 10000 Grumpy to Blue, using Cycle consistency loss
The major improvement we can see here is in the Grumpy-to-Blue direction. While the generated Grumpy cats do not improve significantly, the cycle consistency loss greatly improves the generation of Blue cats. However, results are still far from desired. To alleviate this, let's look at the results with the deluxe data augmentation.
Iteration 10000 Blue to Grumpy
Iteration 10000 Grumpy to Blue
We get similar results to baseline 2: the Grumpy cats do not improve greatly, but the data augmentation really pays off for the Blue cats.
Iteration 10000 Blue to Grumpy, using Cycle consistency loss
Iteration 10000 Grumpy to Blue, using Cycle consistency loss
In this last baseline, we see how the cycle consistency loss greatly improves the generation of Grumpy cats, making this the best baseline so far. On the other hand, sadly, the Blue cats not only fail to improve over baseline 3, they get worse.
Iteration 10000 Blue to Grumpy
Iteration 10000 Grumpy to Blue
The PatchGAN discriminator makes the difference! We finally see good results for the Grumpy cats and the best Blue cats generated so far. The PatchGAN discriminator removes the yellow artifacts we previously saw around the eyes of the Blue cats.
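Unlike the DCGAN discriminator, which outputs a single real/fake score per image, a PatchGAN discriminator is fully convolutional and outputs a grid of scores, each judging only a local patch, which pushes the generator to get local texture right. A sketch (the depth and channel widths are assumptions):

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: the final 1-channel map scores
    overlapping local patches instead of the whole image."""
    def __init__(self, conv_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1), nn.InstanceNorm2d(conv_dim * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 2, conv_dim * 4, 4, 2, 1), nn.InstanceNorm2d(conv_dim * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 4, 1, 4, 1, 1),  # no flattening: keep the patch grid
        )

    def forward(self, x):
        # e.g. a 64x64 input yields roughly a 7x7 map of patch scores;
        # the LSGAN loss is then averaged over all patches.
        return self.net(x)
```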
Iteration 10000 Blue to Grumpy, using Cycle consistency loss
Iteration 10000 Grumpy to Blue, using Cycle consistency loss
The cycle loss again does not add much for either the Grumpy or the Blue cats. Arguably, the Grumpy cats have similar quality, and the Blue cats are definitely worse.