CMU 16-726: Learning Based Image Synthesis


When Cats meet GANs

By: Maneesh Bilalpur


Overview

This assignment includes two parts: in the first part, we implement a Deep Convolutional GAN (DCGAN) and train it to generate images of grumpy cats from samples of random noise. In the second part, we implement a more complex GAN architecture called CycleGAN for the task of image-to-image translation, and train it to translate between two kinds of cats (Grumpy and Russian Blue).

Sample real images

The images below show real samples of the Grumpy cat and Russian Blue cats.

[Figure] Sample real images from the Grumpy cat and Russian Blue cat datasets.


DCGAN

The DCGAN is trained with the least-squares GAN (LSGAN) loss, without data augmentation (left) and with data augmentation (right); the augmentation consists of horizontal flipping and random cropping from a larger input image. Both runs use a batch size of 256 and are trained for 10k iterations. We observe that data augmentation helps the GAN training process and produces better-quality images in fewer iterations.
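For reference, here is a minimal sketch of the least-squares GAN objective assumed above, written in PyTorch style; D, G, real_images, and the noise dimension nz are illustrative placeholders rather than the exact variables from the assignment code.

    import torch

    def lsgan_discriminator_loss(D, G, real_images, nz=100):
        # Push D(real) toward 1 and D(fake) toward 0 (least-squares targets).
        noise = torch.randn(real_images.size(0), nz, 1, 1, device=real_images.device)
        fake_images = G(noise).detach()  # do not backprop into the generator here
        d_real = D(real_images)
        d_fake = D(fake_images)
        return 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

    def lsgan_generator_loss(D, G, batch_size, nz=100, device='cpu'):
        # Push D(G(z)) toward 1, i.e. try to fool the discriminator.
        noise = torch.randn(batch_size, nz, 1, 1, device=device)
        return 0.5 * ((D(G(noise)) - 1) ** 2).mean()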

[Figure] Left: no data augmentation at 10k iterations. Right: DCGAN with data augmentation at 4400 iterations.



[Figure] Evolution of synthesized images at 200, 600, and 4400 iterations with data augmentation. The early samples look nearly identical, as if the model had mode-collapsed; as training proceeds, the quality improves.



Training plots

Discriminator loss vs. training iterations

[Figure] Left: training without data augmentation for 10k iterations. Right: training with data augmentation for 10k iterations; the loss jumps at about 4400 iterations, after which the image quality decreases.

Ideally, we want the generator to fool the discriminator into believing the fake images come from the same distribution as the real images. This means the discriminator loss should saturate at an intermediate, non-zero value (between confidently classifying real and fake images), typically in the range of 0.1-0.8.

Generator loss vs. training iterations

[Figure] Generator training loss corresponding to the discriminator loss curves above.



Padding solution

We determine the padding for kernel_size = 4 and stride = 2 from the convolution output-size formula: output = (input - kernel_size + 2*padding)/stride + 1. Requiring the output to be half the input size gives padding = (kernel_size - stride)/2 = (4 - 2)/2 = 1.
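As a quick sanity check of that padding choice, the following illustrative snippet (the channel counts and input size are arbitrary, not the assignment's exact layer) confirms that kernel_size = 4, stride = 2, padding = 1 halves the spatial resolution:

    import torch
    import torch.nn as nn

    # (64 - 4 + 2*1) / 2 + 1 = 32, so a 64x64 input becomes 32x32.
    conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2, padding=1)
    x = torch.randn(1, 3, 64, 64)
    print(conv(x).shape)  # torch.Size([1, 32, 32, 32])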

CycleGAN

With the cGAN setup (i.e., without the cycle-consistency loss), we attempt image-to-image translation between Grumpy cats and Russian Blue cats, using the same data augmentation strategy described above.

Generated images

[Figure] Conditional GAN without the cycle-consistency loss, translating between Grumpy cat and Russian Blue cat.

Training plots

Discriminator loss vs. training iterations

[Figure] X and Y denote images from the two domains (X = Grumpy cats, Y = Russian Blue cats).

Generator loss vs. training iterations

[Figure] Generator training loss corresponding to the discriminator loss curves above.



With CycleGAN (adding the cycle-consistency loss), we attempt image-to-image translation between Grumpy cats and Russian Blue cats, again with the same data augmentation strategy.
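A minimal sketch of how the cycle-consistency term can be added on top of the LSGAN generator losses, assuming two generators G_XtoY and G_YtoX (the names and the helper function are illustrative, not the assignment's exact code); the experiments here use lambda = 10 for this weight unless noted otherwise.

    import torch.nn.functional as F

    def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cycle=10.0):
        # Reconstruct each batch through the full cycle: X -> Y -> X and Y -> X -> Y.
        reconstructed_X = G_YtoX(G_XtoY(real_X))
        reconstructed_Y = G_XtoY(G_YtoX(real_Y))
        # L1 penalty encouraging each cycle to reproduce its input image.
        loss = F.l1_loss(reconstructed_X, real_X) + F.l1_loss(reconstructed_Y, real_Y)
        return lambda_cycle * loss

This term is added to the generators' LSGAN losses during the generator update.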

Generated images

[Figure] CycleGAN translating between Grumpy cat and Russian Blue cat.

Training plots

Discriminator loss vs. training iterations

[Figure] X and Y denote images from the two domains (X = Grumpy cats, Y = Russian Blue cats).

Generator loss vs. training iterations

[Figure] Generator training loss corresponding to the discriminator loss curves above.



Bells and Whistles

CycleGAN with high resolution images

I trained the same CycleGAN configuration described above on high-resolution cat images. The results are largely similar.
[Figure] CycleGAN between Grumpy cat and Russian Blue cat on high-resolution images.

We still experience trouble translating from Grumpy cat to Russian Blue images. The loss curves were observed to be similar to those of training on the smaller images.

CycleGAN with PatchGAN discriminator

I tried a patch discriminator with the cyclic loss to enforce local consistency and realism. However, the outputs do not show any improvement: the generated images were rather "patch-lated" (the patch-level analogue of pixelated). Debugging and hyperparameter tuning of lambda (0.01 to 1000 in multiples of 10) did not help. I implemented the PatchGAN network with the same configuration as the DC discriminator, but with the final two conv layers dropped and the loss averaged over the output feature map. In addition, I tried the discriminator presented by Isola et al. 2016, using leaky ReLUs and batch norm instead of the ReLU and instance norm of the DCGAN discriminator, and experienced the same problem. An interesting issue I ran into was balancing the cyclic loss against the LSGAN loss: despite tuning the lambda values, whenever I could optimize the cycle loss, the LSGAN loss increased, and vice versa. I believe this trade-off is inherent to the problem, which makes optimization across multiple generator-discriminator pairs and losses tricky.
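To make the patch discriminator idea concrete, here is one plausible PyTorch sketch of such a network; the layer widths and depth are assumptions for illustration and do not reproduce the exact configuration described above.

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        # Outputs a spatial grid of real/fake scores instead of a single scalar;
        # the least-squares loss is then averaged over all patches.
        def __init__(self, in_channels=3, base_channels=64):
            super().__init__()
            self.model = nn.Sequential(
                nn.Conv2d(in_channels, base_channels, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1),
                nn.BatchNorm2d(base_channels * 2),
                nn.LeakyReLU(0.2),
                nn.Conv2d(base_channels * 2, base_channels * 4, 4, stride=2, padding=1),
                nn.BatchNorm2d(base_channels * 4),
                nn.LeakyReLU(0.2),
                nn.Conv2d(base_channels * 4, 1, 4, stride=1, padding=1),  # per-patch score map
            )

        def forward(self, x):
            return self.model(x)

    # The LSGAN loss is then computed against a map of ones (real) or zeros (fake)
    # and averaged over the patch grid, e.g.:
    #   d_real = patch_D(real_images); loss_real = ((d_real - 1) ** 2).mean()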

[Figure] PatchGAN between Grumpy cat and Russian Blue cat.
All experiments used lambda = 10, except PatchGAN, where lambda = 0.1 gave the best results. All experiments were run for at least 50000 iterations, and the best results are presented.

Overview from the assignment website here.
Website template copied from here.