CMU 16-726: Learning Based Image Synthesis

When Cats meet GANs

By: Maneesh Bilalpur


This assignment includes two parts. In the first part, we implement a Deep Convolutional GAN (DCGAN) and train it to generate grumpy cats from samples of random noise. In the second part, we implement a more complex GAN architecture called CycleGAN for the task of image-to-image translation, and train it to translate between two kinds of cats (Grumpy and Russian Blue).

Sample real images

The images below show real samples of Grumpy cats and Russian Blue cats.



We train the DCGAN with the least-squares GAN (LSGAN) loss, without data augmentation (left) and with data augmentation (right); the augmentation consists of horizontal flipping and random cropping from a larger input image. Both runs use a batch size of 256. The synthesized images below are taken after 10k epochs without augmentation and after 4400 epochs with augmentation. Data augmentation helps the GAN training process: we see better-quality images after fewer epochs.

Left: No data augmentation at 10k epochs. Right: DCGAN with data augmentation at 4400 epochs.
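A minimal NumPy sketch of the augmentation described above (random crop plus horizontal flip), assuming each image is an (H, W, C) array that has already been resized to slightly larger than the training resolution. The function name `augment` and the 70 -> 64 pixel sizes are illustrative assumptions, not the values used in this assignment.

```python
import numpy as np


def augment(image: np.ndarray, out_size: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly crop an (H, W, C) image to (out_size, out_size, C), then
    flip it horizontally with probability 0.5.

    Assumes the input was already enlarged beyond out_size (hypothetical setup).
    """
    h, w = image.shape[:2]
    if h < out_size or w < out_size:
        raise ValueError("input must be at least out_size in both dimensions")
    # Choose the crop's top-left corner uniformly among all valid positions.
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    crop = image[top:top + out_size, left:left + out_size]
    # A horizontal flip reverses the width axis.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop


rng = np.random.default_rng(0)
big = rng.random((70, 70, 3))  # stand-in for an upscaled 64x64 cat image
sample = augment(big, 64, rng)
print(sample.shape)  # (64, 64, 3)
```

Because a new crop and flip are drawn every time an image is loaded, the discriminator rarely sees exactly the same pixels twice, which is one plausible reason augmentation helps on a small dataset.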

Evolution of synthesized images at 200, 600 and 4400 iterations with data augmentation. The initial images look similar to one another, as if the generator had suffered mode collapse. As training proceeds, the quality improves.

Training plots

Discriminator loss vs. epochs

Left: no data augmentation, trained for 10k epochs. Right: DCGAN with data augmentation, trained for 10k epochs; the loss jumps at about 4400 epochs, after which the image quality decreases.

Ideally, the generator should fool the discriminator into believing that the fake images come from the same distribution as the real images. In that case, the discriminator loss plateaus at an intermediate, non-zero value (the discriminator can no longer perfectly separate real from fake images), typically in the range of 0.1 to 0.8.
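To make the "intermediate value" concrete, here is a sketch assuming the standard LSGAN objective with target 1 for real and 0 for fake, each term weighted by 1/2 (the exact scaling used in this assignment may differ, so the absolute numbers are illustrative). A discriminator that can no longer tell real from fake outputs 0.5 everywhere, and its loss settles at 0.25, whereas a perfect discriminator drives it to 0.

```python
def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: 1/2 E[(D(x) - 1)^2] + 1/2 E[D(G(z))^2]."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_g_loss(d_fake):
    """LSGAN generator loss: 1/2 E[(D(G(z)) - 1)^2]."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


# A fully confused discriminator: 0.5 on every real and every fake image.
print(lsgan_d_loss([0.5] * 8, [0.5] * 8))  # 0.25
print(lsgan_g_loss([0.5] * 8))             # 0.125
# A perfect discriminator: 1 on real images, 0 on fakes.
print(lsgan_d_loss([1.0] * 8, [0.0] * 8))  # 0.0
```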

Generator loss vs. epochs

Generator training loss corresponding to the discriminator loss curves.

Padding solution

We use the following formula to determine the padding from kernel_size = 4 and stride = 2.
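A sketch of this padding calculation, assuming the standard output-size relation for a convolution with dilation 1, out = floor((in + 2p - k) / s) + 1, and the goal of exactly halving the spatial size at each layer. Solving in / s = (in + 2p - k) / s + 1 gives p = (k - s) / 2, i.e. p = 1 for k = 4 and s = 2. The helper names below are hypothetical.

```python
def conv_out_size(in_size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a conv layer with dilation 1:
    floor((in + 2 * padding - kernel_size) / stride) + 1."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


def halving_padding(kernel_size: int, stride: int) -> int:
    """Padding p that makes out = in / stride when in is divisible by stride.

    From in/stride = (in + 2p - k)/stride + 1  =>  p = (k - stride) / 2,
    which does not depend on the input size.
    """
    if (kernel_size - stride) % 2 != 0:
        raise ValueError("kernel_size - stride must be even for symmetric padding")
    return (kernel_size - stride) // 2


p = halving_padding(kernel_size=4, stride=2)
print(p)  # 1
print([conv_out_size(n, 4, 2, p) for n in (64, 32, 16, 8)])  # [32, 16, 8, 4]
```

If the generator upsamples with transposed convolutions (an assumption here), the same p = 1 doubles the size, since (in - 1) * 2 - 2 * 1 + 4 = 2 * in.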


With a conditional GAN (cGAN, i.e., without the cycle-consistency loss), we attempt image-to-image translation between Grumpy cats and Russian Blue cats using the same data augmentation strategy described previously.

Generated images

Conditional GAN (without cycle-consistency loss) translating between Grumpy and Russian Blue cats.

Training plots

Discriminator loss vs. training iterations

X and Y denote images from the two domains (X = Grumpy cats, Y = Russian Blue cats).

Generator loss vs. training iterations

Generator training loss corresponding to the above discriminator loss curves.

With CycleGAN, we attempt image-to-image translation between Grumpy cats and Russian Blue cats using the same data augmentation strategy described previously.
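The difference between the cGAN and CycleGAN objectives can be sketched as follows, assuming LSGAN adversarial terms for both generators and an L1 cycle-consistency term weighted by `lambda_cycle` (the value 10 and all function names are illustrative assumptions, not this assignment's code). The toy setup uses constant "fully fooled" discriminators purely to isolate what each term can see: the adversarial-only objective scores a collapsed mapping as well as a faithful one, while the cycle term penalizes the collapse.

```python
import numpy as np


def lsgan_g_term(d_scores):
    """Generator side of LSGAN: push D's scores on fakes toward 1 ("real")."""
    return np.mean((d_scores - 1.0) ** 2)


def cycle_loss(original, reconstructed):
    """L1 cycle-consistency loss between an image and its round-trip reconstruction."""
    return np.mean(np.abs(original - reconstructed))


def generator_objective(x, y, g_xy, g_yx, d_x, d_y, lambda_cycle=10.0, use_cycle=True):
    """Total generator loss for X <-> Y translation.

    use_cycle=False: adversarial-only (cGAN-style) objective.
    use_cycle=True: adds lambda_cycle * (X->Y->X and Y->X->Y L1 terms).
    """
    fake_y = g_xy(x)
    fake_x = g_yx(y)
    loss = lsgan_g_term(d_y(fake_y)) + lsgan_g_term(d_x(fake_x))
    if use_cycle:
        loss += lambda_cycle * (cycle_loss(x, g_yx(fake_y)) + cycle_loss(y, g_xy(fake_x)))
    return float(loss)


def fooled(imgs):
    """Toy discriminator that scores every image in the batch as real (1)."""
    return np.ones(len(imgs))


def collapse_xy(imgs):
    """Toy X->Y generator that maps every input to one constant image."""
    return np.full_like(imgs, -0.5)


def collapse_yx(imgs):
    """Toy Y->X generator that maps every input to one constant image."""
    return np.full_like(imgs, 0.5)


# Toy NHWC "images": domain Y is the negation of X, so np.negative is a faithful translator.
x = np.linspace(0.0, 1.0, 12).reshape(1, 2, 2, 3)
y = -x

for use_cycle in (False, True):
    faithful = generator_objective(x, y, np.negative, np.negative, fooled, fooled, use_cycle=use_cycle)
    collapsed = generator_objective(x, y, collapse_xy, collapse_yx, fooled, fooled, use_cycle=use_cycle)
    print(use_cycle, faithful, round(collapsed, 4))
# False 0.0 0.0
# True 0.0 5.4545
```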

Generated images

CycleGAN translating between Grumpy and Russian Blue cats.

Training plots

Discriminator loss vs. training iterations

X and Y denote images from the two domains (X = Grumpy cats, Y = Russian Blue cats).

Generator loss vs. training iterations

Generator training loss corresponding to the above discriminator loss curves.

Bells and Whistles

CycleGAN with high resolution images

I trained the same CycleGAN configuration described above on high-resolution cat images. The results are largely similar to those at the lower resolution.
CycleGAN on high-resolution images, translating between Grumpy and Russian Blue cats.

Translation from Grumpy cats to Russian Blue cats remains difficult to train. The loss curves are similar to those observed when training on the smaller images.

CycleGAN with PatchGAN discriminator

I tried a PatchGAN discriminator with the cycle-consistency loss to enforce local consistency and realism. However, the outputs do not show any improvement; the output images are rather "patch-lated" (similar to pixelated). Debugging and tuning lambda (0.01 to 1000 in multiples of 10) did not help. I implemented the PatchGAN network with the same configuration as the DCGAN discriminator, but with the final 2 conv layers dropped, and averaged the loss over the output feature map. In addition, I tried the discriminator presented by Isola et al. (2016), which uses leaky ReLUs and batch norm instead of the ReLUs and instance norm of the DCGAN discriminator, and still experienced the same problem.

An interesting issue I encountered was balancing the cycle-consistency loss against the LSGAN loss. Despite tuning lambda, whenever I could reduce the cycle loss, the LSGAN loss increased, and vice versa. I believe this trade-off is inherent to the problem, which makes optimization across multiple generator-discriminator pairs and losses tricky.

CycleGAN with a PatchGAN discriminator, translating between Grumpy and Russian Blue cats.
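A minimal sketch of averaging the LSGAN loss over a patch discriminator's output, assuming the truncated discriminator returns an (N, 1, h, w) map with one realism score per patch. The function names and the 4x4 map size are illustrative assumptions, not the exact network used above.

```python
import numpy as np


def patch_lsgan_d_loss(d_real_map, d_fake_map):
    """LSGAN discriminator loss for a patch discriminator.

    Each map has shape (N, 1, h, w). np.mean averages over the batch and all
    h * w positions, so each patch is pushed toward 1 (real) or 0 (fake)
    independently instead of producing one score per image.
    """
    return 0.5 * (np.mean((d_real_map - 1.0) ** 2) + np.mean(d_fake_map ** 2))


def patch_lsgan_g_loss(d_fake_map):
    """Generator side: push every patch score on generated images toward 1."""
    return 0.5 * np.mean((d_fake_map - 1.0) ** 2)


# Batch of 2 with a 4x4 patch map. D scores every real patch as real; on the
# fakes it scores the top half of the patches as real (1) and the bottom half as fake (0).
real_map = np.ones((2, 1, 4, 4))
fake_map = np.zeros((2, 1, 4, 4))
fake_map[:, :, :2, :] = 1.0

print(float(patch_lsgan_d_loss(real_map, fake_map)))  # 0.25
print(float(patch_lsgan_g_loss(fake_map)))            # 0.25
```

Because the gradient now comes from many small receptive fields rather than a single global score, a patch discriminator mainly constrains local texture, which is consistent with the artifacts being local and blocky.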
