By: Maneesh Bilalpur
This assignment includes two parts. In the first part, we implement a Deep Convolutional GAN (DCGAN) and train it to generate grumpy cats from samples of random noise. In the second part, we implement a more complex GAN architecture called CycleGAN for the task of image-to-image translation, and train it to convert between two kinds of cats (Grumpy and Russian Blue).
The images below show real images of Grumpy and Russian Blue cats.
The DCGAN is trained using the least-squares GAN loss with no data augmentation (left) and with data augmentation (right), consisting of horizontal flipping and random cropping from a larger input image. Both runs use a batch size of 256. We show the synthesized images at 10k epochs. We observe that data augmentation helps the GAN training process, yielding better-quality images at fewer epochs.
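The augmentation pipeline is straightforward to express with torchvision; below is a minimal sketch, assuming 64×64 training images cropped from a slightly larger resize (the exact sizes are illustrative, not necessarily the configuration used here).

```python
import torchvision.transforms as T

# Minimal sketch of the augmentation described above (sizes are assumptions):
augment = T.Compose([
    T.Resize(70),              # enlarge so random crops have room to vary
    T.RandomCrop(64),          # random cropping from the larger input image
    T.RandomHorizontalFlip(),  # horizontal flipping with p=0.5
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale pixels to [-1, 1]
])
```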
Ideally, we would want the generator to fool the discriminator into believing that the fake images come from the same distribution as the real images. This means the discriminator loss saturates at an intermediate value, i.e., a non-zero loss (between classifying real and fake images), typically in the range 0.1-0.8.
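For reference, a minimal sketch of the least-squares GAN objectives (the tensor and module names are illustrative): the discriminator is pushed to output 1 on real images and 0 on fakes, while the generator is rewarded when fakes score close to 1.

```python
import torch

def lsgan_d_loss(D, real, fake):
    # Discriminator: D(real) -> 1, D(fake) -> 0.
    # fake should be detached from the generator graph when updating D.
    return 0.5 * (((D(real) - 1) ** 2).mean() + (D(fake) ** 2).mean())

def lsgan_g_loss(D, fake):
    # Generator: push D(fake) -> 1 to fool the discriminator.
    return ((D(fake) - 1) ** 2).mean()
```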
We use the following formula to determine the padding for kernel_size = 4 and stride = 2: output_size = floor((input_size + 2 × padding − kernel_size) / stride) + 1. Solving for an output that halves the input size gives padding = 1.
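A quick, illustrative sanity check confirms that this choice halves the spatial dimensions:

```python
import torch
import torch.nn as nn

# kernel_size=4, stride=2, padding=1 halves the spatial size: 64x64 -> 32x32.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)  # torch.Size([1, 32, 32, 32])
```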
With cGAN (without the cycle-consistency loss), we attempt image-to-image translation between grumpy cats and Russian Blue cats with the same data augmentation strategy mentioned previously.
With CycleGAN, we attempt image-to-image translation between grumpy cats and Russian Blue cats with the same data augmentation strategy mentioned previously.
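The cycle-consistency term penalizes the reconstruction error when an image is translated to the other domain and back. A minimal sketch, with illustrative generator names and an L1 reconstruction penalty weighted by lambda:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_XtoY, G_YtoX, x, y, lam):
    # X -> Y -> X and Y -> X -> Y round trips should reconstruct the inputs.
    loss_x = F.l1_loss(G_YtoX(G_XtoY(x)), x)
    loss_y = F.l1_loss(G_XtoY(G_YtoX(y)), y)
    return lam * (loss_x + loss_y)
```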
We still experience trouble translating from grumpy cat to Russian Blue images. The loss curves were observed to be similar to those from training on smaller images.
I tried a patch discriminator with the cyclic loss to enforce local consistency and realism. However, the outputs do not show any improvement; the output images were rather "patch-lated" (analogous to pixelated, but at the patch level). Debugging and hyperparameter tuning of lambda (0.01 to 1000 in multiples of 10) did not help. I implemented the PatchGAN network with the same configuration as the DC discriminator but with the final two conv layers dropped, and the loss averaged over the output feature map. In addition, I tried the discriminator presented by Isola et al. (2016), using leaky ReLUs and batch norm instead of the ReLU and instance norm in the DCGAN discriminator, and still experienced the same problem.

An interesting issue I encountered was balancing the cyclic loss against the LSGAN loss. Despite tuning the lambda values, whenever I could optimize the cycle loss, the LSGAN loss increased, and vice versa. I believe this trade-off is inherent to the problem, which makes optimization tricky across multiple generator-discriminator pairs and losses.
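For concreteness, here is a hedged sketch of the patch discriminator described above (the DC discriminator with its final two conv layers dropped); the channel widths are illustrative assumptions, not the exact configuration used.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a spatial grid of real/fake scores instead of a single scalar;
    the LSGAN loss is averaged over this output feature map, so each
    spatial location judges a local patch of the input image."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=4, stride=2, padding=1),  # per-patch scores
        )

    def forward(self, x):
        return self.net(x)
```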
All experiments used lambda = 10, except PatchGAN, where lambda = 0.1 gave the best results. All experiments were run for at least 50,000 iterations and the best results are presented. An overview is available on the assignment website here.