When Cats meet GANs

Shiva Peri - Assignment 3

VanillaGan Results

The VanillaGan consists of a DCDiscriminator and a DCGenerator (DC = Deep Convolutional). Here we compare two data augmentation transforms: basic and deluxe. The basic transform only rescales and normalizes the images. The deluxe transform additionally applies random crops, random horizontal flips, and color jitter.
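As a rough illustration of the two pipelines, the sketch below reimplements them in NumPy. It is not the code used for these results: the function names, the reflect-pad-then-crop strategy, the `pad` and `jitter` values, and the brightness-only jitter are all assumptions, and images are assumed to be already resized to the training resolution.

```python
import numpy as np

def basic_transform(img_u8):
    """Normalize uint8 pixels in [0, 255] to floats in [-1, 1]."""
    return img_u8.astype(np.float32) / 127.5 - 1.0

def deluxe_transform(img_u8, rng, pad=4, jitter=0.1):
    """Random crop, random horizontal flip, brightness jitter, then normalize."""
    h, w, _ = img_u8.shape
    # Random crop: reflect-pad the image, then cut a random h x w window.
    padded = np.pad(img_u8, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w].astype(np.float32)
    # Random horizontal flip with probability 0.5 (axis 1 is width in HWC).
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Brightness jitter: scale by a factor in [1 - jitter, 1 + jitter].
    out = np.clip(out * rng.uniform(1.0 - jitter, 1.0 + jitter), 0.0, 255.0)
    return out / 127.5 - 1.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(basic_transform(image).shape, deluxe_transform(image, rng).shape)
```

Both functions return an image of the same size in the same value range; the deluxe version simply returns a different variant of the image each time it is called.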

We also compare results with and without Differentiable Augmentation (DiffAug). With both the basic and deluxe transforms, adding DiffAug produces significantly better samples. As shown in the loss plots, the discriminator loss converges more slowly with DiffAug than without it, while the generator loss converges faster. The additional transforms in the deluxe data augmentation let VanillaGan train on much more diverse data, so its results are much more robust. Notably, the deluxe results after 1000 iterations are comparable to the results of the basic transform with DiffAug.
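The core idea of DiffAug is that the same family of random augmentations is applied to both real and generated batches before they reach the discriminator. The NumPy sketch below only illustrates that data flow. In actual training the operations would have to be written with differentiable framework ops so that generator gradients can pass through them. The choice of brightness and translation, the `ratio` value, and all function names are assumptions rather than the configuration used for these results.

```python
import numpy as np

def rand_brightness(x, rng):
    # One additive brightness offset per image, drawn from [-0.5, 0.5).
    return x + (rng.random((x.shape[0], 1, 1, 1)) - 0.5)

def rand_translation(x, rng, ratio=0.125):
    # Shift each image by a random offset, filling exposed pixels with zeros.
    n, _, h, w = x.shape
    max_dy, max_dx = round(h * ratio), round(w * ratio)
    padded = np.pad(x, ((0, 0), (0, 0), (max_dy, max_dy), (max_dx, max_dx)))
    out = np.empty_like(x)
    for i in range(n):
        top = rng.integers(0, 2 * max_dy + 1)
        left = rng.integers(0, 2 * max_dx + 1)
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

def diff_augment(x, rng):
    return rand_translation(rand_brightness(x, rng), rng)

rng = np.random.default_rng(0)
real = rng.standard_normal((4, 3, 64, 64))
fake = rng.standard_normal((4, 3, 64, 64))  # stand-in for generator output

# The discriminator only ever sees augmented batches, real and fake alike.
d_input_real = diff_augment(real, rng)
d_input_fake = diff_augment(fake, rng)
```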

Padding calculation:
padding = (stride * (output_size - 1) + kernel_size - input_size) // 2
padding = (2 * (32 - 1) + 4 - 64) // 2 = 1

In my implementation, I used kernel_size = 5 and padding = 2, which also maps a 64-pixel input to a 32-pixel output at stride 2.
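The arithmetic above can be checked with a small helper. This is a minimal sketch, with function names of my own choosing, that uses the standard convolution output-size formula and rounds the padding up rather than down. For kernel_size = 5 the numerator is odd, so floor division would give a padding of 1, which produces a 31-pixel output instead of 32.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def min_padding(input_size, output_size, kernel_size, stride):
    """Smallest symmetric padding whose output is at least output_size."""
    numerator = stride * (output_size - 1) + kernel_size - input_size
    return max(0, -(-numerator // 2))  # ceiling division by 2

for kernel_size in (4, 5):
    padding = min_padding(64, 32, kernel_size, stride=2)
    output = conv_output_size(64, kernel_size, stride=2, padding=padding)
    print(f"kernel_size={kernel_size} padding={padding} output={output}")
```

Passing the result back through `conv_output_size` is a useful sanity check, since not every combination of sizes admits an exact symmetric padding.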

Basic (iter 6000)
Basic + DiffAug (iter 6000)
Basic Discriminator Loss (orange = basic, blue = basic + DiffAug)
Basic Generator Loss (orange = basic, blue = basic + DiffAug)

Deluxe (iter 200)
Deluxe + DiffAug (iter 200)
Deluxe (iter 6000)
Deluxe + DiffAug (iter 6000)
Deluxe Discriminator Loss (red = deluxe, blue = deluxe + DiffAug)
Deluxe Generator Loss (red = deluxe, blue = deluxe + DiffAug)

CycleGan Results

The main difference between the VanillaGan and the CycleGan is that CycleGan trains multiple generators and discriminators in tandem. Building transformation networks in both directions between two datasets (X, Y) allows us to optimize a cycle-consistency objective: an image translated to the other domain and back should match the original. This makes CycleGan training more robust and also gives us an approach to style transfer.

The first two rows below demonstrate the striking difference between including and omitting cycle loss while training the generators. After 1000 iterations, the network trained with cycle loss produces far fewer artifacts than the network trained without it. After 10000 iterations, the network with cycle loss continues to improve.
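To make the cycle loss concrete, here is a minimal NumPy sketch. It assumes an L1 penalty, as in the original CycleGAN formulation; the report does not state which norm was used, and the toy "generators" are hypothetical affine maps rather than networks.

```python
import numpy as np

def cycle_consistency_loss(x, g_xy, g_yx):
    """Mean absolute difference between x and its round trip X -> Y -> X."""
    return float(np.mean(np.abs(g_yx(g_xy(x)) - x)))

# Toy stand-ins for the generators (hypothetical, for illustration only).
g_xy = lambda x: 2.0 * x + 1.0
g_yx_consistent = lambda y: (y - 1.0) / 2.0   # exactly undoes g_xy
g_yx_inconsistent = lambda y: y               # ignores the mapping

x = np.linspace(-1.0, 1.0, 8)
print(cycle_consistency_loss(x, g_xy, g_yx_consistent))
print(cycle_consistency_loss(x, g_xy, g_yx_inconsistent))
```

The loss is near zero only when the two mappings invert each other, which is the constraint that is absent in the no-cycle runs.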

In the final two rows (on the Cats dataset), we compare PatchDiscriminator and DCDiscriminator. After 10000 iterations, the PatchDiscriminator yields visibly fewer artifacts than the DCDiscriminator in both domains. We see similar results on the Apple2Orange dataset.
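One way to see the structural difference between the two discriminators is to trace the spatial size of their outputs. A DC-style discriminator collapses the image to a single real/fake score, whereas a patch-style discriminator keeps a grid of scores, each judging one region of the image. The layer stacks below are hypothetical and are not the configurations used in this assignment.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def output_grid(input_size, layers):
    """Apply each (kernel_size, stride, padding) layer and return the final size."""
    size = input_size
    for kernel_size, stride, padding in layers:
        size = conv_output_size(size, kernel_size, stride, padding)
    return size

# Hypothetical layer stacks as (kernel_size, stride, padding) tuples.
dc_layers = [(4, 2, 1)] * 4 + [(4, 1, 0)]
patch_layers = [(4, 2, 1)] * 4

dc_grid = output_grid(64, dc_layers)        # one logit per image
patch_grid = output_grid(64, patch_layers)  # a grid of logits, one per patch
print(f"DC: {dc_grid}x{dc_grid}, Patch: {patch_grid}x{patch_grid}")
```

Because every cell of the patch grid contributes its own loss term, the generator receives feedback about local texture rather than a single global verdict, which is consistent with the reduction in artifacts noted above.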

Cats Dataset

X-Y, Patch, No-Cycle (iter 1000)
Y-X, Patch, No-Cycle (iter 1000)
X-Y, Patch (iter 1000)
Y-X, Patch (iter 1000)
X-Y, Patch (iter 10000)
Y-X, Patch (iter 10000)
X-Y, DC (iter 10000)
Y-X, DC (iter 10000)

Apple2Orange Dataset

X-Y, Patch (iter 10000)
Y-X, Patch (iter 10000)
X-Y, DC (iter 10000)
Y-X, DC (iter 10000)