16-726: Assignment #3 - When Cat meets GANs

Rawal Khirodkar

1. Implement Data Augmentation: Refer to data_loader.py.

2. Implement the Discriminator of the DCGAN: Refer to models.py. The padding is 1.

The formula is: output = (input - kernel + 2*padding)/stride + 1. Substituting input = x, output = x/2, kernel = 4 and stride = 2 gives x/2 = (x - 4 + 2*padding)/2 + 1, so 2*padding = 2 and padding = 1.
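The padding derivation above can be checked numerically. The following is a minimal sketch, not code from models.py: the function names and the list of feature-map sizes (64, 32, 16, 8) are assumptions chosen for illustration, and the formula is applied with floor division, as convolution layers usually do.

```python
def conv_output_size(input_size, kernel, stride, padding):
    """Spatial output size of a convolution, using floor division."""
    return (input_size + 2 * padding - kernel) // stride + 1


def padding_for_halving(kernel, stride, sizes=(64, 32, 16, 8)):
    """Return the smallest padding that maps every size x in `sizes` to x // 2.

    `sizes` is a hypothetical set of feature-map sizes, not taken from the report.
    """
    for padding in range(kernel):
        if all(conv_output_size(x, kernel, stride, padding) == x // 2 for x in sizes):
            return padding
    return None  # no padding below `kernel` halves every size


print(padding_for_halving(kernel=4, stride=2))  # prints 1
```

With kernel = 4 and stride = 2, a padding of 0 gives 31 for an input of 64, while a padding of 1 gives 32, which matches the algebra.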

3. Generator: Refer to models.py.

4. Training Loop: Refer to vanilla_gan.py.

5. Experiment: The plots use a TensorBoard smoothing of 1 for better visualization. The raw, unsmoothed curve is visible in light orange.

The generator's loss decreases over time for both basic and deluxe data augmentation. As a check that the GAN is training, the discriminator should not be able to quickly differentiate between fake and real samples, i.e., its loss should not converge to zero; the plots confirm this. Over the last few iterations, the discriminator's loss plateaus.

Discriminator Loss - Basic	Generator Loss - Basic
Discriminator Loss - Deluxe	Generator Loss - Deluxe
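To illustrate why a discriminator loss near zero would signal a failed run, here is a small sketch that assumes a least-squares GAN objective. This is an assumption for illustration only; the loss actually used is the one in vanilla_gan.py, and the function names below are hypothetical.

```python
def d_loss_lsgan(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def g_loss_lsgan(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


# Discriminator separates real from fake perfectly: its loss is 0, the generator's is high.
print(d_loss_lsgan([1.0, 1.0], [0.0, 0.0]), g_loss_lsgan([0.0, 0.0]))  # prints 0.0 1.0

# Discriminator is undecided (outputs 0.5 everywhere): both losses sit at 0.25.
print(d_loss_lsgan([0.5, 0.5], [0.5, 0.5]), g_loss_lsgan([0.5, 0.5]))  # prints 0.25 0.25
```

Under this assumed objective, a discriminator loss that stays away from zero means the generator is still producing samples the discriminator cannot trivially reject.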

The samples below are from early in training (iter=200) and late in training (iter=1200). Initially the generated samples are noisy; as GAN training progresses, they show cat features such as eyes, ears and nose, along with textures similar to those of the real cat images. However, the generated samples at the end of training are still not realistic. Fixes such as training for more than 100 epochs or increasing the capacity of the generator can help: at iter=6400, high-fidelity features start to appear.

Fake - Deluxe (Iter: 200)	Fake - Deluxe (Iter: 1200)	Fake - Deluxe (Iter: 6400)

6. CycleGAN Generator: Refer to models.py. Please note that I used only one ResNet block, following the skeleton code, as this was not a fill-in section in the code.

7. CycleGAN Training Loop: Refer to cycle_gan.py.

8. CycleGAN Experiments:

a) From scratch without consistency loss.

Without Consistency Loss, X -> Y (Iter: 400)	Without Consistency Loss, Y -> X (Iter: 400)

b) From scratch with consistency loss.

With Consistency Loss, X -> Y (Iter: 400)	With Consistency Loss, Y -> X (Iter: 400)

c) From scratch without consistency loss (longer).

Without Consistency Loss, X -> Y (Iter: 10000)	Without Consistency Loss, Y -> X (Iter: 10000)

d) From scratch with consistency loss (longer).

With Consistency Loss, X -> Y (Iter: 10000)	With Consistency Loss, Y -> X (Iter: 10000)

e) I let the model train for longer, around 40k iterations, to get better results for Y -> X generation.

Without Consistency Loss, X -> Y (Iter: 40000)	Without Consistency Loss, Y -> X (Iter: 40000)
With Consistency Loss, X -> Y (Iter: 49000)	With Consistency Loss, Y -> X (Iter: 49000)

f) Comments on using Consistency Loss: Yes, I noticed a difference between training with and without consistency loss. The consistency loss preserved the structure of the input image, such as pose, whiskers, ears and eyes, changing only the type of the cat. Without consistency loss, however, the generator can map a domain X cat to a domain Y cat that looks realistic but is structurally different from the input. Please note that I used the default consistency loss weights for training.
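The structure-preserving effect described above follows from what the consistency term measures: an image translated X -> Y -> X should come back close to the original. Below is a toy sketch of that idea. It assumes an L1 distance, which may differ from the distance used in cycle_gan.py, and the "generators" are hypothetical stand-ins acting on a flat list of pixel values, not the real networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, g_xy, g_yx):
    """Translate X -> Y -> X and measure how far the reconstruction drifts from x."""
    return l1_distance(g_yx(g_xy(x)), x)


def brighten(img):
    """Hypothetical stand-in for G_XY: changes appearance, keeps layout."""
    return [p + 0.25 for p in img]


def darken(img):
    """Hypothetical stand-in for G_YX: exact inverse of brighten."""
    return [p - 0.25 for p in img]


def flip(img):
    """Hypothetical stand-in for a G_YX that scrambles the spatial structure."""
    return list(reversed(img))


x = [0.0, 0.25, 0.5, 1.0]
print(cycle_consistency_loss(x, brighten, darken))  # prints 0.0
print(cycle_consistency_loss(x, brighten, flip))    # prints 0.625
```

A pair of mappings that keeps the layout incurs no penalty, while a pair that rearranges the image is penalized even if each output looks plausible on its own. Without this term, nothing discourages the second kind of mapping.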

9. Bells and Whistles (Pokemon): I trained the CycleGAN without consistency loss on Pokemon Fire <-> Grass.

Fire -> Grass (Iter: 50000)	Grass -> Fire (Iter: 50000)

Generator and Discriminator Loss for Pokemon.

10. Bells and Whistles (VAE): I implemented a VAE for the cat conversion. The encoder architecture is similar to DCDiscriminator (image to latent code) and the decoder architecture is similar to DCGenerator (latent code to image). The latent space is 128-dimensional, and the VAE loss is 10 * reconstruction loss + latent loss. Please refer to vae_model.py and vae.py in the code. Here are the results after training for 500 epochs. The results are better than DCGAN, and towards the end of training various colors like blue and red appear. However, the results are not as sharp as those of DCGAN.

VAE Samples (Iter: 200)	VAE Samples (Iter: 1000)
VAE Samples (Iter: 500)	VAE Samples (Iter: 13000)
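The weighted VAE loss from section 10 can be sketched as follows. The forms of the two terms are assumptions, since the report only gives the weighting: the reconstruction loss is taken to be mean squared error and the latent loss is taken to be the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The function names are hypothetical; the actual implementation is in vae.py.

```python
import math


def reconstruction_loss(x, x_hat):
    """Mean squared error between input and reconstruction (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def latent_loss(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, I) (assumed form)."""
    return -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar, recon_weight=10.0):
    """Total loss = 10 * reconstruction loss + latent loss, as stated in the report."""
    return recon_weight * reconstruction_loss(x, x_hat) + latent_loss(mu, logvar)


# Perfect reconstruction with a 128-dimensional posterior equal to the prior.
print(vae_loss([0.0, 1.0], [0.0, 1.0], [0.0] * 128, [0.0] * 128))  # prints 0.0

# Imperfect reconstruction and a posterior mean shifted away from zero.
print(vae_loss([0.0, 1.0], [0.5, 1.0], [1.0], [0.0]))  # prints 1.75
```

Weighting the reconstruction term by 10 favors faithful reconstructions over a posterior that matches the prior closely.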