16726 - Learning Based Image Synthesis - Spring 2021
Tarang Shah (Andrew ID: tarangs)
Homework 3 - When Cats Meet GANs
DCGAN
Given that we use kernel size k = 4 and stride s = 2, the padding should be 1 to halve the image dimension.
The formula used is

D_out = floor((D_in + 2p - k) / s) + 1

Where:
- D_out = output dimension
- D_in = input dimension
- k = kernel size
- s = stride
- p = padding
- floor() = integer (floor) value

Substituting D_out = D_in/2, k = 4, and s = 2 (for even D_in the floor drops, and the equation reduces to D_in/2 + p - 1 = D_in/2) gives p = 1.
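As a quick sanity check, the output-dimension formula can be evaluated directly (a minimal sketch; the helper name is illustrative):

```python
def conv_out_dim(d_in, k, s, p):
    """Spatial output size of a convolution: floor((d_in + 2p - k) / s) + 1."""
    return (d_in + 2 * p - k) // s + 1

# With k=4, s=2, p=1 the spatial dimension is halved at every layer.
for d in (64, 32, 16):
    print(d, "->", conv_out_dim(d, k=4, s=2, p=1))
```

Each printed pair confirms the halving, e.g. 64 -> 32.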
Generator and Discriminator Losses - Basic Augmentation
Generator and Discriminator Losses - Deluxe Augmentation
The generator loss increases at first, then plateaus and slowly begins to decrease (this is most clearly seen in the basic-augmentation case). The discriminator losses also decrease over time. The image samples below show that the model is getting better over time.
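For reference, a minimal sketch of how the two losses relate (assuming the least-squares GAN objective; the score arrays are made-up stand-ins for discriminator outputs, not actual experiment values):

```python
import numpy as np

# Hypothetical discriminator outputs on a batch (illustrative only)
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real images, ideally near 1
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on fakes, ideally near 0 for D

# Discriminator wants real -> 1 and fake -> 0
d_loss = 0.5 * np.mean((d_real - 1) ** 2) + 0.5 * np.mean(d_fake ** 2)
# Generator wants its fakes to be scored as real
g_loss = np.mean((d_fake - 1) ** 2)
```

The two objectives pull against each other: as the generator improves, D(G(z)) rises, which lowers g_loss but raises the fake term of d_loss.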
Image Samples
One of the samples from early in training (e.g., iteration 200)
Basic
Deluxe
One of the samples from later in training, iteration 13000
Basic
Deluxe
We can see that both sets of samples have improved over time. The basic-augmentation images still show artifacts in the final image, similar to the early (iteration 200) samples. The deluxe-augmentation images, on the other hand, are quite smooth and are much better than the basic ones for the same number of training iterations.
CycleGAN
Iteration 600
Without cycle consistency (iteration 600)
With cycle consistency (iteration 600)
Iteration 10000
Without Cycle Consistency
With Cycle Consistency
We can see that after adding cycle consistency, the generated images have fewer distortions. For example, in the iteration-600 images, the eyes of the generated grumpy cat are much better defined with cycle consistency than without it. In the iteration-10000 images, a few samples without cycle consistency still show distortions (e.g., the generated blue cat's eyes are not properly aligned), whereas with cycle consistency the features are better defined. It also seems that the results would improve further with more data and more training iterations.
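The cycle-consistency term behind this difference can be sketched as an L1 penalty on the round trip X -> Y -> X (a sketch with NumPy arrays standing in for image batches; the function name is illustrative):

```python
import numpy as np

def cycle_consistency_loss(x, x_cycled):
    """L1 distance between an image and its round-trip reconstruction
    G_YtoX(G_XtoY(x)); added (scaled by a lambda weight) to the GAN losses."""
    return np.mean(np.abs(x - x_cycled))

# Toy 'image': a perfect round trip gives zero loss
x = np.ones((3, 64, 64))
print(cycle_consistency_loss(x, x.copy()))  # 0.0
```

Penalizing the round-trip error forces each generator to keep content that the other generator can invert, which is why features such as eye placement stay aligned.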
Bells and Whistles
I did the following bells and whistles tasks:
- Pokemon Generation
- Spectral Normalization
- Do something cool with your model - I generated a GIF/Video for the Pokemon results to show the evolution as the model trains.
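The GIF itself can be assembled with Pillow (a sketch: in the actual run the frames are the sample grids saved during training, which are replaced here with in-memory stand-ins so the snippet is self-contained):

```python
from PIL import Image

# Stand-in frames; in practice, load the saved snapshots instead, e.g.
# frames = [Image.open(p) for p in sorted(glob.glob("samples/*.png"))]
frames = [Image.new("RGB", (64, 64), (12 * i, 60, 120)) for i in range(20)]

# duration is the per-frame display time in ms; loop=0 repeats forever
frames[0].save("training_evolution.gif", save_all=True,
               append_images=frames[1:], duration=200, loop=0)
```

Sorting the snapshot paths before loading keeps the frames in training order, so the GIF plays the evolution chronologically.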
We can see the results for these as follows,
Using The Pokemon Dataset
DCGAN
Using the Water-type Pokemon data for the vanilla GAN and training for ~2000 epochs, we obtain the following results. This is a video with a snapshot taken every 200 iterations. As training progresses, we do see some sort of a blurred version of the Pokemon. Although finer features such as the eyes are not clearly visible, the basic structure/outline looks like a Pokemon.
CycleGAN
I trained a CycleGAN on the Pokemon dataset to convert from Water type to Flying type. I used a larger batch size (64) and trained it for around 20000 iterations.
The results can be seen here,
Flying to Water
We can see that most Pokemon are being converted quite well.
One interesting aspect is that for Pokemon that are Water+Flying (for example, Pelipper and Wingull) and so appear in both datasets, the network preserves the original Pokemon.
Water to Flying
Similarly, the conversion is also quite good in the other direction. This is with just a few hundred images for each of the X and Y datasets. Based on these results, it does seem that the network would learn to generate much better images given a larger dataset.
Spectral Normalization
I used the PyTorch implementation of spectral normalization. Following the paper, I added spectral normalization to the discriminator and ran a few experiments with the cat CycleGAN.
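Applying it is a one-line wrapper per layer via `torch.nn.utils.spectral_norm`; a minimal sketch (the layer sizes are illustrative, not the exact homework architecture):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrap each discriminator conv with spectral normalization, which rescales
# the weight by an estimate of its largest singular value on every forward pass
disc = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=0)),
)

out = disc(torch.randn(2, 3, 32, 32))
print(out.shape)  # spatial dims: 32 -> 16 -> 8 -> 5
```

Constraining the discriminator's Lipschitz constant this way is what stabilizes training and tends to reduce artifacts in the generated samples.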
Generator Results after 10000 epochs
X to Y
Y to X
We can see that the results with spectral norm are qualitatively better than the default results and also have fewer artifacts.