16726 - Learning Based Image Synthesis - Spring 2021

Tarang Shah (Andrew ID: tarangs)

Homework 3 - When Cats Meet GANs

DCGAN

Given that we use kernel size K = 4 and stride S = 2

The padding should be 1 to halve the image dimension

The formula used is,

W_{out} = \left[\frac{W_{in} - k + 2p}{s}\right] + 1

Where,

W_{out} = Output dimension
W_{in} = Input dimension
k = Kernel size
s = Stride
p = Padding
[ ] = Integer (floor) value of

Substituting W_{out} = W_{in}/2, k = 4, and s = 2, we get p = 1.
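As a quick sanity check, the formula can be evaluated directly (a minimal sketch; the function name is ours):

```python
def conv_out_size(w_in, k, s, p):
    """Output spatial size of a conv layer: floor((w_in - k + 2p) / s) + 1."""
    return (w_in - k + 2 * p) // s + 1

# With k=4, s=2, p=1 every layer halves the spatial dimension.
for w in (64, 32, 16, 8):
    assert conv_out_size(w, k=4, s=2, p=1) == w // 2
```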

Generator and Discriminator Losses - Basic Augmentation

Generator and Discriminator Losses - Deluxe Augmentation

The generator loss increases at first, then plateaus and slowly starts decreasing (this is clearly seen in the basic augmentation case). The discriminator losses also decrease over time. The image samples below show that the model is improving over time.
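For reference, a minimal sketch of the losses being plotted, assuming the standard non-saturating BCE formulation (variable names are ours; a least-squares variant would swap the BCE terms for squared errors):

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator logits on a batch of real and generated images.
d_real = torch.randn(16, 1)   # D(x) on real images
d_fake = torch.randn(16, 1)   # D(G(z)) on generated images

# Discriminator: classify real images as 1 and generated images as 0.
d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

# Generator (non-saturating): push D(G(z)) toward 1 to fool the discriminator.
g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```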

Image Samples

One of the samples from early in training (e.g., iteration 200)

Basic

Deluxe

One of the samples from later in training, iteration 13000

Basic

Deluxe

We can see that both sets of samples have improved over time. The basic images still show artifacts in the final image, similar to the early (200th) iteration. The deluxe images, on the other hand, are quite smooth and are much better than the basic-augmentation results for the same number of training iterations.


CycleGAN

Iteration 600

Without cycle consistency(iteration 600)

With Cycle Consistency(iteration 600)

Iteration 10000

Without Cycle Consistency

With Cycle Consistency

We can see that after adding cycle consistency, the generated images have fewer distortions. For example, in the iteration-600 images with cycle consistency, the eyes of the generated grumpy cat are much better defined than in the case without the cycle-consistency loss. In the 10000th-iteration images, a few outputs without cycle consistency are distorted (for example, the generated blue cat's eyes are misaligned), whereas with cycle consistency the features are better defined. It also seems that the results would improve further with more data and more training iterations.
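The cycle-consistency term responsible for this difference can be sketched as an L1 penalty between an input and its round-trip reconstruction (the function name and the weight `lam` are ours):

```python
import torch

def cycle_consistency_loss(real, reconstructed, lam=10.0):
    # lam * ||F(G(x)) - x||_1, averaged over all elements; the symmetric
    # term ||G(F(y)) - y||_1 is computed the same way for the other domain.
    return lam * torch.mean(torch.abs(reconstructed - real))
```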

Bells and Whistles

I did the following bells and whistles tasks:

  1. Pokemon Generation
  2. Spectral Normalization
  3. Do something cool with your model - I generated a GIF/video of the Pokemon results to show the evolution as the model trains.

We can see the results for these as follows,

Using The Pokemon Dataset

DCGAN

Using the Water Pokemon data for the vanilla GAN and training for ~2000 epochs, we obtain the following results. The video shows a snapshot taken every 200 iterations. As training progresses, a blurred version of a Pokemon emerges. Though the finer features, such as eyes, are not clearly visible, the basic structure/outline looks like a Pokemon.
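The video/GIF was produced by stitching the periodic sample snapshots together; a minimal sketch using Pillow (the file paths are hypothetical):

```python
import glob
from PIL import Image

# Sample grids saved every 200 iterations, e.g. samples/sample-000200.png, ...
paths = sorted(glob.glob("samples/sample-*.png"))
frames = [Image.open(p) for p in paths]
if frames:
    # 200 ms per frame, looping forever.
    frames[0].save("training_evolution.gif", save_all=True,
                   append_images=frames[1:], duration=200, loop=0)
```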

CycleGAN

I trained a CycleGAN on the Pokemon dataset to convert from Water type to Flying type. I used a larger batch size (64) and trained it for around 20000 iterations.

The results can be seen here,

Flying to Water

Final Frame

We can see that most Pokemon are being converted quite well.

One interesting aspect is that for Pokemon that are Water+Flying (for example, Pelipper and Wingull) and are present in both datasets, the network preserves the original Pokemon.

Water to Flying

Final Frame

Similarly, the conversion is quite good in the other direction as well. This is with just a few hundred images in each of the X and Y datasets. Based on these results, it seems the network would be able to learn and generate much better images with a larger dataset.

Spectral Normalization

I used the PyTorch implementation of spectral normalization. Following the paper, I added spectral normalization to the discriminator and ran a few experiments with the cat CycleGAN.
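Concretely, each convolution in the discriminator can be wrapped with `torch.nn.utils.spectral_norm` (the layer sizes here are illustrative, not the exact architecture):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral norm divides the conv weight by an estimate of its largest singular
# value on each forward pass, constraining the discriminator's Lipschitz
# constant and stabilizing GAN training.
layer = spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1))
out = layer(torch.randn(1, 3, 64, 64))   # spatial size halves: 64 -> 32
```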

Generator Results after 10000 epochs

X to Y

X to Y - With Spectral Norm
X to Y - Without Spectral Norm

Y to X

Y to X - With Spectral Norm
Y to X - Without Spectral Norm

We can see that the results with spectral norm are qualitatively better than the default results and have fewer artifacts.