When Cats Meet GANs

Implementation

Overview

We train a DCGAN as well as a CycleGAN to generate realistic images of cats.

Part 1: Deep Convolution GAN


DCGAN is a GAN that uses a convolutional neural network as the discriminator, and a network composed of transposed convolutions as the generator.

DCGAN: Deluxe Data Augmentation

To prevent overfitting to the small real dataset, I implement "deluxe" data augmentation, a series of Transforms comprising Resize, RandomCrop, RandomHorizontalFlip, and Normalize.
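A minimal sketch of this pipeline with torchvision, assuming 64x64 training images and a slightly larger resize (to 70) before the random crop; the exact sizes used in the report are not stated, so those values are placeholders.

```python
import torchvision.transforms as T

# Deluxe augmentation pipeline sketch; resize/crop sizes are assumptions.
deluxe_transform = T.Compose([
    T.Resize(70),                  # resize slightly larger than the crop size
    T.RandomCrop(64),              # random 64x64 crop
    T.RandomHorizontalFlip(p=0.5), # random left-right flip
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # map to [-1, 1]
])
```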

DCGAN: Discriminator

1. Find the padding P given K = 4, S = 2, using the formula for the convolution output size, and requiring that each layer halve the spatial dimension N:
(N + 2P - K) / S + 1 = 0.5N
(N + 2P - 4) / 2 = 0.5N - 1
N + 2P - 4 = N - 2
2P - 4 = -2
2P = 2
P = 1
Thus, padding P = 1.
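For reference, here is a sketch of a discriminator built from these K=4, S=2, P=1 convolutions on 64x64 inputs; the channel widths, normalization, and activation choices are assumptions for illustration, not necessarily the exact ones used.

```python
import torch.nn as nn

# DCGAN discriminator sketch: each 4x4/stride-2/pad-1 conv halves the
# spatial size (64 -> 32 -> 16 -> 8 -> 4), then a final 4x4 conv maps to 1x1.
class Discriminator(nn.Module):
    def __init__(self, conv_dim=64):
        super().__init__()
        def block(c_in, c_out, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.ReLU())
            return layers
        self.model = nn.Sequential(
            *block(3, conv_dim, norm=False),     # 64 -> 32
            *block(conv_dim, conv_dim * 2),      # 32 -> 16
            *block(conv_dim * 2, conv_dim * 4),  # 16 -> 8
            *block(conv_dim * 4, conv_dim * 8),  # 8 -> 4
            nn.Conv2d(conv_dim * 8, 1, kernel_size=4, stride=1, padding=0),  # 4 -> 1
        )

    def forward(self, x):
        return self.model(x)
```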

DCGAN: Generator

Since we already upsample by a factor of 2, we then use kernel_size=3, stride=1, padding=1 to preserve the image dimensions.
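A sketch of one such generator stage, assuming nearest-neighbor upsampling followed by the 3x3/stride-1/padding-1 conv described above; the normalization and activation choices are assumptions.

```python
import torch.nn as nn

# One generator stage: upsample by 2, then a 3x3 conv with stride 1 and
# padding 1 so the upsampled resolution is preserved.
def up_conv(c_in, c_out):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(),
    )
```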

DCGAN: Training

Losses:

We see a relatively smooth decrease in the discriminator loss without DiffAug. With DiffAug enabled, the generator loss is smaller, which means the generator produces more realistic images (this is supported by the generated outputs; see the explanation in the next section), confusing the discriminator and causing D to perform worse. Ideally, the two networks (G and D) should stabilize and produce consistent results, though this equilibrium is difficult to achieve. If the GAN trains well, the discriminator should perform worse over time because it can no longer accurately distinguish real from fake: it approaches 50% accuracy while the generator loss stays low.
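For concreteness, here is one common formulation of the plotted losses, a least-squares GAN objective; whether the runs above used this or a BCE-based objective is an assumption on my part.

```python
import torch

# Least-squares GAN losses: D is pushed toward 1 on real and 0 on fake,
# while G is pushed to make D output 1 on fake samples.
def d_loss_fn(D, real, fake):
    return 0.5 * ((D(real) - 1) ** 2).mean() + 0.5 * (D(fake.detach()) ** 2).mean()

def g_loss_fn(D, fake):
    return ((D(fake) - 1) ** 2).mean()
```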
Basic vs Deluxe:
With basic data augmentation, we only Resize and Normalize. With deluxe data augmentation, we Resize, RandomCrop, RandomHorizontalFlip, then Normalize. This helps the model be robust to differences in scale and to different poses and horizontal flips of the object of interest. Overall, switching from basic to deluxe makes the output resolution much nicer, eliminates many of the pixel/line artifacts, and forms features cleanly. Switching from no DiffAug to DiffAug makes the output color much more realistic (close to the original Grumpy Cat color), helps place the high-level features in the correct positions, and yields a wider variety of cat poses.
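To illustrate the idea behind DiffAug, here is a toy differentiable augmentation (brightness jitter plus cutout) applied identically to real and fake batches before the discriminator. The actual experiments use the reference DiffAugment policy (color, translation, cutout), so this is only a sketch of the mechanism, not the code used above.

```python
import torch

# Toy differentiable augmentation: every operation is differentiable w.r.t.
# the input images, so gradients still flow back to the generator.
def diff_augment(x, cutout_frac=0.25):
    # brightness: add a random per-sample offset in [-0.25, 0.25]
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * 0.5
    # cutout: zero out one random square patch per sample
    b, _, h, w = x.shape
    ch, cw = int(h * cutout_frac), int(w * cutout_frac)
    ys = torch.randint(0, h - ch + 1, (b,))
    xs = torch.randint(0, w - cw + 1, (b,))
    mask = torch.ones_like(x)
    for i in range(b):
        y0, x0 = int(ys[i]), int(xs[i])
        mask[i, :, y0:y0 + ch, x0:x0 + cw] = 0
    return x * mask
```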
Results (Basic, No DiffAug)


Results (Basic, With DiffAug)


Results (Deluxe, No DiffAug)


Results (Deluxe, With DiffAug)

Deluxe + DiffAug: Progress during Training
With deluxe data preprocessing + differentiable augmentation, early in training (around iteration 200) the images are very blurry, but the general spatial layout and colors of a cat are apparent. As the iterations increase, the features (eyes, nose, etc.) become sharper and move into place; by iteration 400, the cats already look quite realistic. Because we randomly modify lighting/contrast and mask out parts of the image, we force the model to become robust and not overfit to the training images. The results at around iteration 6000 look realistic and high-quality.

Part 2: CycleGAN


CycleGAN enforces a cycle-consistency loss, enabling translation between two different domains X and Y.

CycleGAN: Deluxe Data Augmentation

(Same as the deluxe data augmentation from DCGAN.)

CycleGAN: Generator

We use three residual blocks with instance norm, each followed by a ReLU activation. The skip connections ensure that the output does not stray too far from the input.
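A sketch of one such residual block; the number of convolutions inside the block and the 3x3 kernel size are assumptions.

```python
import torch.nn as nn

# Residual block with instance norm; the skip connection keeps the output
# close to the input. The ReLU mentioned above is applied by the surrounding
# generator after the addition.
class ResnetBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.conv(x)  # residual (skip) connection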

CycleGAN: PatchDiscriminator

We remove the last conv layer so that the discriminator produces a spatial 4x4 grid of per-patch scores, instead of the single 1x1 output used in DCGAN.
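A sketch of a patch discriminator along these lines, assuming 64x64 inputs and the same K=4, S=2, P=1 downsampling stack as before, with a final stride-1 conv producing the 4x4 grid of patch scores; the channel widths and normalization are assumptions.

```python
import torch.nn as nn

# PatchDiscriminator sketch: same downsampling stack as the DCGAN
# discriminator, but without the final 4x4 -> 1x1 conv, so the output is a
# 4x4 grid of patch logits rather than a single scalar.
class PatchDiscriminator(nn.Module):
    def __init__(self, conv_dim=64):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.ReLU(),                       # 64 -> 32
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim * 2), nn.ReLU(),                       # 32 -> 16
            nn.Conv2d(conv_dim * 2, conv_dim * 4, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim * 4), nn.ReLU(),                       # 16 -> 8
            nn.Conv2d(conv_dim * 4, conv_dim * 8, 4, 2, 1),
            nn.InstanceNorm2d(conv_dim * 8), nn.ReLU(),                       # 8 -> 4
            nn.Conv2d(conv_dim * 8, 1, kernel_size=3, stride=1, padding=1),   # 4x4 patch logits
        )

    def forward(self, x):
        return self.model(x)
```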

CycleGAN Experiments: Grumpify Cat


1. Without Cycle-Consistency Loss: 1000 iters


With Cycle-Consistency Loss: 1000 iters


At 1000 iterations, the version with cycle-consistency loss (particularly Grumpy Cat -> Russian Blue) is much higher-resolution and more realistic. Enforcing the X->Y->X and Y->X->Y cycles means a generated X->Y image must pass back through the Y->X generator and reconstruct the original input, which pushes both generators toward realistic outputs. This results in faster convergence with little training data.
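A sketch of the cycle-consistency term described here, using an L1 reconstruction penalty; the L1 choice and the lambda weighting are assumptions.

```python
# Cycle-consistency loss sketch: reconstruct X from the X -> Y -> X round
# trip (and Y from Y -> X -> Y) and penalize the L1 distance to the originals.
def cycle_consistency_loss(G_XtoY, G_YtoX, x, y, lambda_cycle=10.0):
    x_rec = G_YtoX(G_XtoY(x))   # X -> Y -> X
    y_rec = G_XtoY(G_YtoX(y))   # Y -> X -> Y
    return lambda_cycle * ((x_rec - x).abs().mean() + (y_rec - y).abs().mean())
```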

2. Without Cycle-Consistency Loss: 10000 iters


With Cycle-Consistency Loss: 10000 iters


At 10,000 iterations, cycle-consistency loss seems to generate much higher-resolution images, with eyes positioned correctly. The outputs without cycle-consistency loss are grainy, unrealistic, and distorted (e.g., messed-up eye placement).

With DC Discriminator: 10000 iters

The patch discriminator only penalizes structure at the scale of local patches; i.e., it classifies whether each NxN patch in an image is real or fake. Without the patch discriminator, it is difficult to preserve local realism. We notice that with the DCDiscriminator, the generated Russian Blue cats' eyes are not fully formed (often they are just a black spot). This may be because local spatial features are not penalized properly during training, so the generator fails to learn higher-level features like eyes, nose, and ears.

CycleGAN Experiments: Apples2Oranges


1. Without Cycle-Consistency Loss: 1000 iters


With Cycle-Consistency Loss: 1000 iters


At 1000 iterations, I notice that cycle-consistency loss preserves the original background with much more integrity. Without the cycle-consistency loss, the background tends to be distorted and blended with the foreground object, and is very grainy.
2. Without Cycle-Consistency Loss: 10000 iters


With Cycle-Consistency Loss: 10000 iters


At 10,000 iterations, cycle-consistency loss seems to generate much smoother objects with fewer artifacts. Overall, the generated images also look much more similar to the original ones.

With DC Discriminator: 10000 iters

With the DC discriminator, I notice strange contrast and random colors that cause certain parts of the image to look very unrealistic. This could be attributed to losing the PatchDiscriminator, which learns to detect whether local spatial regions are real or fake and thereby helps the GAN generate realistic high-level features in the images.