Overview
We train a DCGAN as well as CycleGAN to generate realistic
images of cats.
Part 1: Deep Convolutional GAN
DCGAN is a GAN that uses a convolutional neural network
as the discriminator, and a network composed of transposed convolutions as the generator.
DCGAN: Deluxe Data Augmentation
To prevent overfitting to the real dataset, I implement "deluxe" data
augmentation, a series of transforms consisting of:
- Resize
- Random Crop
- Random Horizontal Flip
- Normalize (mean 0.5, std 0.5 per channel, mapping pixels to [-1, 1])
DCGAN: Discriminator
1. Find Padding P given K = 4, S = 2
Using the formula for the conv output size, and requiring the layer to halve the input:
(N + 2P - K) / S + 1 = N/2
(N + 2P - 4) / 2 = N/2 - 1
N + 2P - 4 = N - 2
2P - 4 = -2
2P = 2
P = 1
Thus, padding P = 1.
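This can be checked directly in PyTorch: a conv layer with K = 4, S = 2, P = 1 halves the spatial resolution (the channel counts here are illustrative):

```python
import torch
import torch.nn as nn

# (N + 2*P - K) / S + 1 = (64 + 2 - 4) / 2 + 1 = 32, i.e. N / 2
conv = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
out = conv(x)  # spatial size halves: 64 -> 32
```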
DCGAN: Generator
Since we already upsample by a factor of 2, we then use kernel_size=3, stride=1,
padding=1 to preserve the image dimensions.
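A sketch of one such generator stage (channel counts are illustrative):

```python
import torch
import torch.nn as nn

# Upsample doubles the spatial size; the 3x3, stride-1, padding-1 conv then
# preserves it: (2N + 2 - 3) / 1 + 1 = 2N.
up_conv = nn.Sequential(
    nn.Upsample(scale_factor=2),
    nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
)
x = torch.randn(1, 128, 16, 16)
out = up_conv(x)  # 16x16 -> 32x32
```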
DCGAN: Training
Losses:

We see a relatively smooth decrease in the discriminator loss without
DiffAug.
With DiffAug enabled, the generator loss is smaller, which means the generator
produces more realistic images (supported by the generated outputs; see the
explanation in the next section), confusing the discriminator and causing D to
perform worse.
Ideally, the two networks (G, D) should stabilize and produce consistent results.
However, this is difficult to achieve. If the GAN trains well, the discriminator
should perform worse because it can no longer accurately distinguish real from
fake: the discriminator approaches 50% accuracy while the generator loss is minimal.
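As a concrete reference, here is one common formulation of the two objectives (least-squares GAN losses; whether this project uses this exact form is an assumption). At the ideal equilibrium described above, both losses bottom out:

```python
import torch

def d_loss(D_real, D_fake):
    # Least-squares discriminator loss: push scores on real images toward 1
    # and scores on generated images toward 0.
    return 0.5 * ((D_real - 1) ** 2).mean() + 0.5 * (D_fake ** 2).mean()

def g_loss(D_fake):
    # The generator tries to make the discriminator score its fakes as 1.
    return ((D_fake - 1) ** 2).mean()
```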
Basic vs Deluxe:
With basic data augmentation, we only Resize and Normalize.
With deluxe data augmentation, we Resize, RandomCrop, RandomHorizontalFlip, then Normalize.
This helps the model be robust to scale and to different poses/horizontal flips of the object of interest.
Overall, changing from Basic to Deluxe makes the output resolution much nicer,
eliminates many of the pixel/line artifacts, and forms features cleanly.
Changing from no DiffAug to DiffAug makes the output color much more realistic and
closer to the original Grumpy Cat color, and helps place the high-level features
in the correct positions. It also creates a wider variety of cat poses.
Results (Basic, No DiffAug)
Results (Basic, With DiffAug)
Results (Deluxe, No DiffAug)
Results (Deluxe, With DiffAug)
Deluxe + DiffAug: Progress during Training
With deluxe data preprocessing + differentiable augmentation, early on in training
(around iter 200) the image is very blurry but the general spatial colors of a cat are apparent.
As the iterations increase, the features (eyes, nose, etc) become sharper and move into place.
We see that by iter 400, the cats already look quite realistic. Because DiffAug
randomly modifies lighting/contrast and masks out parts of the image, it forces
the model to become robust and not overfit to the training images. The results
at iter ~6000 look realistic and high-quality.
Iter 200:
Iter 2000:
Iter 4000:
Iter 6000:
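To make "differentiable augmentation" concrete, here is a minimal sketch in the spirit of DiffAugment (the brightness and cutout ops below are simplified stand-ins, not the project's exact policy): every operation is a tensor op, so gradients flow back through the augmentation to the generator.

```python
import torch

def diff_augment(x):
    # Random brightness: add a per-sample offset in [-0.5, 0.5).
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Random cutout: zero out a half-size square region via masking.
    _, _, h, w = x.shape
    ch, cw = h // 2, w // 2
    top = torch.randint(0, h - ch + 1, (1,)).item()
    left = torch.randint(0, w - cw + 1, (1,)).item()
    mask = torch.ones_like(x)
    mask[:, :, top:top + ch, left:left + cw] = 0
    return x * mask

x = torch.randn(2, 3, 8, 8, requires_grad=True)
y = diff_augment(x)
y.sum().backward()  # gradients flow through the augmentation
```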
Part 2: CycleGAN
CycleGAN enforces a cycle-consistency loss, enabling translation between two different domains X and Y.
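The cycle-consistency term can be sketched as an L1 reconstruction penalty; the generator names and the weight `lam` here are hypothetical:

```python
import torch

def cycle_consistency_loss(x, G_XtoY, G_YtoX, lam=10.0):
    # Map X -> Y -> back to X; the reconstruction should match the input.
    reconstructed = G_YtoX(G_XtoY(x))
    return lam * (x - reconstructed).abs().mean()  # L1 penalty

x = torch.randn(2, 3, 8, 8)
identity = lambda t: t  # perfect generators would reconstruct exactly
```

With perfect (identity) generators the penalty is zero; any information the forward generator destroys shows up as reconstruction error.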
CycleGAN: Deluxe Data Augmentation
(Same as the deluxe data augmentation from DCGAN)
CycleGAN: Generator
We use three Residual Blocks with instance norm, each followed by a ReLU activation.
This ensures that the output won't stray too much from the input.
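A sketch of such a residual block (the exact layer ordering is an assumption):

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    # Residual block with instance norm: the skip connection keeps the
    # output close to the input.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # Skip connection; in the generator, a ReLU follows each block.
        return x + self.conv(x)

block = ResnetBlock(64)
x = torch.randn(1, 64, 16, 16)
out = block(x)  # same shape as the input
```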
CycleGAN: PatchDiscriminator
We remove the last conv layer, producing spatial 4x4 outputs instead of the 1x1
output from DCGAN.
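A sketch of how this yields a 4x4 grid of per-patch scores (layer widths are assumptions):

```python
import torch
import torch.nn as nn

# Each K=4, S=2, P=1 conv halves the resolution; stopping before a final
# collapsing layer leaves a spatial grid of real/fake scores.
patch_D = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
    nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
    nn.Conv2d(128, 1, 4, 2, 1),  # 64x64 input -> 4x4 score map
)
x = torch.randn(1, 3, 64, 64)
out = patch_D(x)
```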
CycleGAN Experiments: Grumpify Cat
1. Without Cycle-Consistency Loss: 1000 iters
With Cycle-Consistency Loss: 1000 iters
At 1000 iterations, the version with cycle-consistency loss (particularly
Grumpy Cat -> Russian Blue) is much higher-resolution and more realistic.
This is because enforcing the X->Y->X and Y->X->Y cycles forces a generated
X->Y image to pass back through the Y->X generator, pushing both generators
toward realistic outputs. This results in faster convergence with little training data.
2. Without Cycle-Consistency Loss: 10000 iters
With Cycle-Consistency Loss: 10000 iters
At 10,000 iterations, cycle-consistency loss generates much
higher-resolution images, with eyes positioned correctly. The versions without
cycle-consistency are grainy, unrealistic, and distorted (e.g. messed-up eye placement).
With DC Discriminator: 10000 iters
The patch discriminator only penalizes structure at the scale of local patches; i.e. it classifies
whether each NxN patch in an image is real or fake.
Without the patch discriminator, it is difficult to preserve local realism.
We notice that with the DCDiscriminator, the generated Russian Blue cats' eyes are not fully
formed (often just a black spot). This may be because the spatial features are not
encoded properly during training, so the model cannot learn higher-level
features like eyes, nose, and ears.
CycleGAN Experiments: Apples2Oranges
1. Without Cycle-Consistency Loss: 1000 iters
With Cycle-Consistency Loss: 1000 iters
At 1000 iterations, I notice that Cycle-Consistency preserves the original background
with much more integrity. Without CC-loss, the background tends to be distorted/blended
with the foreground image, and is very grainy.
2. Without Cycle-Consistency Loss: 10000 iters
With Cycle-Consistency Loss: 10000 iters
At 10,000 iterations, cycle-consistency loss generates much smoother objects
with fewer artifacts. Overall, the generated images also look much more faithful to the
originals.
With DC Discriminator: 10000 iters
With the DC discriminator, I notice strange contrast/random colors
that make certain parts of the image look very unrealistic.
This is likely because the PatchDiscriminator, unlike the DCDiscriminator,
learns to detect whether local spatial regions are real/fake, which helps
the GAN generate realistic high-level features in the images.