We will talk about Deep Convolutional GANs (DCGAN) and CycleGANs.

Goal

  1. Generate cat pictures from a randomly sampled noise vector in the latent/feature space.
  2. Translate between images of two different types of cats (grumpy and Russian Blue)
  3. Translate between images of apples and images of oranges

1. Deep Convolutional GAN

DCGAN is just GAN but with convolutional layers.

Deep Convolutional Discriminator

dcgan discriminator architecture

The above image (taken from course website) describes the layers in the discriminator but we need to determine the sizes of the kernel, stride, and padding. We downsample the image by a factor of two using the following formula.

$$ (O \times S)+(K-S) = (I+2P) \\ O = I /2 $$

Where O is output size, I is input size, K is kernel size, S is stride, and P is padding.

Padding

We are given kernel size K=4, stride S=2. When S is equal to the downsampling factor, the actual values of O and I become irrelevant and we get P=1 for all layers except conv5, where we don’t need padding nor a stride if we use a kernel size K=4.

Deep Convolutional Generator Up-convolution

generator architecture

The first layer of the generator is given input of size 1x1 which needs to be quadrupled to 4x4. Through multiple experiments, I found that using a kernel size K=5, stride S=1, padding P=2 after scaling the image to the desired size made the training more stable.

Preprocess Augmentation and Differentiable Augmentation

I used all three DiffAug policies color,translation,cutout and in deluxe data augmentation, I used RandomCrop, RandomHorizontalFlip, RandomRotation transforms on top of the basic data augmentation transforms.

I applied DiffAug to both fake and real images any time the images needed to go through the discriminator. I also called .detach() method on fake images only when training the discriminator to fix the generator.

Visual Comparison

vanilla gan grumpy cat basic: vanilla gan grumpy cat basic vanilla gan grumpy cat basic diffaug: vanilla gan grumpy cat basic diffaug vanilla gan grumpy cat deluxe: vanilla gan grumpy cat deluxe vanilla gan grumpy cat deluxe diffaug: vanilla gan grumpy cat deluxe diffaug

The effects that I can observe from running 6400 iterations with these different settings are:

Loss graph comparison

The generator’s loss for cases where DiffAug is applied is consistently lower than when it is not. The loss when using deluxe augmentation is slightly lower but not significantly. This trend is exactly the mirror opposite for discriminator losses. This demonstrates that augmentation helps the network learn better.

Generator Loss

Loss graph

Discriminator Loss

Loss graph

Improvement over time

In the beginning, the generated images have “aliasing” effects. Therea are horizontal and vertical lines that are often noticeable in low quality JPEG- or PCA- compressed images. However, the overall structure already exists and is quite discernible. As we iterate more, the aliasing effect goes away and higher frequency detail is improved.

iteration 200: Alt text iteration 3200: Alt text iteration 6400: Alt text

2. CycleGAN

The domain of the CycleGAN Generator is different from that of vanilla GAN. Vanilla GAN samples from noise but CycleGAN translates an image to an image. Alt text

Again, the above image is taken from course website. The discriminator of CycleGAN is expands on that of vanilla GAN by using Patch Discriminator. This discriminator divides the output image into four patches and decides how realistic each patch is. This method forces detail to be consistent across different areas of the image, instead of, for example, having very realistic features in just one area.

Results: Grumpy ⇄ Russian Blue

Effect of cycle consistency loss

The effect of cycle consistency loss is not very pronounced for grumpy to russian blue, but the results of russian blue to grumpy show the effect a little bit. In the images generated without using cycle consistency loss, the russian blue cats are definitely unlike grumpy cats, but not necessarily the best representation of russian blue cats. In the images generated using cycle consistency loss, the russian blue cats are more like russian blue cats.

At iteration 1000 without cycle consistency loss: Alt text Alt text At iteration 1000 with cycle consistency loss: Alt text Alt text

Effect of DC vs Patch discriminator

The most notable difference in the effect of the two discriminators can be seen in the cats’ eyes. In the images generated with a DC discriminator, the overall structure matches the target cat, but detail in the eyes are not very expressive. On the other hand, using patch discriminator enhances local detail like the eyes.

With DC Discriminator: Alt text Alt text With Patch Discriminator: Alt text Alt text

Results: Apple ⇄ Orange

Effect of cycle consistency loss

This one is funny. Without cycle consistency loss, generated oranges are ooooooranggggeeeee but not orange. The generated apples scream apppppllplplllleeee but are not apples. With cycle consistency loss, oranges are more like oranges and apples are more like apples. The reason could be attributed to the fact that it’s hard to generate an apple from an ooooooranggggeeeee, so the network is regularized to not overshoot the optimization and instead just produce an orange.

At iteration 1000 without cycle consistency loss: Alt text Alt text At iteration 1000 with cycle consistency loss: Alt text Alt text

Effect of DC vs Patch discriminator

The difference here is hard to tell. The original images are not preprocessed enough (though the cat images were) for there to be meaningful difference in the local texture.

With DC Discriminator: Alt text Alt text

With Patch Discriminator: Alt text Alt text

Final results

Using cycle consistency loss, patch discriminator, 10,000 iterations

Alt text Alt text

Alt text Alt text