
16-726 Project 3

Zhipeng Bao (zbao)

Overview

This project implements generative adversarial networks (GANs). In this assignment, we implemented DCGAN and CycleGAN and applied them to two datasets. We also implemented PatchGAN.

How to run the code

1. Run bash part1.sh to reproduce the results of Part 1;

2. Run bash part2.sh to reproduce the results of Part 2;

3. Run bash part3.sh to reproduce the results of Part 3.


Part 1: DCGAN

1.1 Data Augmentation

Following SimCLR (Chen et al., ICML 2020), I implemented 4 kinds of data augmentation in total: (1) random resized crop with aspect ratio in 0.75~1.33; (2) random horizontal flip; (3) random grayscale transformation with p=0.2; (4) random Gaussian blur with p=0.5. That paper provides other augmentation methods as well, but I think they are not appropriate for training a generation-oriented network, so I only implemented these four.

See data_loader.py for details.
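A minimal sketch of this deluxe pipeline, assuming torchvision transforms; the crop size, blur kernel size, and normalization below are illustrative, and the exact values are in data_loader.py:

```python
import torchvision.transforms as T

image_size = 64  # illustrative; the actual size is set in data_loader.py

deluxe_transform = T.Compose([
    # (1) random resized crop with aspect ratio in [0.75, 1.33]
    T.RandomResizedCrop(image_size, ratio=(0.75, 1.33)),
    # (2) random horizontal flip
    T.RandomHorizontalFlip(p=0.5),
    # (3) random grayscale with p = 0.2 (output keeps 3 channels)
    T.RandomGrayscale(p=0.2),
    # (4) Gaussian blur applied with p = 0.5
    T.RandomApply([T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.5),
    T.ToTensor(),
    # scale to [-1, 1] to match the generator's tanh output range
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```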

1.2 Discriminator

a. Padding = 1. Let the input size be 2x and the output size be x. With kernel size 4 and stride 2, we need (2x - 4 + 2p)/2 + 1 = x, which solves to p = 1.

b. See model.py for the implementation of the DCGAN discriminator.
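For reference, a minimal structural sketch of the discriminator, assuming 64x64 RGB inputs and an illustrative base width conv_dim=32; every 4x4 stride-2 convolution uses the padding of 1 derived above, and the graded version is in model.py:

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """4x4 conv, stride 2, padding 1: halves the spatial size (p = 1 from above)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

class DCDiscriminator(nn.Module):
    def __init__(self, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            conv_bn_relu(3, conv_dim),                 # 64 -> 32
            conv_bn_relu(conv_dim, conv_dim * 2),      # 32 -> 16
            conv_bn_relu(conv_dim * 2, conv_dim * 4),  # 16 -> 8
            conv_bn_relu(conv_dim * 4, conv_dim * 8),  # 8  -> 4
            # final 4x4 conv collapses the 4x4 map to a single score
            nn.Conv2d(conv_dim * 8, 1, kernel_size=4, stride=1, padding=0),
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one real/fake score per image
```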

1.3 Generator

See model.py for the implementation of the DCGAN generator.
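A matching generator sketch that mirrors the discriminator with 4x4 stride-2 transposed convolutions; noise_size=100 and conv_dim=32 are assumptions here:

```python
import torch.nn as nn

def upconv_bn_relu(in_ch, out_ch):
    """4x4 transposed conv, stride 2, padding 1: doubles the spatial size."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

class DCGenerator(nn.Module):
    def __init__(self, noise_size=100, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            # project the noise vector (seen as noise_size x 1 x 1) to a 4x4 map
            nn.ConvTranspose2d(noise_size, conv_dim * 8, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(conv_dim * 8),
            nn.ReLU(),
            upconv_bn_relu(conv_dim * 8, conv_dim * 4),  # 4  -> 8
            upconv_bn_relu(conv_dim * 4, conv_dim * 2),  # 8  -> 16
            upconv_bn_relu(conv_dim * 2, conv_dim),      # 16 -> 32
            nn.ConvTranspose2d(conv_dim, 3, kernel_size=4, stride=2, padding=1),  # 32 -> 64
            nn.Tanh(),  # output in [-1, 1], matching the normalized data
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))
```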

1.4 Training Loop

See vanilla_gan.py for the implementation of the training loop.
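A condensed sketch of one training iteration, assuming a least-squares GAN objective (the actual loss form is in vanilla_gan.py):

```python
import torch

def gan_training_step(D, G, real_images, d_optimizer, g_optimizer, noise_size=100):
    """One vanilla GAN step: update D on real/fake, then update G to fool D."""
    batch_size = real_images.size(0)
    device = real_images.device

    # --- discriminator: push D(real) -> 1 and D(fake) -> 0 ---
    d_optimizer.zero_grad()
    z = torch.randn(batch_size, noise_size, device=device)
    fake_images = G(z)
    d_loss = 0.5 * ((D(real_images) - 1) ** 2).mean() \
           + 0.5 * (D(fake_images.detach()) ** 2).mean()  # detach: no G gradient here
    d_loss.backward()
    d_optimizer.step()

    # --- generator: push D(fake) -> 1 ---
    g_optimizer.zero_grad()
    z = torch.randn(batch_size, noise_size, device=device)
    g_loss = ((D(G(z)) - 1) ** 2).mean()
    g_loss.backward()
    g_optimizer.step()

    return d_loss.item(), g_loss.item()
```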

1.5 Experiments

Results

Training curve for DCGAN Discriminator with basic setting.
Training curve for DCGAN Generator with basic setting.
Training curve for DCGAN Discriminator with deluxe setting.
Training curve for DCGAN Generator with deluxe setting.

We can see that, in the early stage of training, we have a powerful discriminator and a weak generator, so the loss for D is near 0 and the loss for G is close to 1 for both settings. Then, after several epochs of training, the generator starts to learn to fool the discriminator, so the loss of D increases and the loss of G drops.

Next, we can see the improvement from introducing the deluxe data augmentation. With the basic setting, at the end of training the model collapses: it is easy to train a good D, and the generator regresses to producing random noise. With the deluxe augmentation, training remains stable and the quality of the images gradually improves.

If the GAN manages to train, the D loss should be close to 0 at the start, then rise to around 0.5 and fall back a little. For a perfect GAN, the loss of D should stay around 0.5 at the end. The G loss should be around 0.5 at the start, then rise toward 1 (since we have a good D), and then drop gradually. For a perfect GAN, it would gradually drop to 0.5.

Sampled images at iteration 200 with basic setting.
Sampled images at iteration 5800 with basic setting.
Sampled images at iteration 200 with deluxe setting.
Sampled images at iteration 9000 with deluxe setting.

In the early stage of training, G can only generate random noise. Then, with adversarial training, it learns to generate visually realistic images. Compared with the basic setting, introducing the deluxe data augmentation reduces the checkerboard artifacts and also avoids the collapse problem during long training runs. The GAN with the basic setting collapses after 6000 iterations, but the GAN with the deluxe setting still works until the end.


Part 2: CycleGAN

2.1 Generator

See model.py for the implementation of the CycleGAN generator.
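In the same spirit, a structural sketch of an encoder-ResNet-decoder generator, reusing the conv_bn_relu / upconv_bn_relu helpers from the Part 1 sketches; the 32x32 input size, the single residual block, and the widths are all assumptions, and model.py has the submitted version:

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block used in the middle of the generator."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.conv(x)  # skip connection preserves low-level content

class CycleGenerator(nn.Module):
    """Maps a 32x32 image in one domain to a 32x32 image in the other."""
    def __init__(self, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            conv_bn_relu(3, conv_dim),               # encode: 32 -> 16
            conv_bn_relu(conv_dim, conv_dim * 2),    # encode: 16 -> 8
            ResnetBlock(conv_dim * 2),               # transform at 8x8
            upconv_bn_relu(conv_dim * 2, conv_dim),  # decode: 8 -> 16
            nn.ConvTranspose2d(conv_dim, 3, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)
```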

2.2 Training Loop

See cycle_gan.py for the implementation of the training loop.
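The piece that differs from Part 1 is the generator update, which adds the cycle-consistency term on top of the adversarial terms. A hedged sketch, again with least-squares adversarial losses and an assumed cycle weight lam=10.0:

```python
def generator_step(G_XtoY, G_YtoX, D_X, D_Y, real_X, real_Y, g_optimizer, lam=10.0):
    """Joint update of both generators: adversarial + cycle-consistency losses."""
    g_optimizer.zero_grad()

    fake_Y = G_XtoY(real_X)
    fake_X = G_YtoX(real_Y)

    # adversarial terms: fool both discriminators (least-squares form)
    g_loss = ((D_Y(fake_Y) - 1) ** 2).mean() + ((D_X(fake_X) - 1) ** 2).mean()

    # cycle-consistency terms: X -> Y -> X and Y -> X -> Y should reconstruct
    # the inputs; this L1 penalty is what preserves the identity of the object
    g_loss = g_loss + lam * ((G_YtoX(fake_Y) - real_X).abs().mean()
                             + (G_XtoY(fake_X) - real_Y).abs().mean())

    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()
```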

2.3 Experiments

Results

a. Results after 600 iterations
Cat A->B Translation at 600 iterations without cycle-consistency loss.
Cat B->A Translation at 600 iterations without cycle-consistency loss.
Cat A->B Translation at 600 iterations with cycle-consistency loss.
Cat B->A Translation at 600 iterations with cycle-consistency loss.
b. Results after 10000 iterations
Cat A->B Translation at 10000 iterations without cycle-consistency loss.
Cat B->A Translation at 10000 iterations without cycle-consistency loss.
Cat A->B Translation at 10000 iterations with cycle-consistency loss.
Cat B->A Translation at 10000 iterations with cycle-consistency loss.
c. Observations

We can see that adding the cycle-consistency loss preserves the identity of the object better. Without the cycle-consistency loss, the model still works, but the translated images look different from the source images. With it, the translated image maintains the same identity as the source. That is a clear benefit of the cycle-consistency loss.

However, the translated images still have some blurry parts, which may be caused by the blur and grayscale data augmentations.


Part 3: Bells & Whistles

a. Differentiable data augmentation

As mentioned above, I also implemented a Gaussian blur and a grayscale-based augmentation method.
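To make "differentiable" concrete: these augmentations are written as plain tensor operations applied inside the training step, so gradients flow through them back to the generator. A hedged sketch; the 0.2 / 0.5 probabilities mirror Part 1, while the luminance weights and the 3x3 box filter (a cheap stand-in for a Gaussian kernel) are illustrative:

```python
import torch
import torch.nn.functional as F

def diff_augment(x):
    """Differentiable grayscale + blur on a batch of (N, 3, H, W) images."""
    # grayscale with probability 0.2: luminance-weighted channel mix
    if torch.rand(1).item() < 0.2:
        w = torch.tensor([0.299, 0.587, 0.114], device=x.device).view(1, 3, 1, 1)
        x = (x * w).sum(dim=1, keepdim=True).expand(-1, 3, -1, -1)
    # blur with probability 0.5: depthwise 3x3 box filter
    if torch.rand(1).item() < 0.5:
        kernel = torch.full((3, 1, 3, 3), 1.0 / 9.0, device=x.device)
        x = F.conv2d(x, kernel, padding=1, groups=3)
    return x

# usage: score augmented images on both the real and the fake side, e.g.
#   d_loss = 0.5 * ((D(diff_augment(real)) - 1) ** 2).mean() \
#          + 0.5 * (D(diff_augment(fake)) ** 2).mean()
```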

b. Experiments on the Pokemon Dataset

DCGAN
Fire Pokemon
Water Pokemon

DCGAN did not work well on this dataset. The reasons may be that (1) the dataset is not as large as the cat dataset, and (2) fire Pokemon and water Pokemon do not share common features, i.e., the shapes or attributes can be totally different even when two Pokemon are from the same category.

CycleGAN
Fire -> Water.
Water -> Fire.

CycleGAN works well for the fire-water translation task.

c. PatchGAN

I implemented the patch discriminator following Jun-Yan's paper and the discussion on Piazza. See model.py and patch_gan.py for details; a structural sketch also appears after the discussion below. Here are some results.

Sampled cat images with PatchGAN.

From my results, although I could not find a clear difference in visual quality compared with the normal DCGAN, I found that the patch discriminator indeed helps training. When I introduced the patch discriminator, the model started to converge after around 200 iterations; in comparison, the normal DCGAN started to converge after around 2000 iterations. The patch discriminator also avoided the collapse problem. As for the visual quality of the PatchGAN results, I think the limited gain may be related to the dataset size and the resolution: if we had a larger dataset and used a higher resolution, the advantage of PatchGAN would be more obvious.
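A structural sketch of the patch discriminator under these assumptions (reusing the conv_bn_relu helper from the Part 1 sketch; the widths and the 64x64 input size are illustrative): the only change from the DCGAN discriminator is that the network stops downsampling early and keeps a spatial grid of scores, one per receptive-field patch, so the loss compares the score map against an all-ones (real) or all-zeros (fake) map of the same shape.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a grid of real/fake scores, one per image patch."""
    def __init__(self, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            conv_bn_relu(3, conv_dim),                 # 64 -> 32
            conv_bn_relu(conv_dim, conv_dim * 2),      # 32 -> 16
            conv_bn_relu(conv_dim * 2, conv_dim * 4),  # 16 -> 8
            # no final collapse to 1x1: keep a score map instead
            nn.Conv2d(conv_dim * 4, 1, kernel_size=4, stride=1, padding=1),  # 8 -> 7
        )

    def forward(self, x):
        return self.net(x)  # shape (N, 1, 7, 7): one score per patch
```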