Overview
This project implements generative adversarial networks (GANs). In this assignment, we applied DCGAN and CycleGAN to two datasets, and we also implemented PatchGAN.
How to run the code
1. bash part1.sh to reproduce the results on part 1;
2. bash part2.sh to reproduce the results on part 2;
3. bash part3.sh to reproduce the results on part 3.
Part 1: DCGAN
1.1 Data Augmentation
Following SimCLR (Chen et al., ICML 2020), I implemented four kinds of data augmentation in total: (1) random resized crop with aspect ratio in 0.75~1.33; (2) random horizontal flip; (3) random grayscale transformation with p=0.2; (4) random Gaussian blur with p=0.5. That paper describes other augmentation methods as well, but I think they are not appropriate for training a generation-oriented network, so I implemented only these four.
See data_loader.py for details.
1.2 Discriminator
a. Padding = 1. Let the input size be 2x and the output size be x. With kernel size 4 and stride 2, we need (2x - 4 + 2p)/2 + 1 = x, which solves to p = 1.
b. See model.py for the implementation of DCGAN discriminator.
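The padding derivation above can be checked numerically; a 4x4 kernel with stride 2 and padding 1 exactly halves the spatial size (the channel counts here are arbitrary, for illustration only).

```python
import torch
import torch.nn as nn

# Verify the derivation: (2x - 4 + 2*1)/2 + 1 = x, so a 64x64 input
# should produce a 32x32 output.
conv = nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)  # torch.Size([1, 32, 32, 32])
```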
1.3 Generator
See model.py for the implementation of DCGAN generator.
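For reference, a DCGAN-style generator typically mirrors the discriminator with transposed convolutions. The sketch below uses assumed layer widths and a 32x32 output and is not the exact architecture in model.py; each transposed conv with kernel 4, stride 2, padding 1 doubles the spatial resolution.

```python
import torch
import torch.nn as nn

class DCGenerator(nn.Module):
    """Minimal DCGAN-style generator sketch (layer sizes are assumptions)."""

    def __init__(self, noise_dim=100, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            # project the noise vector to a 4x4 feature map
            nn.ConvTranspose2d(noise_dim, conv_dim * 4, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(conv_dim * 4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(conv_dim * 4, conv_dim * 2, 4, 2, 1),  # 4 -> 8
            nn.BatchNorm2d(conv_dim * 2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1),      # 8 -> 16
            nn.BatchNorm2d(conv_dim),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1),                 # 16 -> 32
            nn.Tanh(),  # outputs in [-1, 1], matching the normalized inputs
        )

    def forward(self, z):
        # reshape the noise vector (N, noise_dim) into (N, noise_dim, 1, 1)
        return self.net(z.view(z.size(0), -1, 1, 1))
```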
1.4 Training Loop
See vanilla_gan.py for the implementation of the training loop.
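One iteration of the training loop can be sketched as below. This is an illustrative version using least-squares GAN losses; the function name `train_step` and the exact loss form are assumptions, and the real loop lives in vanilla_gan.py.

```python
import torch

def train_step(D, G, real_images, d_optimizer, g_optimizer, noise_dim=100):
    """One GAN iteration (sketch): update D on real/fake, then update G."""
    batch_size = real_images.size(0)

    # --- train D: push real images toward 1 and fakes toward 0 ---
    d_optimizer.zero_grad()
    noise = torch.randn(batch_size, noise_dim)
    fake_images = G(noise)
    d_loss = (torch.mean((D(real_images) - 1) ** 2)
              + torch.mean(D(fake_images.detach()) ** 2)) / 2
    d_loss.backward()
    d_optimizer.step()

    # --- train G: make D classify the fakes as real ---
    g_optimizer.zero_grad()
    g_loss = torch.mean((D(fake_images) - 1) ** 2)
    g_loss.backward()
    g_optimizer.step()

    return d_loss.item(), g_loss.item()
```

Note that the fake batch is detached for the D update so that the generator only receives gradients from its own loss term.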
1.5 Experiments
Results
We can see that, in the early stage of training, we have a powerful discriminator and a weak generator, so the loss for D is near 0 and the loss for G is close to 1 for both settings. Then, after several epochs of training, the generator starts to learn to fool the discriminator, so the loss of D increases and the loss of G drops.
Next, we can see the improvement from introducing the deluxe data augmentation. In the basic setting, the model collapses at the end of training: it is easy to train a good D, and the generator regresses to producing random noise. With the deluxe augmentation, training remains stable and the quality of the images gradually improves.
If the GAN trains successfully, the D loss should be close to 0 at the start, then increase to around 0.5 and fall slightly; for a perfect GAN, the loss of D should settle around 0.5 at the end. The G loss should be around 0.5 at the start, then increase to 1 (since we have a good D), then drop gradually; for a perfect GAN, it would gradually drop back to 0.5.
In the early stage of training, G can only generate random noise. Then, with adversarial training, it learns to generate visually realistic images. Compared with the basic setting, introducing the deluxe data augmentation reduces the checkerboard artifacts and also avoids the model-collapse problem over long training runs. The GAN with the basic setting collapses after 6000 iterations, but the GAN with the deluxe setting keeps working until the end.
Part 2: CycleGAN
2.1 Generator
See model.py for the implementation of CycleGAN generator.
2.2 Training Loop
See cycle_gan.py for the implementation of training loop.
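The key addition over the vanilla GAN loop is the cycle-consistency term, which can be sketched as below. The generator names `G_XtoY` / `G_YtoX`, the L1 reconstruction, and the weight `lambda_cycle` are assumptions; see cycle_gan.py for the actual implementation.

```python
import torch

def cycle_consistency_loss(G_XtoY, G_YtoX, images_X, images_Y, lambda_cycle=10.0):
    """Cycle-consistency sketch: X -> Y -> X should reconstruct the original X,
    and Y -> X -> Y should reconstruct the original Y."""
    reconstructed_X = G_YtoX(G_XtoY(images_X))
    reconstructed_Y = G_XtoY(G_YtoX(images_Y))
    loss_X = torch.mean(torch.abs(reconstructed_X - images_X))  # L1 reconstruction
    loss_Y = torch.mean(torch.abs(reconstructed_Y - images_Y))
    return lambda_cycle * (loss_X + loss_Y)
```

This term is added to each generator's adversarial loss, which is what encourages the translated images to keep the identity of the source images.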
2.3 Experiments
Results
a. Results after 600 iterations
b. Results after 10000 iterations
c. Observations
We can see that, when adding the cycle-consistency loss, the identity of the object is preserved better. That is, without the cycle-consistency loss, although the model can still work, the translated images look different from the source images. With the cycle-consistency loss, the translated image maintains the same identity, which is a clear benefit of this loss.
However, the transferred images still have some unclear parts, which may be due to the blur and grayscale data augmentation.
Part 3: Bells & Whistles
a. Differentiable data augmentation
As mentioned above, I also implemented a Gaussian blur and a grayscale-based augmentation method.
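For augmentation to be differentiable, it must be built from plain tensor operations so gradients can flow back to the generator when the same augmentation is applied to both real and fake batches. The sketch below illustrates this idea with a brightness shift and a grayscale blend; the exact ops and parameters are illustrative assumptions, not the ones in the code.

```python
import torch

def diff_augment(x):
    """Differentiable augmentation sketch: every op is tensor arithmetic,
    so gradients w.r.t. x (and hence the generator) are preserved."""
    # random per-image brightness shift
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # random grayscale blend: mix each image with its channel-wise mean
    mix = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    gray = x.mean(dim=1, keepdim=True)
    return x * (1 - mix) + gray * mix
```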
b. Experiments on the Pokemon Dataset
DCGAN
DCGAN did not work well for this dataset. The reasons may be that (1) the dataset is not as large as the cat dataset, and (2) fire and water Pokemon do not share common features, i.e., the shapes and attributes are totally different even for two Pokemon from the same category.
CycleGAN
CycleGAN works well for the fire-water translation task.
c. PatchGAN
I implemented the patch discriminator following Jun-Yan's paper and the discussion on Piazza. See model.py and patch_gan.py for details. Here are some results.
From my results, although I could not see a clear visual difference from the normal DCGAN, I found that the patch discriminator indeed helps training. With the patch discriminator, the model starts to converge after around 200 iterations, whereas the normal DCGAN starts to converge after around 2000 iterations. The patch discriminator also avoids the model-collapse problem. Regarding the visual quality of the PatchGAN results, I think the limitation may be related to the dataset size and the resolution: with a larger dataset and a higher resolution, the advantage of PatchGAN should be more obvious.
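The patch discriminator can be sketched as below, in the spirit of the pix2pix/CycleGAN papers: instead of one scalar score per image, it outputs a grid of real/fake scores, one per receptive-field patch. Layer widths and the 32x32 input size are assumptions; see model.py and patch_gan.py for the actual architecture.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator sketch: outputs a score map, not a scalar."""

    def __init__(self, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, stride=2, padding=1),             # 32 -> 16
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(conv_dim, conv_dim * 2, 4, stride=2, padding=1),  # 16 -> 8
            nn.BatchNorm2d(conv_dim * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(conv_dim * 2, 1, 4, stride=1, padding=1),         # 8 -> 7x7 score map
        )

    def forward(self, x):
        # each output element scores one overlapping patch of the input
        return self.net(x)
```

The GAN loss is then averaged over all patch scores, which gives the generator denser feedback than a single image-level score.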