16-726 HW 3: When Cats Meet GANs

Jason Zhang (jasonyzhang@cmu.edu)

DCGAN

Calculating Filter Size

The output size (V) of a convolutional layer is

     V = \frac{W - K + 2P}{S} + 1

where W is the input size, K is the kernel size, P is the padding, and S is the stride. Padding can be computed using:

     P = \frac{S(V-1) - W + K}{2}

Thus, the padding should be 0 for the last layer and 1 for all other layers.
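
For example, assuming the standard DCGAN setup of 4×4 kernels with stride 2, where each downsampling layer halves the resolution (V = W/2):

     P = \frac{2(W/2 - 1) - W + 4}{2} = \frac{W - 2 - W + 4}{2} = 1

For the last layer, which collapses a 4×4 feature map to 1×1 with a 4×4 kernel, the numerator is S(1 - 1) - 4 + 4 = 0, so P = 0 regardless of the stride.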

Loss Curves

Orange corresponds to basic data augmentation; blue corresponds to deluxe data augmentation.

[Figures: discriminator and generator training loss curves]

When trained properly, the generator and discriminator trade off losses, and both converge to something reasonable. With deluxe data augmentation, the discriminator has to work harder; as a result, the generator is able to converge to a lower loss. With basic data augmentation, the generator cannot fool the discriminator as easily and sees more spikes in its training loss.
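
For reference, a minimal sketch of what the two pipelines might look like in torchvision; the exact resize and crop sizes here are assumptions, not the starter code's values.

```python
import torchvision.transforms as T

# Basic: deterministic resize, then scale pixels to [-1, 1].
basic_transform = T.Compose([
    T.Resize((64, 64)),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Deluxe: random crops and flips, so the discriminator rarely sees
# the exact same real image twice and has to work harder.
deluxe_transform = T.Compose([
    T.Resize((70, 70)),           # assumed oversize before cropping
    T.RandomCrop((64, 64)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```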

Results

Generated samples after 200 iterations.

[Figure: generated samples at 200 iterations]

Generated samples after 6400 iterations.

[Figure: generated samples at 6400 iterations]

Clearly, 200 iterations is not enough to produce anything meaningful from the noise vector. By the end of training (6400 iterations), the generator learns to produce images that do look like Grumpy. However, it produces only one mode (mode collapse) and does not show the same diversity as the training set, shown below.

Real images

[Figure: real training images]

CycleGAN

600 iterations, no cycle consistency loss:

[Figures: generated samples]

600 iterations, with cycle consistency loss:

[Figures: generated samples]

10k iterations, no cycle consistency loss:

[Figures: generated samples]

10k iterations, with cycle consistency loss:

[Figures: generated samples]

Overall, the generated results without the cycle consistency loss look qualitatively better. However, there is no correspondence between the generated images and the real images. On the other hand, while adding the cycle consistency loss produces blurrier results, the face shapes show more correspondence. This is likely because the cycle consistency loss forces the generator to trade off fooling the discriminator (realism) against reconstructing the input (correspondence).
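
As a sketch of that trade-off, the generator objective with cycle consistency might look like the following, assuming a least-squares GAN term and an L1 cycle term as in the original CycleGAN paper; the names `G_XtoY`, `G_YtoX`, `D_Y`, and the weight `lambda_cycle` are illustrative, not the starter code's.

```python
import torch
import torch.nn.functional as F

def generator_loss(G_XtoY, G_YtoX, D_Y, real_X, lambda_cycle=10.0):
    """X -> Y generator loss: GAN term plus cycle reconstruction (sketch)."""
    fake_Y = G_XtoY(real_X)

    # Least-squares GAN term: push D_Y's output on fakes toward "real" (1).
    gan_loss = torch.mean((D_Y(fake_Y) - 1) ** 2)

    # Cycle consistency term: X -> Y -> X should reconstruct the input,
    # which is what enforces correspondence between input and output.
    cycle_loss = F.l1_loss(G_YtoX(fake_Y), real_X)

    # lambda_cycle controls the realism/correspondence trade-off.
    return gan_loss + lambda_cycle * cycle_loss
```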

Bells and Whistles

I modified the discriminator into a PatchGAN by simply dropping its last two layers. I was shocked at how much this improved the results. I believe this is because the discriminator now focuses on local patches of the image and has reduced capacity, making the generator's job easier.
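
Concretely, the change amounts to something like the sketch below, assuming a DCGAN-style discriminator built from stride-2 conv blocks; the channel widths here are illustrative.

```python
import torch.nn as nn

# DCGAN-style discriminator with the last two conv layers dropped.
# Instead of downsampling 64 -> 32 -> 16 -> 8 -> 4 -> 1 to a single logit,
# it stops at an 8x8 grid of logits, each scoring one local patch (PatchGAN).
patch_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64 -> 32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32 -> 16
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, stride=2, padding=1),   # 16 -> 8 patch logits
)
```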

[Figures: PatchGAN results]