Padding

With input size n, stride 2, and kernel size 4, an unpadded convolution has ceil((n - 1) / 2) - 1 valid kernel positions. Stepping by 2 gives ceil(n/2) candidate start positions, but the last one is always invalid because the kernel would run past the end of the input. If n is odd, the second-to-last position is invalid as well, which is why the ceil is taken over n - 1 rather than n. To get an output size of n/2 for even n, we therefore need an effective input size of n + 2, since ceil((n + 1) / 2) - 1 = n/2. This is the same as padding by 1 on each side.
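The counting argument above can be checked by brute force. The following is a minimal sketch, not part of the original implementation; the helper names `count_positions` and `formula` are made up for illustration, and padding is assumed to be symmetric (the same amount on both sides).

```python
import math


def count_positions(n, kernel=4, stride=2, padding=0):
    # Brute force: count the start indices at which the kernel fits
    # entirely inside the (symmetrically padded) input.
    padded = n + 2 * padding
    return sum(1 for start in range(0, padded, stride) if start + kernel <= padded)


def formula(n):
    # Closed form from the text for kernel size 4, stride 2, no padding.
    return math.ceil((n - 1) / 2) - 1


for n in range(4, 65):
    # The closed form matches the brute-force count for odd and even n.
    assert count_positions(n) == formula(n)
    if n % 2 == 0:
        # With padding 1, an even input of size n is halved exactly.
        assert count_positions(n, padding=1) == n // 2

print(count_positions(8), count_positions(8, padding=1))  # prints: 3 4
```

For n = 8 the unpadded kernel fits at starts 0, 2, and 4, giving 3 outputs; with padding 1 the padded length is 10 and the kernel fits at 0, 2, 4, and 6, giving 4 = n/2.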

Differentiable augmentation

With differentiable augmentation:

Without differentiable augmentation:

With differentiable augmentation, the same augmentations are applied to both real and fake images before they are passed to the discriminator. This regularizes the discriminator, making it harder for it to memorize the dataset. Without augmentation, memorization is easier, and we see degraded output quality. Implementing it only requires building the augmentations from differentiable operations, so that gradients can still flow back through the augmented fake images to the generator.
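As an illustration of what "differentiable functions" can mean here, below is a minimal sketch assuming PyTorch and image batches of shape (N, C, H, W). The specific augmentations (random brightness and random translation), the 0.125 shift ratio, and the function names are assumptions chosen for the example, not necessarily the ones used to produce the results above.

```python
import torch
import torch.nn.functional as F


def rand_brightness(x):
    # Add one random offset in [-0.5, 0.5) per image.
    # Addition is differentiable with respect to x.
    return x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)


def rand_translation(x, ratio=0.125):
    # Shift the whole batch by one random integer offset, using zero
    # padding followed by a crop. Padding and slicing pass gradients
    # through to every pixel that remains in view.
    _, _, h, w = x.shape
    max_dy, max_dx = int(h * ratio), int(w * ratio)
    dy = int(torch.randint(-max_dy, max_dy + 1, (1,)))
    dx = int(torch.randint(-max_dx, max_dx + 1, (1,)))
    padded = F.pad(x, (max_dx, max_dx, max_dy, max_dy))
    top, left = max_dy + dy, max_dx + dx
    return padded[:, :, top:top + h, left:left + w]


def diff_augment(x):
    # Compose the augmentations; the result has the same shape as x.
    return rand_translation(rand_brightness(x))


# Stand-in for a generator output; in training this would be G(z).
fake = torch.randn(4, 3, 32, 32, requires_grad=True)
augmented = diff_augment(fake)
augmented.sum().backward()
print(augmented.shape, fake.grad is not None)
```

In a training loop, the discriminator would see `diff_augment(real)` and `diff_augment(fake)` instead of the raw images, in both the discriminator and generator updates. Because every operation is differentiable, the generator loss can still backpropagate through the augmentation.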