Padding

If the input size is n, the stride is 2, and the kernel size is 4, there are ceil((n - 1) / 2) - 1 valid positions for the kernel. There are ceil(n / 2) candidate start positions (0, 2, 4, ...), but the last one is always invalid because the kernel would run past the end of the input. If n is odd, the second-to-last position is also invalid, so we subtract one inside the ceil to account for this. Thus, to get an output size of n / 2 for even n, we need an effective input size of n + 2, which is the same as padding by 1 on each side.
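
The counting argument above can be checked by brute force. The sketch below is only an illustration: the helper names `count_positions` and `closed_form` are made up for this example, and it works in one dimension with kernel size 4 and stride 2 as in the text.

```python
import math

def count_positions(n, kernel=4, stride=2, padding=0):
    """Count kernel placements that fit entirely inside the padded input."""
    padded = n + 2 * padding
    return sum(1 for start in range(0, padded, stride) if start + kernel <= padded)

def closed_form(n):
    """Closed form from the text for kernel size 4, stride 2, no padding."""
    return math.ceil((n - 1) / 2) - 1

# Without padding, the brute-force count matches the closed form.
unpadded_ok = all(count_positions(n) == closed_form(n) for n in range(4, 65))

# With padding 1, an even input size n gives an output size of n / 2.
padded_ok = all(count_positions(n, padding=1) == n // 2 for n in range(4, 65, 2))
```

Both checks hold for input sizes 4 through 64, which covers the image sizes a typical stride-2 downsampling stack would see.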

Differentiable augmentation

With differentiable augmentation:
Without differentiable augmentation:
With differentiable augmentation, the same kind of augmentation is applied to both real and fake images before they are passed to the discriminator. This regularizes the discriminator and makes it harder for it to memorize the dataset. Without it, memorization is easier, and we see degraded output quality. To implement it, we only need to build the augmentations from differentiable operations, so that gradients can still flow back to the generator through the augmented fake images.
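
As a rough illustration of what "augmentation with differentiable functions" can look like, here is a minimal sketch. It assumes PyTorch, which the text does not name, and the function `diff_augment` with its `brightness` and `max_shift` parameters is hypothetical, not the implementation used for the results here. It applies a random brightness offset and a random translation using only tensor operations that support backpropagation.

```python
import torch
import torch.nn.functional as F

def diff_augment(x, brightness=0.5, max_shift=2):
    """Random brightness offset plus random translation on a batch (N, C, H, W).

    Every step is an ordinary tensor op (add, pad, slice), so gradients
    flow back through the augmentation to the input.
    """
    n, _, h, w = x.shape
    # Per-image brightness offset in [-brightness / 2, brightness / 2].
    offset = (torch.rand(n, 1, 1, 1, device=x.device, dtype=x.dtype) - 0.5) * brightness
    x = x + offset
    # Translation: zero-pad, then crop a window of the original size.
    # For simplicity the same shift is used for the whole batch.
    dx, dy = torch.randint(0, 2 * max_shift + 1, (2,)).tolist()
    x = F.pad(x, (max_shift, max_shift, max_shift, max_shift))
    return x[:, :, dy:dy + h, dx:dx + w]

torch.manual_seed(0)
# Stand-in for generator output; requires_grad mimics a tensor that
# depends on generator parameters.
fake = torch.randn(4, 3, 8, 8, requires_grad=True)
out = diff_augment(fake)
out.sum().backward()  # gradients reach `fake` through the augmentation
```

In a training loop, the same function would be called on both the real batch and the fake batch right before the discriminator, in both the discriminator and the generator update.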

Plots

Differentiable augmentation deluxe plots:
Differentiable augmentation basic plots:
No differentiable augmentation deluxe plots:
No differentiable augmentation basic plots:

Early generator results

Y to X 800 no cycle:
X to Y 800 no cycle:
Y to X 800:
X to Y 800:
In all honesty, both are terrible at this early stage; however, you can see that the cycle-consistency loss does make some difference.
After 10k X to Y:
After 10k Y to X:
DC discriminator X to Y:
The quality looks almost better; however, there is a lot less variance in the output, probably because the DC discriminator is not patch-based.

Orange Comparison

No cycle:
Cycle:
For the most part, the consistency loss improves how well the background is preserved, though this is not always the case.
DC discriminator: We see that the quality with the DC discriminator is actually quite similar.