To compute the necessary padding for the convolutional layers in the DCGAN discriminator:
O = (X - F + 2B) / S + 1
where O is the output size, X is the input size, F is the kernel (filter) size, B is the padding, and S is the stride.
Substituting F = 4 and S = 2, and setting O = X/2, we solve for B:
X/2 = (X - 4 + 2B)/2 + 1, which simplifies to 2B = 2, so B = 1.
Thus, to ensure proper downsampling, each convolutional layer should apply a padding value of 1.
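As a quick sanity check, here is a minimal sketch (a standard PyTorch Conv2d, not necessarily the exact layer definition used in these experiments) confirming that kernel size 4, stride 2, and padding 1 halve the spatial resolution:

```python
import torch
import torch.nn as nn

# With F=4, S=2, B=1, a convolution halves the spatial size, e.g. 64x64 -> 32x32.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)  # torch.Size([1, 64, 32, 32]), since (64 - 4 + 2*1)/2 + 1 = 32
```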
During GAN training, the discriminator's total loss approaches 0, while the generator's total loss approaches 1. A discriminator loss near 0 means the discriminator separates real and generated images almost perfectly, and a generator loss near 1 (its worst value under a least-squares objective, where the generator's target is 1) means the generator is failing to fool it. Ideally, training would instead settle near an equilibrium where the discriminator cannot reliably tell real from generated images and both losses plateau away from these extremes.
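To make this interpretation concrete, here is a minimal sketch of least-squares GAN losses (the objective is an assumption on my part, and the names D, G, real, and z are placeholders, not the code used in these experiments):

```python
import torch

def d_loss_fn(D, G, real, z):
    fake = G(z).detach()  # do not backpropagate into the generator
    # Discriminator targets: 1 for real images, 0 for fakes.
    return 0.5 * ((D(real) - 1) ** 2).mean() + 0.5 * (D(fake) ** 2).mean()

def g_loss_fn(D, G, z):
    # Generator target: make the discriminator output 1 on fakes.
    return ((D(G(z)) - 1) ** 2).mean()

# If the discriminator wins completely (D(real)=1, D(fake)=0), then
# d_loss -> 0 and g_loss -> 1, matching the curves described above.
```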
Figure: Discriminator and generator loss curves for the Vanilla model, with and without diffaug.
Figure: Samples 1-4 from the Vanilla model, with and without diffaug, at 800 and 6400 iterations.
DCGAN is a type of GAN that consists of a generator and a discriminator engaged in a min-max optimization game: the generator learns to produce realistic samples, while the discriminator learns to distinguish real samples from fake ones. DDPM, in contrast, takes a probabilistic approach: during training it gradually adds noise to an image and learns to reverse this process to generate high-quality samples.
From my experiments, I observe that DCGAN suffers from training instability caused by the adversarial nature of GANs. DDPM, on the other hand, is more stable, since it optimizes a well-defined likelihood objective, which yields consistent training and better sample quality and diversity.
In terms of computational requirements, DCGAN is lightweight and can be trained relatively quickly, whereas DDPM is computationally expensive, requiring a large number of denoising steps during both training and sampling.
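As a rough illustration of the process DDPM learns to reverse, here is a minimal sketch of the closed-form forward noising step (the schedule values and function names are illustrative assumptions, not code from these experiments):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (common default)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product \bar{alpha}_t

def noisy_sample(x0, t):
    """q(x_t | x_0): apply t steps of Gaussian noise in a single closed-form step."""
    eps = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps, eps

# Training: the network is optimized to predict eps from x_t,
# i.e. loss = ||eps - eps_theta(x_t, t)||^2; sampling reverses the chain step by step.
x0 = torch.randn(4, 3, 32, 32)
t = torch.randint(0, T, (4,))
xt, eps = noisy_sample(x0, t)
```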
Figure: CycleGAN translation results in both directions (X→Y and Y→X).
When CycleGAN is trained with cycle consistency loss, the generated images maintain structural integrity, meaning that when an image is translated to the target domain and back to the source domain, it closely resembles the original. This results in translations that look realistic, with well-preserved details, textures, and content alignment across domains.
In contrast, training without cycle consistency loss leads to visually inconsistent and unstable translations, including severe distortions, loss of important details, and unnatural artifacts in the generated images.
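For reference, the cycle-consistency term that enforces this round-trip behavior can be sketched as follows (the generator names and the weight are illustrative, not the report's actual code):

```python
import torch

def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cyc=10.0):
    # Translate to the other domain and back; the reconstruction should match
    # the original image. The L1 distance keeps translations structure-preserving.
    rec_X = G_YtoX(G_XtoY(real_X))
    rec_Y = G_XtoY(G_YtoX(real_Y))
    return lambda_cyc * (torch.abs(rec_X - real_X).mean()
                         + torch.abs(rec_Y - real_Y).mean())
```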
Model Version: Stable Diffusion v1-4 from CompVis
Steps: Set to 50. This parameter controls how many denoising iterations the model performs to progressively refine the generated image. Higher values typically improve image quality but increase computation time; around 50 inference steps is the commonly recommended default for Stable Diffusion, providing a good balance between quality and computational efficiency.
Guidance Scale: Set to 7.5. The guidance scale (also known as the CFG scale) determines how strictly the generated images follow the provided text prompt. Values around 7 to 7.5 are commonly used, offering a good balance between prompt adherence and creative flexibility.
Image Resolution: Defaulted to 512x512 pixels, matching the resolution at which the model was fine-tuned.
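For reference, here is a sketch of how these settings map onto the diffusers API (the prompt shown is a hypothetical placeholder, not one of the prompts used for the samples below):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA-capable GPU is available

image = pipe(
    "a photo of a cat wearing a space suit",  # hypothetical example prompt
    num_inference_steps=50,   # denoising steps
    guidance_scale=7.5,       # CFG scale: prompt adherence vs. creative flexibility
    height=512, width=512,    # the model's native resolution
).images[0]
image.save("sample.png")
```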
Figure: Samples 1-4 generated with these settings.
DCDiscriminator evaluates the entire image at once, making it effective at capturing global structure but often weak at enforcing local consistency. As a result, images generated with DCDiscriminator have smoother textures but lack fine detail, leading to blurry, oversimplified outputs.
In contrast, PatchDiscriminator divides an image into smaller patches and evaluates them independently, enforcing high-frequency details more effectively. This localized supervision helps capture textures and sharp edges more precisely, making the generated images more realistic.
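To make the contrast concrete, here is a rough sketch of a PatchGAN-style discriminator (the layer sizes are illustrative, not the exact architecture used here); note that it outputs a grid of per-patch logits rather than a single score:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),  # per-patch logits
        )

    def forward(self, x):
        # Each output location scores one overlapping patch of the input,
        # so the loss supervises local texture rather than a single global score.
        return self.net(x)

x = torch.randn(1, 3, 64, 64)
print(PatchDiscriminator()(x).shape)  # torch.Size([1, 1, 7, 7]): one score per patch
```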
Figure: Translation results in both directions (X→Y and Y→X) for the two discriminators.
Here are CycleGAN results on two additional datasets:
Figure: Monet painting ↔ photo translations (X: Monet painting, Y: photo; results shown in both directions, X→Y and Y→X).
Figure: Summer ↔ winter translations (X: summer, Y: winter; results shown in both directions, X→Y and Y→X).