**CMU 16-726 Learning-based Image Synthesis**
**Assignment #3**
*Title: "When Cats meet GANs"*
*Name: Soyong Shin (soyongs@andrew.cmu.edu)*
(##) Contents
* Part 1 DC-GAN
* Part 2 Cycle-GAN
* Part 3 Bells & Whistles
(##) Part 1 DC-GAN
For the first part, I implemented a Deep Convolutional Generative Adversarial Network (DC-GAN). The model architecture follows the figures given in the assignment instructions, shown below:
![figure [discriminator_architecture]: Discriminator Architecture](report/Figure1.png)
![figure [generator_architecture]: Generator Architecture](report/Figure2.png)
In this section, I will describe each part of the DC-GAN and its training algorithm, and discuss the results.
------------------------------------------------------------------------------------------------------------------------------------------------------------
**1.1 Discriminator**
The DC-GAN discriminator, $\mathcal{D}$, takes a batch of images as input and outputs, for each image, the probability (0 to 1) that it is real.
In order to obtain the intermediate feature-map sizes shown in Figure 1, the kernel size, stride, and padding must be chosen appropriately.
Since the kernel size $K=4$ and the stride $S=2$ are given in the problem, I obtained the padding $P$ from the equation below.
$$
W_{out} = \frac{W_{in} + 2 \cdot P - (K - 1) - 1}{S} + 1
$$
Note that $W_{in}$ and $W_{out}$ are the input and output widths of the feature map, and the dilation is assumed to be 1.
Only the width is considered here, since the feature maps are assumed to be square (i.e., width = height).
Since each convolutional layer shrinks the feature map by a fixed ratio, $W_{in}$ can be substituted as:
$$
W_{in} = r \cdot W_{out}
$$
where $r$ is the ratio between $W_{in}$ and $W_{out}$. Substituting this into the first equation and solving for $P$ gives the padding size:
$$
P = \frac{S \cdot W_{out} - r \cdot W_{out} - S + K}{2}
$$
For example, the first discriminator layer halves a 64-wide input to 32 ($r=2$), giving $P = (2 \cdot 32 - 2 \cdot 32 - 2 + 4)/2 = 1$.
This part is implemented as the function ***get_padding*** in ***models.py***.
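For reference, here is a minimal sketch of what ***get_padding*** could look like, directly following the formula above. Only the call signature appears in the code below; the body and the default arguments are my assumption, with `factor` playing the role of the ratio $r$:

```python
def get_padding(w_out, kernel_size=4, stride=2, factor=2):
    """Padding P so that a conv layer maps a (factor * w_out)-wide input to w_out.

    Follows P = (S * W_out - r * W_out - S + K) / 2 with r = factor.
    """
    padding = (stride * w_out - factor * w_out - stride + kernel_size) / 2
    assert padding == int(padding) and padding >= 0, "invalid layer configuration"
    return int(padding)
```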
The source code of the discriminator class is shown below:
```python
class DCDiscriminator(nn.Module):
    def __init__(self, norm='batch'):
        super(DCDiscriminator, self).__init__()

        K = 4   # kernel_size
        S = 2   # stride

        self.conv1 = conv(3, 32, K, S, get_padding(32, K, S), norm=norm)
        self.conv2 = conv(32, 64, K, S, get_padding(16, K, S), norm=norm)
        self.conv3 = conv(64, 128, K, S, get_padding(8, K, S), norm=norm)
        self.conv4 = conv(128, 256, K, S, get_padding(4, K, S), norm=norm)
        self.conv5 = conv(256, 1, K, S, get_padding(1, K, S, factor=4), norm='none')

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.relu(self.conv2(out))
        out = F.relu(self.conv3(out))
        out = F.relu(self.conv4(out))
        out = self.conv5(out).squeeze()
        return out
```
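As a quick sanity check (a hypothetical snippet assuming the 64×64 RGB inputs used in the assignment), the discriminator maps a batch of images to one score per image:

```python
import torch

D = DCDiscriminator(norm='batch')
x = torch.randn(16, 3, 64, 64)   # a batch of 16 random 64x64 RGB images
scores = D(x)                    # shape (16,) after the final squeeze, one score per image
```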
The generator mirrors the discriminator, using transposed convolutions to upsample the noise input:

```python
class DCGenerator(nn.Module):
    def __init__(self, noise_size, norm):
        super(DCGenerator, self).__init__()

        K = 4   # kernel_size
        S = 2   # stride

        self.deconv1 = deconv(100, 256, K, 1, padding=0, norm=norm)
        self.deconv2 = deconv(256, 128, K, S, padding=1, norm=norm)
        self.deconv3 = deconv(128, 64, K, S, padding=1, norm=norm)
        self.deconv4 = deconv(64, 32, K, S, padding=1, norm=norm)
        self.deconv5 = deconv(32, 3, K, S, padding=1, norm='none')

    def forward(self, z):
        out = F.relu(self.deconv1(z))
        out = F.relu(self.deconv2(out))
        out = F.relu(self.deconv3(out))
        out = F.relu(self.deconv4(out))
        out = F.tanh(self.deconv5(out))
        return out
```
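The generator takes a 100-dimensional noise vector reshaped to a 1×1 spatial map and upsamples it in five stages (1→4→8→16→32→64). A hypothetical sanity check:

```python
import torch

G = DCGenerator(noise_size=100, norm='batch')
z = torch.randn(16, 100, 1, 1)   # noise vectors as 1x1 spatial maps
fake = G(z)                      # shape (16, 3, 64, 64), values in [-1, 1] from tanh
```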
For data augmentation, each training image is resized to 1.1× the target size, randomly cropped back to the target size, and randomly flipped horizontally:

```python
load_size = int(1.1 * opts.image_size)
osize = [load_size, load_size]
transform_layers = [
    transforms.Resize(osize, Image.BICUBIC),
    transforms.RandomCrop(opts.image_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
]
transform = transforms.Compose(transform_layers)
```
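The final `Normalize` step maps pixel values from [0, 1] to [-1, 1], matching the tanh output range of the generator. A hypothetical usage example (a random array stands in for a dataset image):

```python
import numpy as np
from PIL import Image

# Stand-in for an image loaded from the cat dataset.
img = Image.fromarray(np.uint8(np.random.rand(80, 80, 3) * 255))
out = transform(img)   # tensor of shape (3, opts.image_size, opts.image_size) in [-1, 1]
```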
For Cycle-GAN, the generator consists of an encoder, a transformation stage built from ResNet blocks, and a decoder:

```python
class CycleGenerator(nn.Module):
    def __init__(self, norm='batch'):
        super(CycleGenerator, self).__init__()

        K = 4   # kernel_size
        S = 2   # stride

        # 1. Define the encoder part of the generator
        #    (that extracts features from the input image)
        self.conv1 = conv(3, 32, K, S, get_padding(32, K, S), norm=norm)
        self.conv2 = conv(32, 64, K, S, get_padding(16, K, S), norm=norm)

        # 2. Define the transformation part of the generator
        self.resnet_block1 = ResnetBlock(64, norm=norm)
        self.resnet_block2 = ResnetBlock(64, norm=norm)
        self.resnet_block3 = ResnetBlock(64, norm=norm)

        # 3. Define the decoder part of the generator
        #    (that builds up the output image from features)
        self.deconv1 = deconv(64, 32, K, S, padding=1, norm=norm)
        self.deconv2 = deconv(32, 3, K, S, padding=1, norm='none')

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.relu(self.conv2(out))

        out = F.relu(self.resnet_block1(out))
        out = F.relu(self.resnet_block2(out))
        out = F.relu(self.resnet_block3(out))

        out = F.relu(self.deconv1(out))
        out = F.tanh(self.deconv2(out))
        return out
```
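The ***ResnetBlock*** used in the transformation stage is not shown above. Below is a minimal sketch, assuming a single same-resolution convolution with a residual skip connection; the actual block in ***models.py*** may differ:

```python
class ResnetBlock(nn.Module):
    def __init__(self, conv_dim, norm='batch'):
        super(ResnetBlock, self).__init__()
        # Kernel 3, stride 1, padding 1 keeps the spatial size unchanged.
        self.conv_layer = conv(conv_dim, conv_dim, 3, 1, 1, norm=norm)

    def forward(self, x):
        # Residual connection: the block learns a correction on top of the identity.
        return x + self.conv_layer(x)
```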
The patch discriminator implemented for the Bells & Whistles part is as follows:

```python
class PatchDiscriminator(nn.Module):
    """Bells & Whistle 1"""
    def __init__(self, norm='instance'):
        super(PatchDiscriminator, self).__init__()

        K = 4   # kernel_size
        S = 2   # stride

        self.conv1 = conv(3, 32, K, S, get_padding(32, K, S), norm=norm)
        self.conv2 = conv(32, 64, K, S, get_padding(16, K, S), norm=norm)
        self.conv3 = conv(64, 128, K, S, get_padding(8, K, S), norm=norm)
        self.conv4 = conv(128, 1, K, S, get_padding(4, K, S), norm='none')

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.relu(self.conv2(out))
        out = F.relu(self.conv3(out))
        out = self.conv4(out)
        return out
```
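Unlike ***DCDiscriminator***, this network stops after four downsampling layers, so each output element scores a local patch of the input rather than the whole image. A hypothetical shape check for 64×64 inputs:

```python
import torch

D_patch = PatchDiscriminator(norm='instance')
x = torch.randn(16, 3, 64, 64)
out = D_patch(x)   # shape (16, 1, 4, 4): a 4x4 grid of per-patch scores
```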