InterClothesGAN
Final Project
CMU 16-726 Image Synthesis S22
Tomas Cabezon Pedroso
In this project, we will explore the possibilities of interpreting the latent semantics learned by GANs for garment editing. This work is inspired by the paper 'Interpreting the Latent Space of GANs for Semantic Face Editing', and its final objective is to find other domains in which to apply this study. The project is divided into two parts: the first part consists of training a StyleGAN on the VITON dataset for later analysis. In the second part, we conduct a study of the different features encoded in the learned latent space to control garment attributes such as sleeve length or texture.
InterFaceGAN
Interpreting latent space
In the InterFaceGAN paper, the authors show how they found that:
"According to Property 1, the linear interpolation between z1 and z2 forms a direction in Z, which further defines a hyperplane. We therefore make an assumption that for any binary semantic, there exists a hyperplane in the latent space serving as the separation boundary. Semantic remains the same when the latent code walks within the same side of the hyperplane yet turns into the opposite when across the boundary"
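The assumption above can be illustrated with a small sketch. Here `n` is the unit normal of a hypothetical boundary and the data are synthetic stand-ins for real latent codes; the sign of the dot product n · z tells us which side of the boundary a code z lies on, and it only flips when an interpolation path crosses the hyperplane.

```python
# Sketch of the hyperplane assumption with synthetic data (not real
# StyleGAN latents). A unit normal n defines a boundary n . z = 0;
# the sign of n . z indicates which side a latent code z is on.
import numpy as np

rng = np.random.default_rng(0)
n = rng.normal(size=512)
n /= np.linalg.norm(n)          # unit normal of a hypothetical boundary

z1 = rng.normal(size=512)
z2 = rng.normal(size=512)

# Walk along the interpolation between z1 and z2; the binary semantic
# only changes when the path crosses the hyperplane.
for t in np.linspace(0.0, 1.0, 5):
    z = (1 - t) * z1 + t * z2
    side = np.sign(n @ z)
```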
StyleGAN
In this first part, a StyleGAN is trained to later perform the latent space study. Nvidia's StyleGAN implementation is used to train this GAN on the VITON dataset, composed of 14,221 images of different top garments.
In the following images the training losses of our model can be seen. We tracked the FID50k score during training, obtaining a final score of 26.11. The discriminator's scores for real images, as well as its scores for fake images, approach an absolute value of 0.5, showing that the model improves during training.
Loss/scores of real images.
Loss/scores of fake images.
Boundaries
Once we have a well-trained GAN, we proceed to explore the disentanglement of the different features. We use the previously explained SVM to find the hyperplanes that serve as separation boundaries, and take the normal vectors of these hyperplanes as the feature-editing directions.
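A minimal sketch of this step, assuming latent codes have already been labeled with a binary attribute (e.g. long vs. short sleeve) by some classifier. The data here are synthetic stand-ins, and `LinearSVC` from scikit-learn is used as one possible linear SVM; the editing direction is the unit normal of the fitted separating hyperplane.

```python
# Hedged sketch: recovering a semantic boundary with a linear SVM.
# `latents` would be sampled z codes and `labels` binary attribute
# labels from an attribute classifier; here both are synthetic.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
true_n = rng.normal(size=512)
true_n /= np.linalg.norm(true_n)        # hidden "ground-truth" normal

latents = rng.normal(size=(2000, 512))
labels = (latents @ true_n > 0).astype(int)   # synthetic binary attribute

svm = LinearSVC(C=1.0, max_iter=10000)
svm.fit(latents, labels)

# The editing direction is the unit normal of the separating hyperplane.
direction = svm.coef_.ravel()
direction /= np.linalg.norm(direction)
```

With real data the labels come from an attribute predictor rather than a known normal, but the extraction of the direction from `svm.coef_` is the same.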
To manipulate/tweak any attribute, we edit the original latent code z as:

z_edit = z + αn

where α is the parameter that shifts z along the editing direction n, which is the unit vector normal to the boundary hyperplane. The following image shows a diagram of how the sleeve-length editing direction was found. Apart from this feature, we also found directions to edit the texture of the garment as well as its redness.
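The editing rule above is a one-liner in practice. This sketch uses random vectors in place of a real StyleGAN latent code and a real SVM normal; sweeping α moves the code across the boundary, e.g. gradually lengthening the sleeves.

```python
# Minimal sketch of the editing rule z_edit = z + alpha * n,
# with random stand-ins for the latent code and the boundary normal.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=512)        # latent code of a generated garment
n = rng.normal(size=512)
n /= np.linalg.norm(n)          # unit normal found by the SVM

def edit(z, n, alpha):
    """Shift z along the editing direction n by alpha."""
    return z + alpha * n

# A sweep of alpha values produces a row of progressively edited codes.
edited = [edit(z, n, a) for a in (-3, -1, 0, 1, 3)]
```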
Examples used to find the sleeve length boundary in the latent space.
Results
A garment generated by the StyleGAN generator can be tweaked using the found editing directions, allowing the user to explore the design space.
On the left, the original image without editing; the columns to the right show the corresponding results after applying the edits.
Demo
Below, a demo of the possible interaction between the user and the proposed InterClothesGAN is shown.
Demo of the InterClothesGAN.