The goal of the assignment is to implement some basic photo editing operations with the help of GANs. We use StyleGAN and its style vector representation to control the style of an image.
The goal is to find the latent vector from which the original image can be reconstructed. In effect, we look for the projection of our image into the latent space by solving the following optimization:
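
To make the idea of projecting an image into a latent space concrete, here is a minimal sketch under strong simplifying assumptions. The generator is a toy linear map (not StyleGAN), the loss is a plain squared L2 distance, and the optimizer is hand-written gradient descent; the names `toy_generator` and `project_to_latent` are hypothetical and are not taken from the assignment code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a toy linear "generator" mapping an 8-d latent to a 32-d "image".
# QR gives a matrix with orthonormal columns, which keeps the problem well conditioned.
A, _ = np.linalg.qr(rng.standard_normal((32, 8)))


def toy_generator(z):
    """Stand-in for G(z); a real setup would call a pretrained GAN here."""
    return A @ z


def project_to_latent(target, steps=100, lr=0.25):
    """Search for a latent z whose generated output matches `target`."""
    z = np.zeros(A.shape[1])  # initial guess
    for _ in range(steps):
        residual = toy_generator(z) - target
        grad = 2.0 * A.T @ residual  # gradient of ||G(z) - target||^2 w.r.t. z
        z = z - lr * grad
    return z


z_true = rng.standard_normal(8)
target = toy_generator(z_true)
z_hat = project_to_latent(target)
```

With a real generator the gradient would come from automatic differentiation rather than a closed form, and the result would generally depend on the initial guess.
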
We used a content (perceptual) loss computed between two images at a single layer. The overall loss combined the L2 distance between the L-th layer features of the input image and those of the target image with the L2 distance between the images themselves. The chosen layer was conv4 of a VGG-19 backbone.
The two losses were added with weights found from an ablation study. The weight for the perceptual loss was found to be 0.002 and the weight for the L2 loss 0.998, which places almost all of the importance on the pixel-space distance between the images.
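
The weighted combination described above can be sketched as follows. This is only an illustration: the function name `combined_loss` is hypothetical, the feature arrays are assumed to have been extracted beforehand (for example from the chosen VGG-19 layer), and a mean squared error is assumed for both terms, which may differ from the exact reduction used in the experiments.

```python
import numpy as np


def combined_loss(generated, target, feat_generated, feat_target,
                  perc_weight=0.002, l2_weight=0.998):
    """Weighted sum of a feature-space term and a pixel-space term.

    `feat_generated` and `feat_target` are assumed to be precomputed
    feature maps of the two images at the same network layer.
    """
    perceptual = np.mean((feat_generated - feat_target) ** 2)  # feature-space distance
    pixel = np.mean((generated - target) ** 2)                 # pixel-space distance
    return perc_weight * perceptual + l2_weight * pixel
```

With the default weights, a change in the pixel term moves the total loss roughly 500 times more than an equal change in the perceptual term.
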
The results below show the input data projected into the different latent spaces (z, w, and w+) and then regenerated:
Data | z space | w space | w+ space |
---|---|---|---|
The w+ space captures the image most effectively, and with the mean variant, i.e. averaging the vectors fed to the subsequent AdaIN layers, the results are more impressive still.
A linear interpolation was performed in the latent space (w+), and the resulting latent vector was used to generate the image. The results depended heavily on the initial guess.
The interpolation results in the latent space are shown below.
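
Linear interpolation between two latent codes can be sketched as below. The function name `interpolate_latents` and the (18, 512) shape in the usage note are illustrative assumptions, not details taken from the experiments; each returned code would then be passed to the generator to produce one frame.

```python
import numpy as np


def interpolate_latents(w_source, w_target, num_steps=5):
    """Return `num_steps` latent codes evenly spaced from source to target."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * w_source + t * w_target for t in ts]
```

For example, calling `interpolate_latents(w_a, w_b, num_steps=5)` with two w+ codes of shape (18, 512) yields five codes of the same shape, the first equal to `w_a` and the last equal to `w_b`.
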
Source | Interpolation | Target |
---|---|---|
The scribble problem is treated like reconstruction. Given a scribble and a mask, we search for a latent vector that yields an image resembling the scribble. The constraints for this problem are similar to those for reconstruction, and this soft-constrained optimization problem can be written as:
Here ∗ denotes the Hadamard (element-wise) product, M is the mask, and S is the sketch.
Some example results are shown below:
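
A masked loss of this kind can be sketched as follows, assuming the objective compares the generated image and the sketch only where the mask is non-zero and uses a squared L2 penalty. The function name `scribble_loss` and the exact form of the penalty are assumptions made for illustration.

```python
import numpy as np


def scribble_loss(generated, sketch, mask):
    """Squared L2 distance between generated image and sketch inside the mask.

    All three arrays are assumed to share the same shape; `mask` is 1 where
    the scribble constrains the image and 0 elsewhere.
    """
    diff = mask * generated - mask * sketch  # element-wise (Hadamard) products
    return float(np.sum(diff ** 2))
```

Pixels outside the mask contribute nothing to the loss, so the optimization is free to fill them in with whatever the generator produces.
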
Input Sketch | Results |
---|---|