In my project, I implemented techniques to manipulate images on the manifold of natural images.
First, I inverted a pre-trained generator to find a latent variable that closely reconstructs a given real image.
This involved solving a nonconvex optimization problem over the latent variable using a loss function and a trained generator,
and I tried out different losses and optimization methods to find the best solution. In the second part of the project,
I interpolated between images by combining the latent codes of two images and generating new images from the blended codes.
Finally, I generated an image subject to constraints by solving a penalized nonconvex optimization problem,
where I used color scribble constraints to fill in the details of a hand-drawn sketch. I rewrote the constrained objective
using the Hadamard product of a mask with the generated image and the sketch, and optimized it to find the solution.
For the first part, I inverted a pre-trained generator to find a latent variable that closely reconstructs a given real image.
This was done by solving a nonconvex optimization problem over the latent variable using a loss function and a trained generator,
and I tried out different losses and optimization methods to find the best solution. My results showed that a combination of perceptual
and L1 losses with the LBFGS optimizer gave the most stable and efficient solution,
with a low reconstruction loss for the generated image.
Original Image | Vanilla GAN z | StyleGAN z | StyleGAN w | StyleGAN w+ |
I tried different combinations of VGG-19 layers for the perceptual loss, and the best results were obtained
using the 'conv_1', 'conv_2', 'conv_5', 'conv_9', and 'conv_13' layers. After much experimentation I used
the default weight of 0.01 for the perceptual loss and 10 for the pixel-level L1 loss. I compared the outputs of
the vanilla GAN and StyleGAN with different latent spaces (z, w, and w+). The optimization converged by
the 6000th iteration, and further iterations did not improve image quality.
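To make the setup concrete, here is a minimal PyTorch sketch of this inversion step. It assumes a pre-trained generator G that maps a latent vector of size 512 to an image in [0, 1] (shown for the z space; for StyleGAN's w and w+ spaces the intermediate latent would be optimized instead). The loss weights, VGG-19 layer choices, and iteration budget follow the values above, but the helper names and the mapping from 'conv_k' to torchvision's VGG-19 conv layers are my assumptions rather than the project's exact code.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """Content loss on selected VGG-19 conv layers (conv_1, conv_2, conv_5, conv_9, conv_13)."""
    def __init__(self, conv_indices=(1, 2, 5, 9, 13)):
        super().__init__()
        # torchvision weights API (newer versions); older versions use pretrained=True
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.conv_indices = set(conv_indices)
        # ImageNet statistics; assumes input images are scaled to [0, 1]
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def _features(self, x):
        x = (x - self.mean) / self.std
        feats, conv_count = [], 0
        for layer in self.vgg:
            x = layer(x)
            if isinstance(layer, torch.nn.Conv2d):
                conv_count += 1
                if conv_count in self.conv_indices:
                    feats.append(x)
        return feats

    def forward(self, pred, target):
        return sum(F.mse_loss(a, b) for a, b in
                   zip(self._features(pred), self._features(target)))


def invert(G, target, latent_dim=512, max_iter=6000, perc_weight=0.01, l1_weight=10.0):
    """Project a real image onto the generator's manifold:
    z* = argmin_z  perc_weight * L_perc(G(z), x) + l1_weight * ||G(z) - x||_1 (nonconvex)."""
    perceptual = PerceptualLoss().to(target.device)
    z = torch.randn(1, latent_dim, device=target.device, requires_grad=True)
    optimizer = torch.optim.LBFGS([z], max_iter=max_iter, line_search_fn="strong_wolfe")

    def closure():
        optimizer.zero_grad()
        image = G(z)
        loss = perc_weight * perceptual(image, target) + l1_weight * F.l1_loss(image, target)
        loss.backward()
        return loss

    optimizer.step(closure)  # LBFGS runs its inner iterations inside a single step() call
    return z.detach()
```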
I interpolated between images by combining the latent codes of two images and generating new images from the intermediate codes.
I experimented with different generative models and different latent spaces (latent code z, w space, and w+ space)
to see which worked best. My results showed that using the StyleGAN2 model and the w+ space gave me
the most visually pleasing results, with smooth transitions between images and realistic features.
Image A | Interpolations | Image B |
The GIF above shows the interpolation between two latent codes, obtained by embedding the two images into the StyleGAN w+ space.
The interpolated result smoothly translates the semantic details between Image A and Image B.
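A minimal sketch of how such an interpolation can be produced is shown below, assuming two w+ codes w_a and w_b obtained by embedding Image A and Image B with the inversion procedure above, and a synthesis network G_synth that decodes a w+ code into an image; the names are placeholders rather than the project's actual code.

```python
import torch

def interpolate_latents(G_synth, w_a, w_b, n_frames=30):
    """Blend two embedded images by linearly interpolating their w+ codes
    and decoding each intermediate code back to an image."""
    frames = []
    for t in torch.linspace(0.0, 1.0, n_frames):
        w_t = (1.0 - t) * w_a + t * w_b  # convex combination in w+ space
        with torch.no_grad():
            frames.append(G_synth(w_t))
    return frames  # e.g. save the frames as a GIF to visualize the transition
```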
Finally, I used color scribble constraints to fill in the details of a cat image, expressing the constrained objective
with the Hadamard product of a mask applied to both the generated image and the sketch, and optimizing it to find the solution.
My results showed that the model was able to accurately fill in the details of the image while maintaining its appearance,
and the color scribble constraints helped to guide the generation process.
Sketches | Generated Images |
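As a sketch of the masked objective described above: the constrained problem amounts to minimizing || M ⊙ G(z) - M ⊙ S ||, where M is the scribble mask, S the color sketch, and ⊙ the Hadamard product, so only the scribbled pixels are penalized. The snippet below is an illustrative PyTorch version under those assumptions; the optimizer choice (Adam) and all names are placeholders, not the exact setup used in the project.

```python
import torch
import torch.nn.functional as F

def generate_from_scribble(G, sketch, mask, latent_dim=512, n_steps=1000, lr=0.05):
    """Constrained generation: find z minimizing || M * G(z) - M * S ||_1,
    where * is the element-wise (Hadamard) product, M the scribble mask,
    and S the color sketch. Pixels outside the scribbles are unconstrained,
    so the generator is free to fill in the rest of the image."""
    z = torch.randn(1, latent_dim, device=sketch.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        optimizer.zero_grad()
        image = G(z)
        loss = F.l1_loss(mask * image, mask * sketch)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return G(z)
```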