Yiwen Zhao's Project Page
Edit the style of an input image using a pretrained model.
In the StyleGAN paper, the style is sampled from a latent space and injected into the synthesis network through AdaIN, where each feature map x_i is first normalized to zero mean and unit variance, then scaled and shifted by per-channel parameters derived from the style: AdaIN(x_i, y) = y_{s,i} * (x_i - mu(x_i)) / sigma(x_i) + y_{b,i}.
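A minimal PyTorch sketch of the AdaIN idea (my own illustration, not the official StyleGAN implementation):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization as described in StyleGAN.

    Each feature map is instance-normalized per sample, then scaled
    and shifted by a (scale, bias) pair produced from the latent w
    by a learned affine layer.
    """
    def __init__(self, num_channels, w_dim):
        super().__init__()
        # Affine map from the latent w to per-channel scale and bias.
        self.affine = nn.Linear(w_dim, num_channels * 2)

    def forward(self, x, w):
        # x: [N, C, H, W], w: [N, w_dim]
        y = self.affine(w)                # [N, 2C]
        y_s, y_b = y.chunk(2, dim=1)      # scale and bias, each [N, C]
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8
        x = (x - mu) / sigma              # instance-normalize
        return y_s[:, :, None, None] * x + y_b[:, :, None, None]
```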
There are three choices of latent space: z (the input noise), w (the output of the mapping network), and w+ (a separate w for each layer of the synthesis network).
We first initialize the latent, then pass it through a pretrained generator and use an optimization-based method to draw the output close to the target image while keeping it faithful to the style of the pretrained model.
The choice of latent space, loss weights, optimizer, and data sample leads to results of different quality.
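Below is a hedged sketch of the optimization loop, including how each latent space is initialized. G is assumed to follow a stylegan2-ada-pytorch-like interface (G.z_dim, G.num_ws, G.mapping, G.synthesis), and perc_loss stands in for a VGG-style perceptual loss; the delta term from the weights above is omitted for brevity:

```python
import torch

def invert(G, target, perc_loss, latent_space="w",
           n_steps=1000, w_perc=0.1, w_l1=0.01, lr=0.01):
    """Optimize a latent so that the generator output matches the target.

    Hedged sketch: G is assumed to expose G.z_dim, G.num_ws, G.mapping,
    and G.synthesis as in stylegan2-ada-pytorch; perc_loss is a
    user-supplied perceptual loss.
    """
    z = torch.randn(1, G.z_dim, device=target.device)
    if latent_space == "z":
        latent = z.clone().requires_grad_(True)
    else:
        w = G.mapping(z, None)                    # [1, num_ws, w_dim]
        if latent_space == "w":
            w = w[:, :1]                          # one w shared by all layers
        latent = w.clone().requires_grad_(True)

    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        if latent_space == "z":
            img = G(latent, None)
        else:
            ws = latent.expand(-1, G.num_ws, -1)  # broadcast w; no-op for w+
            img = G.synthesis(ws)
        loss = (w_perc * perc_loss(img, target)
                + w_l1 * (img - target).abs().mean())
        loss.backward()
        opt.step()
    return latent.detach()
```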
LBFGS seems faster than the Adam optimizer: in some cases the sample changes quickly in the first several steps but remains roughly the same afterwards.
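Unlike Adam, PyTorch's LBFGS re-evaluates the loss through a closure; continuing the sketch above (generate is a hypothetical helper standing in for the latent-to-image branch of invert):

```python
opt = torch.optim.LBFGS([latent], lr=1.0, max_iter=20)

def closure():
    opt.zero_grad()
    img = generate(latent)  # hypothetical: render image from the chosen latent
    loss = (w_perc * perc_loss(img, target)
            + w_l1 * (img - target).abs().mean())
    loss.backward()
    return loss

for _ in range(n_steps):
    opt.step(closure)        # each step runs up to max_iter inner iterations
```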
w latent, loss weights: perc 0.1 | l1 0.01 | delta 0.1, LBFGS, steps 1, 2, 3, 4, 5, 995.
w latent, loss weights: perc 0.1 | l1 0.01 | delta 0.1, Adam, steps 1, 201, 401, 601, 801, 995.
The results from the z, w, and w+ latent spaces can all be faithful to the target (content) image within some range, and realistic. In practice, it is harder to find a good set of loss weights in w+. The results below use the best parameters found among all trials.
Reference
Left: stylegan, z | Right: vanillagan, z
Left: stylegan, w | Right: vanillagan, w+
In part 2, we try to add content and style to a sketch image while preserving the contour information indicated by the user-given sketch.
The sketch image (RGBA) serves as the input, and a corresponding binary mask ({0,1}) is generated from its alpha channel.
We still use the same two loss terms here, an L1 pixel loss and a perceptual loss, both restricted to the sketch region by the mask; a minimal sketch of both is shown below.
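A hedged sketch of the two terms, assuming the mask comes straight from the alpha channel and perc_loss is again a placeholder perceptual loss:

```python
import torch

def sketch_losses(img, sketch_rgba, perc_loss, w_perc=0.1, w_l1=0.01):
    """Compute the two masked loss terms for sketch-guided synthesis.

    sketch_rgba: [1, 4, H, W] user sketch; the alpha channel marks
    where strokes exist, and only those pixels are constrained.
    perc_loss is a placeholder for a VGG-style perceptual loss.
    """
    sketch = sketch_rgba[:, :3]                  # RGB strokes
    mask = (sketch_rgba[:, 3:4] > 0).float()     # binary {0,1} mask
    # Pixel term: L1 over stroked pixels only.
    l1 = ((img - sketch).abs() * mask).sum() / mask.sum().clamp(min=1)
    # Perceptual term: compare masked images in feature space.
    perc = perc_loss(img * mask, sketch * mask)
    return w_perc * perc + w_l1 * l1
```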
The results using latents from different spaces do not vary much under a coarse sketch condition, but differ noticeably with a dense sketch.
sketch | latent z | latent w | latent w+
sketch | latent z | latent w | latent w+
In the code implementation, both losses support a mask. However, applying the mask in feature space (perceptual loss) seems questionable: after the conv layers, the features corresponding to the sketch may no longer sit at the same spatial locations. For denser strokes, this effect should be alleviated.
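For illustration, one way to mask in feature space is to resize the mask to each feature map's resolution; this is only a sketch of the idea, with the alignment caveat above:

```python
import torch.nn.functional as F

def masked_feature_l2(feat_a, feat_b, mask):
    """L2 distance between feature maps, restricted to the mask.

    feat_*: [N, C, h, w] activations from some conv layer;
    mask: [N, 1, H, W] binary mask at image resolution.
    """
    m = F.interpolate(mask, size=feat_a.shape[-2:], mode="nearest")
    diff = (feat_a - feat_b) ** 2 * m
    return diff.sum() / (m.sum() * feat_a.shape[1]).clamp(min=1)
```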
In this part, we add noise to the encoded input sketch and then denoise it, combining the conditional (sketch) and unconditional predictions with classifier-free guidance; I use a cfg rate of 7. The number of sampling steps also affects the result; comparisons are shown below.
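A minimal sketch of one classifier-free-guidance step, assuming an epsilon-predicting UNet unet(x, t, cond) and placeholder conditioning embeddings:

```python
import torch

@torch.no_grad()
def cfg_eps(unet, x_t, t, sketch_cond, uncond, cfg=7.0):
    """Combine conditional and unconditional noise predictions.

    unet(x, t, cond) -> predicted noise; sketch_cond / uncond are the
    conditioning embeddings for the user sketch and the null condition.
    """
    eps_cond = unet(x_t, t, sketch_cond)
    eps_uncond = unet(x_t, t, uncond)
    # Guidance pushes the prediction toward the conditional direction.
    return eps_uncond + cfg * (eps_cond - eps_uncond)
```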
sketch1 | sketch2
sketch2, cfg=7, step=500 | sketch1, cfg=7, step=500 | sketch1, cfg=7, step=700 | sketch1, cfg=6, step=500
(My drawing of Grumpy Cat doesn't look grumpy emm)