16-726 Learning-Based-Image-Synthesis

Yiwen Zhao's Project Page

Assignment #5 - Cat Photo Editing

Motivation

Edit the style of an input image using a pretrained model.

Part 1

In the StyleGAN paper, the style is sampled from the latent space and injected into the synthesis network through AdaIN, where each feature map xi is normalized separately, then scaled and biased using the corresponding ys,i and yb,i.
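As a rough sketch (not the actual StyleGAN code), the AdaIN operation could look like the following; the tensor shapes are my own assumptions:

```python
import torch

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive instance norm: normalize each feature map, then scale/bias with the style."""
    # x: (N, C, H, W) feature maps; y_s, y_b: (N, C) per-channel style scale and bias
    mu = x.mean(dim=(2, 3), keepdim=True)            # per-feature-map mean
    sigma = x.std(dim=(2, 3), keepdim=True) + eps    # per-feature-map std
    x_norm = (x - mu) / sigma                        # normalize each feature map separately
    return y_s[:, :, None, None] * x_norm + y_b[:, :, None, None]
```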

There are three choices of latent space: z (the input noise vector), w (the output of the mapping network), and w+ (a separate w for each layer of the synthesis network).

We first initialize the latent, pass it through a pretrained generator, and then use an optimization-based method to pull the generated image toward the target while keeping it faithful to the styles the pretrained model can produce. A minimal sketch of this loop is shown below.
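This is only an illustrative sketch, assuming placeholder names (G, perc, target) rather than the actual assignment code; the loss weights follow the captions below:

```python
import torch
import torch.nn.functional as F

# Placeholders only: G stands in for the pretrained (frozen) generator and perc for a
# VGG-style perceptual loss; both are assumptions, not the real assignment code.
G = torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64), torch.nn.Unflatten(1, (3, 64, 64)))
for p in G.parameters():
    p.requires_grad_(False)                          # only the latent is optimized
perc = lambda a, b: F.mse_loss(a, b)                 # stand-in for the perceptual loss
target = torch.rand(1, 3, 64, 64)                    # the target (content) image

latent = torch.randn(1, 512, requires_grad=True)     # initialize the latent (here: z)
optimizer = torch.optim.Adam([latent], lr=0.01)

for step in range(1000):
    optimizer.zero_grad()
    fake = G(latent)                                  # generate from the current latent
    loss = 0.01 * F.l1_loss(fake, target) + 0.1 * perc(fake, target)
    loss.backward()
    optimizer.step()
```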

The choice of latent space, loss weights, optimizer, and data sample leads to results of different quality.

LBFGS converges faster than the Adam optimizer: in some cases the sample only changes quickly in the first several steps and then stays roughly the same afterwards.
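For reference, PyTorch's LBFGS re-evaluates the objective several times per step, so it takes a closure; a sketch reusing the placeholder names from the loop above:

```python
optimizer = torch.optim.LBFGS([latent], max_iter=20)

def closure():
    optimizer.zero_grad()
    fake = G(latent)
    loss = 0.01 * F.l1_loss(fake, target) + 0.1 * perc(fake, target)
    loss.backward()
    return loss

for step in range(50):
    optimizer.step(closure)                           # LBFGS calls closure() internally
```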

img part1_LBFGS_0_stylegan_w_0.1_1 img part1_LBFGS_0_stylegan_w_0.1_2 img part1_LBFGS_0_stylegan_w_0.1_3 img part1_LBFGS_0_stylegan_w_0.1_4 img part1_LBFGS_0_stylegan_w_0.1_5 img part1_LBFGS_0_stylegan_w_0.1_995

w latent, loss weight: perc 0.1 | l1 0.01 | delta 0.1, LBFGS, step 1, 2, 3, 4, 5, 995.

img part1_LBFGS_0_stylegan_w_0.1_1 img part1_LBFGS_0_stylegan_w_0.1_2 img part1_LBFGS_0_stylegan_w_0.1_3 img part1_LBFGS_0_stylegan_w_0.1_4 img part1_LBFGS_0_stylegan_w_0.1_5 img part1_LBFGS_0_stylegan_w_0.1_995

w latent, loss weight: perc 0.1 | l1 0.01 | delta 0.1, Adam, step 1, 201, 401, 601, 801, 995.

The results from the z/w/w+ latent spaces can all be faithful to the target (content) image to some extent while staying realistic. In practice, it is harder to find a good set of loss weights in w+. Below are the results using the best parameters found among all trials.


img data

Reference

img escher_sphere img escher_sphere

Left: stylegan, z | Right: vanillagan, z

img escher_sphere img escher_sphere

Left: stylegan, w | Right: vanillagan, w+

Part 2

In Part 2, we try to add content and style to a sketch image while preserving the contour information indicated by the user-given sketch.

The sketch image (RGBA) serves as the input, and a corresponding binary mask (0/1) is generated from its alpha channel.
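For illustration, extracting the mask from the alpha channel could look like this (the tensor name and shape are assumptions):

```python
import torch

sketch_rgba = torch.rand(1, 4, 64, 64)           # hypothetical RGBA sketch in [0, 1]
rgb = sketch_rgba[:, :3]                          # the drawn colors
mask = (sketch_rgba[:, 3:4] > 0).float()          # 1 where the user drew, 0 elsewhere
```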

We still use two loss terms here: a masked L1 (pixel) loss and a masked perceptual loss; a sketch of both follows.
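This is a minimal sketch of the two masked terms, assuming the rgb and mask tensors from the snippet above and a placeholder features network for the perceptual term:

```python
import torch.nn.functional as F

def masked_l1(fake, rgb, mask):
    # pixel loss only where the sketch is defined
    return F.l1_loss(fake * mask, rgb * mask)

def masked_perc(fake, rgb, mask, features):
    # perceptual loss computed on masked images (see the caveat below about
    # applying the mask directly in feature space)
    return F.mse_loss(features(fake * mask), features(rgb * mask))
```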

The results using latents from different spaces do not vary much under a coarse sketch, but differ noticeably with a dense sketch.

img coarse_data img coarse_stylegan img styleL_lr001_conv3_1 img styleL_lr001_conv4_1

sketch | latent z | latent w | latent w+

img coarse_data img coarse_stylegan img styleL_lr001_conv3_1 img styleL_lr001_conv4_1

sketch | latent z | latent w | latent w+

In the code implementation, both losses support a mask. However, applying the mask in feature space (for the perceptual loss) seems questionable: after the convolutional feature extraction, the features belonging to the sketch may no longer sit at the same spatial locations. With denser strokes, this effect might be alleviated.

Part 3

In this part, we need to add noise to the encoded input sketch (the forward process x_{t-1} → x_t), then denoise from the noisy latent using the parameters stored in the pretrained model, which provides the mean and variance of each reverse step x_t → x_{t-1}, combined with conditional and unconditional guidance. I use a classifier-free guidance (cfg) scale of 7, as sketched below.
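A hedged sketch of this procedure; unet, betas, alphas_cumprod, and the conditioning tensors are placeholders, not the pretrained model's actual interface:

```python
import torch

def sdedit(x0, unet, cond, uncond, betas, alphas_cumprod, N=500, cfg=7.0):
    # forward process: jump straight to step N by adding the scheduled amount of noise
    a_N = alphas_cumprod[N]
    x = a_N.sqrt() * x0 + (1 - a_N).sqrt() * torch.randn_like(x0)

    # reverse process: denoise from step N back to 0
    for t in reversed(range(N)):
        eps_c = unet(x, t, cond)                      # conditional noise prediction
        eps_u = unet(x, t, uncond)                    # unconditional noise prediction
        eps = eps_u + cfg * (eps_c - eps_u)           # classifier-free guidance

        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        beta_t = betas[t]

        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()            # predicted clean image
        mean = (a_prev.sqrt() * beta_t / (1 - a_t)) * x0_hat \
             + ((1 - beta_t).sqrt() * (1 - a_prev) / (1 - a_t)) * x   # posterior mean
        var = beta_t * (1 - a_prev) / (1 - a_t)                       # posterior variance
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + var.sqrt() * noise
    return x
```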

The number of sampling steps N is recommended to be 500-700, because the DDPM pretraining uses 1000 steps between the encoded real image and pure noise. An N smaller than 1000 therefore preserves the low-level features of the sketch, which keeps its content.

img styleL_lr001_conv1_1 img styleL_lr001_conv2_1

sketch1 | sketch2

img styleL_lr001_conv3_1 img styleL_lr001_conv4_1 img styleL_lr001_conv5_1 img styleL_lr001_conv5_1

sketch2, cfg=7, step=500 | sketch1, cfg=7, step=500 | sketch1, cfg=7, step=700 | sketch1, cfg=6, step=500

(My drawing of Grumpy Cat doesn't look grumpy emm)