16-726 Learning-Based Image Synthesis, 2021 Spring

Project 2: Gradient Domain Fusion

Teddy Zhang (wentaiz)

Overview

Gradient-domain processing is a classic technique in computational photography with a variety of applications, including image blending, tone mapping, and non-photorealistic rendering. In this project, the main task is to blend two images seamlessly with Poisson blending. A variant that uses mixed gradients is also implemented for comparison.

The background images in Fig.3, Fig.4 and Fig.5 are all from my personal photo album!

Toy Problem - Restoring an image

In the toy problem, we restore an image from its gradient information. The gradients are computed as forward differences along both axes.

The details of my implementation are:

Objectives
- $L_1 = ((v(x+1,y)-v(x,y))-(s(x+1,y)-s(x,y)))^2$
- $L_2 = ((v(x,y+1)-v(x,y))-(s(x,y+1)-s(x,y)))^2$
Conditions
- $v(0,0)=s(0,0)$ . It can be any pixel from the source image.
Solver: The lsqr solver from the scipy package. It is an iterative solver for least-squares problems.
Efficiency: To speed up the computation, an alternative version that stores the system in a sparse matrix (csc_matrix from scipy) is also implemented. The runtime comparison is shown in the table in the Runtime section.
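
The setup above can be sketched as follows. This is a minimal illustration, not the code used for the reported results: the function name `reconstruct_from_gradients`, the solver tolerances and the iteration limit are assumptions, and the system is assembled from (row, column, value) triplets before being stored as a csc_matrix.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import lsqr


def reconstruct_from_gradients(s):
    """Rebuild a 2-D image from its forward differences plus one pinned pixel.

    Hypothetical helper for illustration; `s` is a float array of shape (h, w).
    """
    h, w = s.shape
    idx = np.arange(h * w).reshape(h, w)  # pixel (y, x) -> unknown index
    rows, cols, vals, b = [], [], [], []
    e = 0  # running equation counter

    # Objective L1: v(x+1, y) - v(x, y) = s(x+1, y) - s(x, y)
    for y in range(h):
        for x in range(w - 1):
            rows += [e, e]
            cols += [idx[y, x + 1], idx[y, x]]
            vals += [1.0, -1.0]
            b.append(s[y, x + 1] - s[y, x])
            e += 1

    # Objective L2: v(x, y+1) - v(x, y) = s(x, y+1) - s(x, y)
    for y in range(h - 1):
        for x in range(w):
            rows += [e, e]
            cols += [idx[y + 1, x], idx[y, x]]
            vals += [1.0, -1.0]
            b.append(s[y + 1, x] - s[y, x])
            e += 1

    # Condition: v(0, 0) = s(0, 0) fixes the unknown constant offset.
    rows.append(e)
    cols.append(idx[0, 0])
    vals.append(1.0)
    b.append(s[0, 0])
    e += 1

    A = csc_matrix((vals, (rows, cols)), shape=(e, h * w))
    # Tolerances and iteration limit are assumed values, not the report's.
    v = lsqr(A, np.array(b), atol=1e-10, btol=1e-10, iter_lim=10000)[0]
    return v.reshape(h, w)
```

Each equation touches only two unknowns, which is why a sparse representation pays off: the matrix has roughly 2 non-zeros per row regardless of the image size.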

Here are the test results of this algorithm:

Fig.1 Left: Original image. Right: Restored image.

No. of iterations: 594

Residuals (1-Norm): 0.0067

We can tell that our implementation successfully restored the image.

Poisson Blending

Given a source image and a target image $T$ , we want to blend the images within a predefined region $S$ and solve for the new image $V$ . Here, we compute the gradients of each pixel in all 4 directions. The objective is minimized for each channel separately, and the resulting channels are stacked to form the final RGB output.

The details of my implementation are:

Objectives
- $L = \sum_{i\in S,j\in N_i\cap S}((v(i)-v(j))-(s(i)-s(j)))^2+\sum_{i\in S,j\in N_i\cap \neg S}((v(i)-t(j))-(s(i)-s(j)))^2$
Conditions
- $v(i)=t(i)$ , for $i\in\neg S$ . We just directly copy the values from the target image if it is outside of region $S$ .
Solver: The lsqr solver from the scipy package. It is an iterative solver for least-squares problems.
Gradient Calculation: Forward difference for all 4 directions. $v(i+1,j)-v(i,j)$ , $v(i-1,j)-v(i,j)$ , $v(i,j+1)-v(i,j)$ , $v(i,j-1)-v(i,j)$ .
Efficiency: To speed up the computation, an alternative version that stores the system in a sparse matrix (csc_matrix from scipy) is also implemented. The runtime comparison is shown in the table in the Runtime section.
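
A minimal single-channel sketch of this objective is given below. It is an illustration under stated assumptions rather than the code behind the figures: the name `poisson_blend_channel` is hypothetical, the source and target are assumed to be already aligned and of equal shape, the region $S$ is given as a boolean mask that does not touch the image border, and the solver tolerances are assumed values.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import lsqr


def poisson_blend_channel(s, t, mask):
    """Blend one channel of source `s` into target `t` inside `mask`.

    s, t: float arrays of equal shape; mask: bool array, True inside S.
    Assumes S does not touch the image border (illustrative restriction).
    """
    ys, xs = np.nonzero(mask)
    n = len(ys)
    idx = -np.ones(mask.shape, dtype=int)
    idx[ys, xs] = np.arange(n)  # pixel in S -> unknown index

    rows, cols, vals, b = [], [], [], []
    e = 0
    for y, x in zip(ys, xs):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            d = s[y, x] - s[ny, nx]  # source gradient s(i) - s(j)
            rows.append(e)
            cols.append(idx[y, x])
            vals.append(1.0)
            if mask[ny, nx]:
                # Neighbor is unknown: v(i) - v(j) = s(i) - s(j)
                rows.append(e)
                cols.append(idx[ny, nx])
                vals.append(-1.0)
                b.append(d)
            else:
                # Neighbor is outside S: v(i) = s(i) - s(j) + t(j)
                b.append(d + t[ny, nx])
            e += 1

    A = csc_matrix((vals, (rows, cols)), shape=(e, n))
    v = lsqr(A, np.array(b), atol=1e-10, btol=1e-10, iter_lim=10000)[0]

    out = t.copy()  # condition: v(i) = t(i) outside S
    out[ys, xs] = v
    return out
```

For an RGB image, the function would be called once per channel and the three outputs stacked, for example with `np.dstack`.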

Here are the test results of this algorithm for the given images:


Source	Target
Fig.2
No. of iterations: R:282,G:285,B:286
Residuals (1-Norm): R:2.50,G:2.42,B:2.39

We can tell from Fig.2 that the two images are blended well, without any evident defects. More results on other downloaded images are shown below:


Source	Target
Fig.3
No. of iterations: R:237,G:237,B:237
Residuals (1-Norm): R:2.83,G:1.91,B:1.78


Source	Target
Fig.4
No. of iterations: R:509,G:514,B:515
Residuals (1-Norm): R:0.53,G:0.31,B:0.27


Source	Target
Fig.5
No. of iterations: R:496,G:497,B:496
Residuals (1-Norm): R:3.74,G:3.01,B:2.74


Source	Target
Fig.6
No. of iterations: R:315,G:312,B:306
Residuals (1-Norm): R:5.16,G:5.18,B:5.09

We obtain decent blending results in Fig.3 and Fig.4. However, the results in Fig.5 and Fig.6 are not as good: the background in region $S$ looks blurred. The main cause is that the gradients (textures) in the target image are mostly larger than those in the source image. Poisson blending preserves only the source gradients inside $S$ , so the stronger target textures are lost there and the result looks blurred. We conclude that Poisson blending is not well suited to cases where the gradients in the source image are smaller than those in the target image. One possible way to resolve this issue is shown in Bells & Whistles.

Bells & Whistles: Mixed Gradients

From the test results above, we found a limitation of Poisson blending. A better option is to keep, for each pixel pair, whichever of the source and target gradients has the larger magnitude. In this way, we keep the prominent textures of both images and avoid blurred results.

The details of my implementation are:

Objectives
- $L = \sum_{i\in S,j\in N_i\cap S}((v(i)-v(j))-d_{ij})^2+\sum_{i\in S,j\in N_i\cap \neg S}((v(i)-t(j))-d_{ij})^2$
  here, $d_{ij}=s(i)-s(j)$ , if $|s(i)-s(j)|\ge|t(i)-t(j)|$
  $d_{ij}=t(i)-t(j)$ , if $|s(i)-s(j)|<|t(i)-t(j)|$
Conditions
- $v(i)=t(i)$ , for $i\in\neg S$ . We just directly copy the values from the target image if it is outside of region $S$ .
Solver: The lsqr solver from the scipy package. It is an iterative solver for least-squares problems.
Gradient Calculation: Forward difference for all 4 directions. $v(i+1,j)-v(i,j)$ , $v(i-1,j)-v(i,j)$ , $v(i,j+1)-v(i,j)$ , $v(i,j-1)-v(i,j)$ .
Efficiency: To speed up the computation, an alternative version that stores the system in a sparse matrix (csc_matrix from scipy) is also implemented. The runtime comparison is shown in the table in the Runtime section.
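
The only change relative to Poisson blending is the choice of the guidance gradient $d_{ij}$ . The selection rule can be sketched as below; the helper name `mixed_gradient` is hypothetical, and the function works on scalars or on NumPy arrays of matching shape.

```python
import numpy as np


def mixed_gradient(s_i, s_j, t_i, t_j):
    """Return d_ij: the source or target difference with the larger magnitude.

    Ties go to the source gradient, matching the >= in the definition of d_ij.
    """
    ds = np.asarray(s_i, dtype=float) - s_j  # s(i) - s(j)
    dt = np.asarray(t_i, dtype=float) - t_j  # t(i) - t(j)
    return np.where(np.abs(ds) >= np.abs(dt), ds, dt)
```

Note that the comparison uses absolute values but the signed difference is kept, so a strong negative target gradient still replaces a weak positive source gradient.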

Here are the test results of this algorithm on some of the images used above:

Fig.7

No. of iterations: R:310,G:306,B:302

Residuals (1-Norm): R:7.83,G:7.85,B:7.73

Fig.8

No. of iterations: R:227,G:230,B:232

Residuals (1-Norm): R:5.78,G:3.87,B:3.27

Fig.9

No. of iterations: R:482,G:483,B:483

Residuals (1-Norm): R:8.26,G:6.72,B:6.44

It can be seen from all 3 comparisons that the blurred regions in the Poisson blending results are largely resolved. The blended images look more natural because the textures are preserved. However, there are still visual defects in Fig.9: the horizon line is hidden by the moon. A potential strategy to tackle this problem is to define a threshold value $Th$ based on the intensity of the target region. If $t(i)<Th$ , we calculate the gradient as $v(i)-t(j)$ . I tested this hypothesis with the setup below:

The background image is converted to LAB space so that the L channel can be extracted to obtain the brightness of each pixel.
$Th=5\%\times max(L)$

The resulting blended image is shown below. We can tell that it is visually improved compared with the original mixed gradients result.

Fig.10

Bells & Whistles: Color2Gray

When converting a color image to grayscale, we usually lose the important information carried by color contrast, which makes the resulting image hard to interpret. In this part, we want to generate a better grayscale image by starting from the ordinary grayscale conversion and preserving the gradient information of a chosen channel.


Color image	Grayscale from cv2 package

First of all, we need to choose which channel we should use to obtain the gradient information. Here, we convert the original RGB image to HSV and run the Sobel filter on each channel. The obtained gradient images are:
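
The channel comparison can be sketched as follows. This is only an illustration: it swaps in `colorsys.rgb_to_hsv` and `scipy.ndimage.sobel` in place of whatever conversion and Sobel routines were actually used for the figures, the function name `hsv_gradient_magnitudes` is hypothetical, and the input is assumed to be a float RGB array with values in [0, 1].

```python
import colorsys

import numpy as np
from scipy import ndimage


def hsv_gradient_magnitudes(rgb):
    """Return Sobel gradient magnitudes of the H, S and V channels.

    rgb: float array of shape (H, W, 3) with values in [0, 1] (assumed).
    """
    # Apply the scalar stdlib conversion element-wise; slow but dependency-free.
    to_hsv = np.vectorize(colorsys.rgb_to_hsv, otypes=[float, float, float])
    h, s, v = to_hsv(rgb[..., 0], rgb[..., 1], rgb[..., 2])

    mags = {}
    for name, channel in (("H", h), ("S", s), ("V", v)):
        gx = ndimage.sobel(channel, axis=1)  # horizontal derivative
        gy = ndimage.sobel(channel, axis=0)  # vertical derivative
        mags[name] = np.hypot(gx, gy)
    return mags
```

One caveat of this sketch is that hue is an angle, so a plain Sobel filter on H treats the wrap-around between the largest and smallest hue values as a large edge.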


H gradients	S gradients	V gradients

Using the cv2 grayscale image as the background, we take each of the H, S and V channels in turn as the foreground and generate the blended image with the mixed gradients method described above. Among the results, the one using the V channel performs best. We can tell from the results below that "35" is still recognizable after my grayscale conversion.

Interestingly, I also found an acceptable result when choosing the S channel as the background and V channel as the foreground. Although the digits are also recognizable, the textures/gradients in the digit region are not perfectly preserved.

Using the same strategy described above, some more test results are shown below:

We can tell from the results that the resulting grayscale images are decent when the digits are colorful but the background is less saturated. However, for some downloaded images with both colorful digits and background, the digits are not as recognizable.

Runtime

Operation	Regular storage (s)	Sparse storage (s)
Toy problem	$9.49$	$0.49$
Poisson Blending	$285.50$	$2.04$
Mixed Gradients	$287.09$	$2.47$
Color2Gray	N/A	$0.89$

Acknowledgement

The basic methods are inspired by CMU 16-726 and Berkeley CS194-26.