**Gradient Domain Fusion**

*CMU 16-726 Spring 2021 Assignment #2*

Name: Juyong Kim

(#) Introduction In this assignment, we implement Poisson blending, a gradient-domain processing technique that seamlessly blends an object or texture from a source image into a target image. The insight behind Poisson blending is that people often care much more about the gradient of an image than the overall intensity. So the method tries to change the source image so that the gradient of the source image is maximally preserved, while overall intensity is matched to the target image. For the rest of the report, we first describe our method and implementation briefly. Then we show the result of the toy problem given by the assignment. The result of the blending including failure cases are shown following. (#) Method (##) Poisson Blending Poisson blending method from the instruction is as follows. Given the pixel intensities of the source image $\mathbf{s}$ and of the target image $\mathbf{t}$, we want to solve for new intensity values $\mathbf{v}$ within the source region $S$, which can be formulates as a least squares problem: $$ \mathbf{v} = \arg \min_\mathbf{v} \sum_{i \in S, j \in N_i \cup S} \{(v_i - v_j) - (s_i - s_j)\}^2 + \sum_{i \in S, j \in N_i \cup \neg S} \{(v_i - t_j) - (s_i - s_j)\} $$ where each $i$ is a pixel in the source region $S$, and $N_i$ is a 4-neighbor of $i$. Each summation guides the gradient values to match those of the source region. In the first summation, the gradients of the source image and the new image inside the region $S$ are compared. In the second summation, the gradients of the source image and the gradient made from the new and target image on the boundary of $S$ are compared. There are several considerations in running the algorithm. First, the offset, or position, of the source image to be blended should be decided. Although the offset is manually set, but there can be an algorithm to find the best offset so that the gradient at the boundary are similar between the source and the target image. Second, for the blending output to be naturally looking, the texture, or the pattern, of the two images at the boundary should be similar. Conceptually the method tries to preserve the gradient of the source image and the overall intensity of the source image can be changed towards the target image. If the texture of two images differ significantly, there can be still noticable change at the boundary. **Solving the Optimization** The objective function above can be formularized as a least-square problem $\arg \min_\mathbf{v} \|A\mathbf{v}-b\|_2$, where each summand and the last row become a row. However, the size of matrix $A$ is huge, even for the small image given by the assignment, solving it with the standard solution ($x = (A^\top A)^{-1}A^\top b$) takes long time. Fortunately, since the matrix is very sparse (only 2 element per row is non-zero), we can solve with sparse matrix least-square solver. Since we are dealing with RGB color images, we have to solve this optimization for each channel. Here, the set $S$ is same for all channels, so $A$ is common and only $b$ is needed to be computed differently. **Out-of-range Problem** After solving the linear equation, some values of the solution may go out of the range of image intensity ($[0.0, 1.0]$ in this case). We tried two methods to deal with the out-of-range values: Clipping and re-ranging. For clipping, we can clip the pixel values greater than $1.0$ to $1.0$, and similarity for the values lower than $0.0$. For re-ranging, we can apply linear point-processing so that the scale of each channel becomes $[0.0, 1.0]$. After trying several examples, we found that clipping is enough when the blending is done smoothly, and the re-ranging is benefical in color2gray (described below). (##) Toy Problem To get a sense how to solve a least-square problem $\arg \min_x \|Ax-b\|_2$, we start with a toy problem of recovering an image with differences of gradients and a boundary condition. The task here is that given an image $\mathbf{s}(x, y)$, we find the image $\mathbf{v}(x, y)$ that minimizes the following objective: $$ \mathbf{v} = \arg\min_\mathbf{v} \bigg[ \sum_{x\in[H],y\in[W]} \{ (v(x+1,y)-v(x,y)) - (s(x+1,y)-s(x,y)) \}^2 \hspace{80pt} $$ $$ \hspace{40pt} + \sum_{x\in[H],y\in[W]} \{ (v(x+1,y)-t(x,y)) - (s(x+1,y)-s(x,y)) \}^2 + \{v(0,0) - s(0,0)\}^2 \bigg]. $$ This optimization finds an image which the gradients in both horizontal and vertical direction matches and the first pixel intensity. We know the solution for $\mathbf{v}$ is the original image $\mathbf{s}$ making the objective function global minimum, zero. (#) Bells and Whistles (##) 1. Mixed Gradients Same as the poisson blending, we can give different gradients than the source gradient. Specifically, try the gradients of source or target image which has a larger absolute value. So the objective function becomes as follows: $$ \mathbf{v} = \arg \min_\mathbf{v} \sum_{i \in S, j \in N_i \cup S} \{(v_i - v_j) - d_{ij}\}^2 + \sum_{i \in S, j \in N_i \cup \neg S} \{(v_i - t_j) - d_{ij}\}, $$ $$ \text{where}~~~ d_{ij} = \begin{cases} s_i - s_j & ~~\text{if}~~ |s_i - s_j| \ge |t_i - t_j| \\ t_i - t_j & ~~\text{otherwise} \end{cases} $$ (##) 2. Color2Gray One application of gradient-domain processing is to convert a color image into a gray-scale image that preserve the contrast between colors. This is an example of tone-mapping problem, similar to that of converting HDR images to RGB displays. To solve this, we first convert the RGB image into HSV color space, and apply mixed gradient method to the S (saturation) and V (value) channels. Similarity to the toy problem, we add a squared difference between the first pixel of the output and the average of the S and V values of the first pixel. (#) Results (##) Part 1.1 Toy Problem Figure [toy_input] and Figure [toy_output] show the input and the output of the toy problem. The input is a small grayscale image with size $(110, 119)$. But the number of constraints are proportional to the number of pixel and the matrix $A$ of the least square problem becomes $25952 \times 13090$, which is intractable to solve with standard method. So the constraints are represented as sparse matrix `scipy.sparse.coo_matrix()` and the sparse matrix least-square solver `scipy.sparse.linalg.lsqr()` is used to solve the problem. ![Figure [toy_input]: Input](images/toy_problem.png width=200px) ![Figure [toy_output]: Output of `ssl.lsqr`
(identical to the input)](images/toy_problem_out.png width=200px) As a results, the least-square solver part takes only about 0.5 second (on 2018 MacBook Pro 15-inch, with 6-core 2.2GHz CPU) with the stopping tolerances `atol` and `btol` being $1e^{-20}$. In the floating point representation, the original image and the output have the maximum absolute difference of 6.5836e-14. When converted into the original data type (`uint`: 0 $\sim$ 255), the final output is identical to the input. (##) Part 1.2 Poisson Blending Figure [poisson_source] ~ Figure [poisson_juyong] show the inputs (source, mask, and target images) and the outputs of Poisson blending and naive copy & paste. While the naive output (Figure [poisson_rude], Figure [rude_juyong]) has noticable change of color at the boundary, the poission blending (Figure [poisson_output], Figure [poisson_juyong]) the source image naturally mixed into the target image. (###) Example 1. Images from the assignment ![figure [poisson_source]: Source](images/data/source_01.jpg) ![figure [poisson_mask]: Mask](images/data/mask_01.jpg) ![figure [poisson_target]: Target](images/data/target_01.jpg) ![figure [poisson_rude]: Output (naive)](images/outputs/rude_01.jpg) ![figure [poisson_output]: Output (Poisson)](images/outputs/poisson_01.jpg) (###) Example 2. Images of our own ![figure [source_juyong]: Source](images/data/source_juyong.jpg width=150px) ![figure [mask_juyong]: Mask](images/data/mask_juyong.jpg width=150px) ![figure [target_juyong]: Target](images/data/target_juyong.jpg width=338px) ![figure [rude_juyong]: Output (naive)](images/outputs/rude_juyong.jpg width=338px) ![figure [poisson_juyong]: Output (Poisson)](images/outputs/poisson_juyong.jpg width=338px) Figures below show a failure case of Poisson blending. In Figure [poisson_juyong2], the source image becomes too bright to recognize the details. This is because the source image at the border is too dark and the overall intensity of it should be increased to match the pixel intensity of the target image at the boundary. Also, we can see the change of texture at the boundary, which cannot be addressed by Poisson blending. (###) Example 3. Failure case ![figure [source_juyong2]: Source](images/data/source_juyong2.jpg width=130px) ![figure [mask_juyong2]: Mask](images/data/mask_juyong2.jpg width=130px) ![figure [target_juyong2]: Target](images/data/target_juyong2.jpg width=338px) ![figure [rude_juyong2]: Output (naive)](images/outputs/rude_juyong2.jpg width=338px) ![figure [poisson_juyong2]: Output (Poisson)](images/outputs/poisson_juyong2.jpg width=338px) (##) Bells & Whistles 1: Mixed Gradients One example of mixed gradient method explained earlier is shown in Figure [mixed_03]. While the Poisson blending output displays the same shape and texture of the source image inside the mask, the mixed gradient output shows the texture of both source and the target image. This is because the objective function of the mixed gradient takes the greater of the source gradient and the target gradient, rather than only source gradient. ![figure [poisson_juyong2]: Poisson blending](images/outputs/poisson_juyong.jpg) ![figure [mixed_juyong]: Mixed gradient](images/outputs/mixed_juyong.jpg) ![figure [poisson_03]: Poisson blending](images/outputs/poisson_03.jpg) ![figure [mixed_03]: Mixed gradient](images/outputs/mixed_03.jpg) (##) Bells & Whistles 2: Color2Gray The output of the color2gray method described earlier shown in the figures below. While the output of the grayscale conversion function `skimage.color.rgb2gray()` do not distinguish the background and the number, our results display them contrastively. This is because the number and the background has different intensity of either S or V value in HSV color space. ![figure [rgb_26]: RGB](images/data/colorBlindTest26.jpg) ![figure [gray_26]: `rgb2gray` output](images/data/colorBlindTest26_gray.jpg) ![figure [color2gray_26]: Output](images/outputs/color2gray_colorBlindTest26.jpg) ![figure [rgb_35]: RGB](images/data/colorBlindTest35.jpg) ![figure [gray_35]: `rgb2gray` output](images/data/colorBlindTest35_gray.jpg) ![figure [color2gray_35]: Output](images/outputs/color2gray_colorBlindTest35.jpg) ![figure [rgb_74]: RGB](images/data/colorBlindTest74.jpg) ![figure [gray_74]: `rgb2gray` output](images/data/colorBlindTest74_gray.jpg) ![figure [color2gray_74]: Output](images/outputs/color2gray_colorBlindTest74.jpg) (#) More Outputs ![Source](images/data/source_02.jpg) ![Mask](images/data/mask_02.jpg) ![Target](images/data/target_02.jpg) ![Output (naive)](images/outputs/rude_02.jpg) ![Output (Poisson)](images/outputs/poisson_02.jpg) ![Output (mixed)](images/outputs/mixed_02.jpg) -------------- ![Source](images/data/source_03.jpg) ![Mask](images/data/mask_03.jpg) ![Target](images/data/target_03.jpg) ![Output (naive)](images/outputs/rude_03.jpg) ![Output (Poisson)](images/outputs/poisson_03.jpg) ![Output (mixed)](images/outputs/mixed_03.jpg) -------------- ![Source](images/data/source_01.jpg) ![Mask](images/data/mask_01.jpg) ![Target](images/data/target_01.jpg) ![Output (naive)](images/outputs/rude_01.jpg) ![Output (Poisson)](images/outputs/poisson_01.jpg) ![Output (mixed)](images/outputs/mixed_01.jpg) -------------- ![Source](images/data/source_juyong3.jpg width=150px) ![Mask](images/data/mask_juyong3.jpg width=150px) ![Target](images/data/target_juyong3.jpg) ![Output (naive)](images/outputs/rude_juyong3.jpg) ![Output (Poisson)](images/outputs/poisson_juyong3.jpg) ![Output (mixed)](images/outputs/mixed_juyong3.jpg)