**CMU 16-726 Learning-based Image Synthesis**
**Assignment #1**
*Title: "Colorizing the Prokudin-Gorskii Photo Collection"*
*Name: Soyong Shin (soyongs@andrew.cmu.edu)*

(##) Contents

* Image cropping
* Image pyramids
* Alignment method
* Results

(##) Image cropping

Auto-cropping code is implemented to remove the borders of the image. I divided this task into two steps.

**Step #1: Remove white background**
The given images are scans with a white background surrounding the photograph. To remove it, I first filtered the image to detect the background region and cropped it out. All inputs are assumed to have the photograph placed parallel to the image axes; therefore, I averaged the gray-scale image along the horizontal and vertical axes and thresholded the means to extract the white regions.

The implementation of this step is as below:
```python
import numpy as np

## Step 1: Remove white border
img_h, img_w = img.shape

# Filter the white background by thresholding row/column means
mean_w, mean_h = img.mean(0), img.mean(1)
wborder_w, wborder_h = mean_w > 0.92, mean_h > 0.92
mask_w = np.where(wborder_w[1:] != wborder_w[:-1])[0]
mask_h = np.where(wborder_h[1:] != wborder_h[:-1])[0]

# Define the white border
start_w, start_h, end_w, end_h = 0, 0, img_w, img_h
if len(mask_w) > 0:
    start_w = mask_w[0] if mask_w[0] < img_w * 0.04 else 0
    end_w = mask_w[-1] if mask_w[-1] > img_w * 0.96 else img_w
if len(mask_h) > 0:
    start_h = mask_h[0] if mask_h[0] < img_h * 0.04 else 0
    end_h = mask_h[-1] if mask_h[-1] > img_h * 0.96 else img_h
img = img[start_h:end_h, start_w:end_w]
```


**Step #2: Remove black border**
Then, black borders are detected and removed using the Hough transform. Using Canny edge detection and the `probabilistic_hough_line` function from the **skimage** library, I first detect lines longer than 5% of the short axis of the image. Border candidates are then extracted from the lines close enough to each image edge, and the border is finally defined as the innermost candidate (the one farthest from the edge). Here, I again assume that the photo is parallel to the image axes.

The implementation of the border detection and cropping is as below:
```python
from skimage.feature import canny
from skimage.transform import probabilistic_hough_line

## Step 2: Remove black border
img_h, img_w = img.shape

# Line detection
edges = canny(img)
lines = probabilistic_hough_line(edges, line_length=min(img.shape)//20)
lines = np.array(lines)  # shape (N, 2, 2): N lines, 2 endpoints, (x, y)

# Keep nearly horizontal / vertical lines (small spread in y / x)
lines_h = lines[lines.std(1)[:, 1] < min(img_h, img_w)/100]
lines_w = lines[lines.std(1)[:, 0] < min(img_h, img_w)/100]
hs, ws = lines_h[:, 0, 1], lines_w[:, 0, 0]

# Extract border candidates close to each edge
border_t = hs[hs < img_h * 0.04]
border_l = ws[ws < img_w * 0.04]
border_b = hs[hs > img_h * 0.96]
border_r = ws[ws > img_w * 0.96]

# Set the border to the candidate farthest from the edge
start_h = 0 if len(border_t) == 0 else border_t.max()
end_h = img_h if len(border_b) == 0 else border_b.min()
start_w = 0 if len(border_l) == 0 else border_l.max()
end_w = img_w if len(border_r) == 0 else border_r.min()

return img[start_h:end_h, start_w:end_w]
```
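Assuming the two steps above are wrapped in a single helper (the name `autocrop` and the file path are hypothetical), usage might look like:

```python
from skimage import io

img = io.imread('data/cathedral.jpg', as_gray=True)  # gray-scale floats in [0, 1]
cropped = autocrop(img)  # Step 1 (white border) then Step 2 (black border)
```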


**Results of cropping**
Sample results of the cropping algorithm are shown below. Note that I manually colored the white background gray to show the performance of this algorithm.
*(Figure: input image / output image pairs.)*

(##) Image pyramids

To deal with large images, an image pyramid is implemented. I used the `rescale` function in *skimage.transform* to subsample the original image and append it to the pyramid recursively. This continues until the image becomes smaller than 100x100. The original image is also included in the pyramid, and I reversed the pyramid structure since alignment should begin from the smallest image.

The implementation of the image pyramid is as below:
```python
import copy
import math

import torch
from skimage.transform import rescale

ds_img = copy.deepcopy(img)
pyramid_list = [ds_img]

# Downsample by a factor of 2 until the image is smaller than th x th
while math.prod(ds_img.shape) > th**2:
    ds_img = rescale(copy.deepcopy(ds_img), 0.5)
    if data_type == torch.Tensor:
        ds_img = torch.from_numpy(ds_img)
    pyramid_list += [ds_img]

# Alignment starts from the smallest image, so reverse the list
pyramid_list.reverse()
return pyramid_list
```
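As a quick sanity check of the stopping rule (a sketch assuming `th = 100`, matching the 100x100 threshold above; exact sizes depend on skimage's rounding):

```python
pyramid = build_img_pyramids(img)  # e.g., img of shape (3000, 3000)
print([p.shape for p in pyramid])
# [(94, 94), (188, 188), (375, 375), (750, 750), (1500, 1500), (3000, 3000)]
```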
(##) Alignment method

Using the image pyramids, I implemented an alignment algorithm that returns the optimal displacement to match one image to another. As our task is to align three images (the B, G, and R channels), I aligned twice: once aligning the B image to G, and once aligning the R image to G.

The main alignment pipeline is implemented as below:
```python
# img_B, img_G, img_R: color channels of the input image
# crop_img_by_ratio: crops the image by a default ratio (10% on each side)
# pyramid_B, pyramid_G, pyramid_R: image pyramid of each channel

crop_img_B = crop_img_by_ratio(img_B)
crop_img_G = crop_img_by_ratio(img_G)
crop_img_R = crop_img_by_ratio(img_R)

pyramid_B = build_img_pyramids(crop_img_B)
pyramid_G = build_img_pyramids(crop_img_G)
pyramid_R = build_img_pyramids(crop_img_R)

B_h, B_w = align_pyramids(pyramid_B, pyramid_G)
R_h, R_w = align_pyramids(pyramid_R, pyramid_G)
```
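With the displacements recovered, the final color image can then be composed by rolling the B and R channels and stacking all three; a minimal sketch, assuming NumPy arrays and (R, G, B) channel ordering:

```python
import numpy as np

aligned_B = np.roll(crop_img_B, shift=(B_h, B_w), axis=(0, 1))
aligned_R = np.roll(crop_img_R, shift=(R_h, R_w), axis=(0, 1))
rgb = np.dstack([aligned_R, crop_img_G, aligned_B])  # H x W x 3 color image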


Aligning two image pyramids is done level by level. The alignment begins with the smallest images (level 0), and the optimal displacement is accumulated recursively. At each level, the aggregated displacement is multiplied by 2, since the image size doubles:
$$ (U, V)_{opt, i+1} = 2 \times (U, V)_{opt, i} + (u, v)_{i} $$

where $(U, V)_{i}$ is the aggregated displacement up to level $i$, and $(u, v)_i$ is the local optimal displacement at level $i$. The local optimal displacement $(u, v)$ of two images $I_1, I_2$ is defined as:

$$ (u, v)_{opt} := \mathop{argmin}_{(u, v) \in S} \hspace{2mm} L(I_1, I_2; u, v) $$

and the loss function $L$ can be chosen between SSD and NCC:

$$ SSD(I_1, I_2; u, v) := \sum_w \sum_h \left( roll(I_1; u, v)(w, h) - I_2(w, h) \right)^2 $$

$$ NCC(I_1, I_2; u, v) := -\frac{1}{WH} \sum_w \sum_h roll(I_1; u, v)(w, h) \cdot I_2(w, h) $$

where $roll(\,\cdot\,; u, v)$ is the image translation function that rolls the input image by $(u, v)$.
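These losses map directly onto `np.roll`; a minimal sketch matching the formulas above (using the `translate_img` helper referenced in the snippets below):

```python
import numpy as np

def translate_img(img, u, v):
    # Roll the image by (u, v) along the (height, width) axes
    return np.roll(img, shift=(u, v), axis=(0, 1))

def ssd(img1, img2, u, v):
    # Sum of squared differences between the rolled img1 and img2
    return ((translate_img(img1, u, v) - img2) ** 2).sum()

def ncc(img1, img2, u, v):
    # Negative mean cross-correlation; lower is better, matching the argmin
    return -(translate_img(img1, u, v) * img2).mean()
```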

The implementation of the optimal-displacement search at a given level is as below:
```python
# prev_losses: losses of the previous search
#              (discussed in the multi-stage search below)
# search_space_h, search_space_w: search spaces for the two axes
# translate_img: function that rolls an image by a displacement
# find_idxs: argmin helper (e.g., np.where)
# least_h, least_w: optimal displacement

# Reuse losses already computed by the previous, smaller search
losses = np.zeros((len(search_space_h), len(search_space_w)))
if prev_losses is not None:
    losses[1:-1, 1:-1] = prev_losses

for idx_h, d_h in enumerate(search_space_h):
    for idx_w, d_w in enumerate(search_space_w):
        if losses[idx_h, idx_w] != 0:
            continue
        losses[idx_h, idx_w] = loss(
            translate_img(img1, d_h, d_w), img2)

least_idxs = find_idxs(losses == losses.min())
least_h = search_space_h[least_idxs[0][0]]
least_w = search_space_w[least_idxs[1][0]]
```
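Putting the levels together, the aggregation rule $(U, V)_{opt, i+1} = 2 \times (U, V)_{opt, i} + (u, v)_i$ could be realized as follows (a hedged sketch; `search_at_level` stands in for the per-level search above and is not necessarily the actual function name):

```python
def align_pyramids(pyramid1, pyramid2):
    # Accumulate displacement from the coarsest level to the finest
    d_h, d_w = 0, 0
    for level, (img1, img2) in enumerate(zip(pyramid1, pyramid2)):
        d_h, d_w = 2 * d_h, 2 * d_w   # image size doubles at each level
        u, v = search_at_level(img1, img2, d_h, d_w, level)
        d_h, d_w = d_h + u, d_w + v   # add the local optimal displacement
    return d_h, d_w
```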


Ideally, we only need to search displacements in $(-1, 0, 1)$ for images at levels higher than 0, because each level's resolution is twice that of the previous one. However, the search sometimes (especially at low levels with small image sizes) finds the optimal displacement outside $(-1, 1)$. This happens because the optimal displacements at previous levels are not exact due to the low resolution. Searching a larger range can handle this issue, but it requires longer computation time. To resolve this, I first search the displacement within a small range $(r_{min}, r_{max})$ (e.g., $(-1, 1)$ or $(-5, 5)$) for levels higher than 0. If the optimal displacement lies on the edge of the search space, I re-search with the larger space $(r_{min}-1, r_{max}+1)$. To avoid evaluating the same displacement twice, the previous search results are stored and only the outer ring of the enlarged space is evaluated. This strategy significantly improves computational efficiency.

This multi-stage search is implemented as below:
```python
# r_h, r_w: search space range (radius) for each axis
# prev_losses: losses of the previous search
# curr_d_h, curr_d_w: optimal displacement of the current search

if level == 0:
    # At the coarsest level, search up to half the image size
    r_h, r_w = [int(size/2) for size in img1.shape]
elif (img1.shape[0] + img1.shape[1])/2 < 300:
    r_h, r_w = 5, 5
else:
    r_h, r_w = 1, 1

prev_losses = None
while True:
    # Returns the loss grid as prev_losses when the optimum lies on the
    # boundary of the search space, and None once the search converges
    curr_d_h, curr_d_w, prev_losses = get_optimal_displacement(
        translate_img(img1, d_h, d_w), img2, r_h, r_w, prev_losses)
    if prev_losses is None:
        break
    # Enlarge the search radius; only the new outer ring is evaluated
    r_h += 1
    r_w += 1
```
(##) Results

Results of the implemented algorithm are shown below. These results use the NCC metric; the SSD metric produces similar outputs as well.
*(Figures: colorized results for Cathedral, Emir, Harvest, Icon, Lady, Self-portrait, Three generations, Train, Turkmen, and Village.)*