**CMU 16-726 Learning-based Image Synthesis**
**Assignment #1**
*Title: "Colorizing the Prokudin-Gorskii Photo Collection"*
*Name: Soyong Shin (soyongs@andrew.cmu.edu)*
(##) Contents
* Image cropping
* Image pyramids
* Alignment method
* Results
(##) Image cropping
Auto-cropping code is implemented to remove the borders of the image. I divided this task into two steps.
**Step #1: Remove white background**
The given images are scans with a white background surrounding the pictures. To remove it, I first filter the image to locate the white region and then crop it away.
All photos are assumed to be axis-aligned with the scan, so I average the gray-scale image along the horizontal and vertical axes and mark the rows and columns whose mean intensity is high enough to count as white.
The implementation of this step is shown below:
```python
## Step 1: Remove white border
import numpy as np

img_h, img_w = img.shape

# Filter the white background: a row/column whose mean intensity
# exceeds 0.92 is treated as background.
mean_w, mean_h = img.mean(0), img.mean(1)
wborder_w, wborder_h = mean_w > 0.92, mean_h > 0.92

# Indices where the profile switches between white and non-white.
mask_w = np.where(wborder_w[1:] != wborder_w[:-1])[0]
mask_h = np.where(wborder_h[1:] != wborder_h[:-1])[0]

# Define the white border; only accept transitions within 4% of the edge.
start_w, start_h, end_w, end_h = 0, 0, img_w, img_h
if len(mask_w) > 0:
    start_w = mask_w[0] if mask_w[0] < img_w * 0.04 else 0
    end_w = mask_w[-1] if mask_w[-1] > img_w * 0.96 else img_w
if len(mask_h) > 0:
    start_h = mask_h[0] if mask_h[0] < img_h * 0.04 else 0
    end_h = mask_h[-1] if mask_h[-1] > img_h * 0.96 else img_h

img = img[start_h:end_h, start_w:end_w]
```
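As a quick illustration of the transition detection (not part of the assignment code), consider a synthetic column-mean profile with a 5-pixel white margin on each side; the transition indices recovered by the mask are exactly the crop bounds:

```python
import numpy as np

# Synthetic column-mean profile: 5 white columns, 190 dark, 5 white.
mean_w = np.concatenate([np.full(5, 0.95), np.full(190, 0.4), np.full(5, 0.95)])
wborder_w = mean_w > 0.92
mask_w = np.where(wborder_w[1:] != wborder_w[:-1])[0]
print(mask_w)   # [  4 194] -> crop columns 4..194 (both within the 4% margin)
```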
**Step #2: Remove black border**
After the white background is cropped, a black frame still surrounds each photograph. To remove it, I detect straight lines with a Canny edge detector followed by a probabilistic Hough transform, keep the near-horizontal and near-vertical lines lying within 4% of each image edge, and crop at the innermost candidate on each side.

```python
## Step 2: Remove black border
# (excerpt: body of the cropping routine)
import numpy as np
from skimage.feature import canny
from skimage.transform import probabilistic_hough_line

img_h, img_w = img.shape

# Line detection
edges = canny(img)
lines = probabilistic_hough_line(edges, line_length=min(img.shape)//20)
lines = np.array(lines)

# Keep (nearly) horizontal / vertical lines: the coordinate that should be
# constant along the line may vary by at most 1% of the short side.
lines_h = lines[lines.std(1)[:, 1] < min(img_h, img_w)/100]
lines_w = lines[lines.std(1)[:, 0] < min(img_h, img_w)/100]
hs, ws = lines_h[:, 0, 1], lines_w[:, 0, 0]

# Extract border candidates close to the edge
border_t = hs[hs < img_h * 0.04]
border_l = ws[ws < img_w * 0.04]
border_b = hs[hs > img_h * 0.96]
border_r = ws[ws > img_w * 0.96]

# Among the candidates, crop at the one farthest from the edge
start_h = 0 if len(border_t) == 0 else border_t.max()
end_h = img_h if len(border_b) == 0 else border_b.min()
start_w = 0 if len(border_l) == 0 else border_l.max()
end_w = img_w if len(border_r) == 0 else border_r.min()

return img[start_h:end_h, start_w:end_w]
```
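The two steps compose into a single auto-crop routine; a minimal sketch, assuming the excerpts above are wrapped as `remove_white_border` and `remove_black_border` (both names hypothetical):

```python
def auto_crop(img):
    # Names hypothetical; the bodies are the Step 1 / Step 2 excerpts above.
    img = remove_white_border(img)
    img = remove_black_border(img)
    return img
```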
(##) Image pyramids
To accelerate alignment on the large scans, each channel is converted into an image pyramid: the image is repeatedly downscaled by a factor of 2 until its area drops below `th**2`, and the list is reversed so that the coarsest level comes first.

```python
# (excerpt: body of build_img_pyramids(img, th, data_type);
# th bounds the coarsest area, data_type selects numpy or torch output)
import copy
import math
import torch
from skimage.transform import rescale

ds_img = copy.deepcopy(img)
pyramid_list = [ds_img]

# Halve the resolution until the area falls below th**2.
while math.prod(ds_img.shape) > th**2:
    ds_img = rescale(copy.deepcopy(ds_img), 0.5)
    if data_type == torch.Tensor:
        ds_img = torch.from_numpy(ds_img)
    pyramid_list += [ds_img]

# Coarsest level first, so alignment proceeds coarse-to-fine.
pyramid_list.reverse()
return pyramid_list
```
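A usage sketch, assuming the body above is wrapped as `build_img_pyramids(img, th, data_type)`; the threshold value here is illustrative, the actual default is not shown:

```python
import numpy as np

channel = np.random.rand(3000, 3200)   # stand-in for a full-resolution channel
pyramid = build_img_pyramids(channel, th=300, data_type=np.ndarray)
for level, im in enumerate(pyramid):
    print(level, im.shape)             # coarsest (smallest) level first
```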
(##) Alignment method
The blue and red channels are aligned to the green channel. Each channel is first cropped by a fixed ratio (10% on each side) to suppress border artifacts, converted into a pyramid, and then aligned coarse-to-fine:

```python
# img_B, img_G, img_R: color channels of the input image
# crop_img_by_ratio: crops the image by a default ratio (10% on each side)
# pyramid_B, pyramid_G, pyramid_R: image pyramid of each channel
crop_img_B = crop_img_by_ratio(img_B)
crop_img_G = crop_img_by_ratio(img_G)
crop_img_R = crop_img_by_ratio(img_R)

pyramid_B = build_img_pyramids(crop_img_B)
pyramid_G = build_img_pyramids(crop_img_G)
pyramid_R = build_img_pyramids(crop_img_R)

# Optimal displacement of B and R with respect to G
B_h, B_w = align_pyramids(pyramid_B, pyramid_G)
R_h, R_w = align_pyramids(pyramid_R, pyramid_G)
```
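The pipeline above assumes the three channels have already been separated. A minimal sketch of that step, assuming the standard Prokudin-Gorskii plate layout (B, G, R stacked top to bottom; the file name is illustrative):

```python
import skimage.io as skio
from skimage import img_as_float

img = img_as_float(skio.imread('plate.tif'))   # file name illustrative
h = img.shape[0] // 3
img_B, img_G, img_R = img[:h], img[h:2*h], img[2*h:3*h]
```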
At each pyramid level, the optimal displacement is found by exhaustive search over a window: each candidate shift is scored by a loss between the shifted channel and the reference, and scores already computed for the previous (smaller) window are reused instead of being recomputed:

```python
# prev_losses: losses of the previous (smaller) search window, discussed below
# search_space_h, search_space_w: search space of the two axes
# translate_img: function that rolls the image by a displacement
# find_idxs: argmin helper
# least_h, least_w: optimal displacement

# Reuse previously computed losses; the window grows by one cell per side,
# so the old window occupies the interior of the new loss grid.
if prev_losses is not None:
    losses[1:-1, 1:-1] = prev_losses

for idx_h, d_h in enumerate(search_space_h):
    for idx_w, d_w in enumerate(search_space_w):
        if losses[idx_h, idx_w] != 0:   # already filled from prev_losses
            continue
        losses[idx_h, idx_w] = loss(
            translate_img(img1, d_h, d_w), img2)

least_idxs = find_idxs(losses == losses.min())
least_h = search_space_h[least_idxs[0][0]]
least_w = search_space_w[least_idxs[1][0]]
```
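The metric behind `loss` and the shift behind `translate_img` are not pinned down above; a minimal sketch of plausible implementations, assuming SSD as the metric and a circular shift via `np.roll` (a 10% margin is ignored so wrapped-around pixels do not dominate the score):

```python
import numpy as np

def translate_img(img, d_h, d_w):
    # Circular shift by (d_h, d_w); pixels wrap around the border.
    return np.roll(img, shift=(d_h, d_w), axis=(0, 1))

def loss(img1, img2):
    # Sum of squared differences over the interior of the images.
    m_h, m_w = img1.shape[0] // 10, img1.shape[1] // 10
    diff = img1[m_h:-m_h, m_w:-m_w] - img2[m_h:-m_h, m_w:-m_w]
    return (diff ** 2).sum()
```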
The size of the search window depends on the pyramid level: the coarsest level (level 0) is searched over half the image size, small images use a range of ±5, and large ones only ±1 around the estimate propagated from the coarser level. The loop below then keeps enlarging the window by one cell per side, reusing the loss grid returned by the previous search, until `get_optimal_displacement` signals convergence by returning `None` for the losses (i.e., the optimum no longer lies on the window boundary):

```python
# r_h, r_w: search window range per axis
# prev_losses: loss grid of the previous search (reused above)
# curr_d_h, curr_d_w: optimal displacement of the current search
# d_h, d_w: displacement propagated from the coarser pyramid level

# Pick the initial range by pyramid level / image size.
if level == 0:
    r_h, r_w = [int(size/2) for size in img1.shape]
elif (img1.shape[0] + img1.shape[1])/2 < 300:
    r_h, r_w = 5, 5
else:
    r_h, r_w = 1, 1

# Enlarge the window until the search converges.
while True:
    curr_d_h, curr_d_w, prev_losses = get_optimal_displacement(
        translate_img(img1, d_h, d_w), img2, r_h, r_w)
    if prev_losses is None:   # optimum is interior: stop expanding
        break
    r_h += 1
    r_w += 1
```
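Putting the pieces together, `align_pyramids` plausibly chains the levels by doubling the displacement estimate at each finer level; a sketch under that assumption (`search_window` is a hypothetical wrapper around the grid search and window expansion shown above), followed by the final reconstruction:

```python
import numpy as np

def align_pyramids(pyramid1, pyramid2):
    d_h, d_w = 0, 0
    for level, (img1, img2) in enumerate(zip(pyramid1, pyramid2)):
        # Each pyramid level doubles the resolution, so the displacement
        # estimated at the coarser level doubles at the current one.
        d_h, d_w = 2 * d_h, 2 * d_w
        # search_window: hypothetical wrapper returning the best extra
        # offset within the level-dependent range rule above.
        off_h, off_w = search_window(translate_img(img1, d_h, d_w), img2, level)
        d_h, d_w = d_h + off_h, d_w + off_w
    return d_h, d_w

# Apply the final displacements to rebuild the color image (R, G, B order).
aligned = np.dstack([
    translate_img(img_R, R_h, R_w),
    img_G,
    translate_img(img_B, B_h, B_w),
])
```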