16-726 21spring Assignment #1

Colorizing the Prokudin-Gorskii Photo Collection

Author: Zhe Huang (zhehuang)


1. Introduction

In this assignment, we take digitized glass plate images and turn them into well-aligned RGB images. Specifically, we align each image's red and green channels to its blue channel and then stack the three as a 3-channel color image. To align the channels precisely, we search over a window of possible displacements, score each candidate with the Sum of Squared Differences (SSD), and keep the best displacement. Because some images are too large for an exhaustive search, we use a multi-level pyramid: we first do a coarse search on downsampled images and then refine it at higher resolutions, using each previous result as the initial displacement.
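A minimal NumPy sketch of the split, shift and stack steps described above. It assumes that each plate stores the blue, green and red exposures top to bottom in three equal-height strips, which is the usual layout of these scans. The helper names (`split_plate`, `shift`, `compose`) are hypothetical, and `np.roll` wraps pixels around the border instead of cropping them.

```python
import numpy as np


def split_plate(plate: np.ndarray):
    """Split a stacked glass-plate scan into (B, G, R) channels.

    Assumption: blue, green and red strips are stacked top to bottom with
    equal heights; any leftover rows at the bottom are dropped.
    """
    h = plate.shape[0] // 3
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return b, g, r


def shift(channel: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Circularly move pixel (y, x) to (y + dy, x + dx)."""
    return np.roll(channel, shift=(dy, dx), axis=(0, 1))


def compose(b, g, r, g_disp, r_disp):
    """Shift G and R by their (dx, dy) displacements and stack as an RGB image."""
    g_aligned = shift(g, *g_disp)
    r_aligned = shift(r, *r_disp)
    return np.dstack([r_aligned, g_aligned, b])
```

The blue channel is left untouched because it serves as the reference that the other two channels are aligned to.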

In the following sections, I will first show the results I got from both single-scale and multi-scale channel alignment methods as required by the assignment. After that, I will show some of my bells & whistles that improve the whole image generation process, either in terms of speed or quality or both.

2.1 Single scale channel alignment

The single-scale channel alignment searches for the best displacement vector using SSD as its metric. The displacement vector is defined as (dx, dy); applying it moves a channel's pixel from (y, x) to (y + dy, x + dx), and we keep the vector that gives the lowest SSD against the blue channel within the search space. By default, the search space is [-15, 15] on each axis. Note that dx is the shift along the width axis and dy is the shift along the height axis, whereas a pixel is indexed as (y, x), so the two orders differ. This algorithm takes the raw channel intensities as input, without extracting gradients or edges.
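The exhaustive SSD search can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the submitted code. The names `ssd` and `align_single_scale` are hypothetical. `np.roll` wraps pixels around the border, so the sketch compares only an interior crop (10% margin per side, an assumed value) to reduce the influence of wrapped pixels.

```python
import numpy as np


def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between two equally sized arrays."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))


def align_single_scale(channel, reference, window=15, center=(0, 0)):
    """Return the (dx, dy) in a +/-window box around center with the lowest SSD.

    Shifting `channel` by (dx, dy) moves pixel (y, x) to (y + dy, x + dx).
    """
    cx, cy = center
    h, w = reference.shape
    # Hypothetical interior crop: ignore a 10% margin where np.roll wraps pixels.
    my, mx = h // 10, w // 10
    ref_crop = reference[my:h - my, mx:w - mx]
    best, best_score = (cx, cy), np.inf
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            score = ssd(shifted[my:h - my, mx:w - mx], ref_crop)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

With the default window this evaluates 31 × 31 = 961 candidate shifts per channel, which is what makes the exhaustive search too slow on large scans.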

Here is the result on the provided "./data/cathedral.jpg".

2.2 Extra jpg images from LOC

3.1 Multi-scale channel alignment via pyramid

Some raw images are so large that an exhaustive search over a large search space is computationally infeasible, so we developed a coarse-to-fine multi-scale channel alignment algorithm.

First, for each channel $Ch$ of height $H$ and width $W$, we build a pyramid of copies at different scales. The number of pyramid levels, $N$, is $$ N = \max\left(1, \left\lfloor \log_2 \frac{\min(H, W)}{256} \right\rfloor\right). $$

Each level halves the resolution of the one before it. That is, the channel at pyramid level $i$ (1-indexed), $Ch_i$, is computed as $$ Ch_i = \mathrm{rescale}\left(Ch, \frac{1}{2^{i - 1}}\right), $$ so $Ch_1$ is the original channel and $Ch_N$ is the coarsest level.
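A small NumPy sketch of the pyramid construction, computing the level count $N$ described above and halving the resolution at each level. As an assumption, 2×2 block averaging stands in for `rescale`. Building each level from the previous one gives the same $1/2^{i-1}$ scale as rescaling the original directly. The names `num_levels`, `downsample2` and `build_pyramid` are hypothetical.

```python
import math

import numpy as np


def num_levels(channel: np.ndarray, base: int = 256) -> int:
    """N = max(1, floor(log2(min(H, W) / base)))."""
    return max(1, int(math.floor(math.log2(min(channel.shape) / base))))


def downsample2(channel: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging 2x2 blocks (a stand-in for rescale)."""
    h, w = channel.shape[0] // 2 * 2, channel.shape[1] // 2 * 2
    c = channel[:h, :w].astype(np.float64)
    return 0.25 * (c[0::2, 0::2] + c[1::2, 0::2] + c[0::2, 1::2] + c[1::2, 1::2])


def build_pyramid(channel: np.ndarray):
    """Return [Ch_1, ..., Ch_N]; Ch_1 is full resolution, Ch_i is scaled by 1/2^(i-1)."""
    levels = [channel.astype(np.float64)]
    for _ in range(num_levels(channel) - 1):
        levels.append(downsample2(levels[-1]))
    return levels
```

For example, a hypothetical 1024 × 1200 channel gives $N = \lfloor \log_2 4 \rfloor = 2$ levels, of sizes 1024 × 1200 and 512 × 600.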

At the coarsest level, $N$, the search space defaults to $[-15, 15]$ on each axis. At every finer level $i < N$, the search is centered on the displacement vector $(dx_{i+1}, dy_{i+1})$ found one level coarser. Because $Ch_i$ has twice the resolution of $Ch_{i+1}$, that displacement is first doubled, so the search space for height is $$ [2\,dy_{i+1} - 15,\ 2\,dy_{i+1} + 15], $$ and the search space for width is $$ [2\,dx_{i+1} - 15,\ 2\,dx_{i+1} + 15]. $$ The final displacement vector is the one found at level $1$, which corresponds to the original scale of the channel.
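The coarse-to-fine search can be sketched in NumPy as below. This is a self-contained illustration, not the submitted code. 2×2 block averaging stands in for `rescale`, `np.roll` wraps pixels at the border, and the helper names are hypothetical. At each finer level, the sketch doubles the previous estimate before centering the ±15 window on it, because each level has twice the resolution of the one above it. The `base` parameter (256 in the text) is exposed only so small inputs can produce several levels.

```python
import math

import numpy as np


def _downsample2(c: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging 2x2 blocks (stand-in for rescale)."""
    h, w = c.shape[0] // 2 * 2, c.shape[1] // 2 * 2
    c = c[:h, :w]
    return 0.25 * (c[0::2, 0::2] + c[1::2, 0::2] + c[0::2, 1::2] + c[1::2, 1::2])


def _search(channel, reference, center, window):
    """Exhaustive SSD search over a (2*window+1)^2 box around center=(dx, dy)."""
    cx, cy = center
    best, best_score = center, np.inf
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            diff = np.roll(channel, shift=(dy, dx), axis=(0, 1)) - reference
            score = float(np.sum(diff * diff))
            if score < best_score:
                best, best_score = (dx, dy), score
    return best


def align_pyramid(channel, reference, window=15, base=256):
    """Coarse-to-fine alignment; returns (dx, dy) at full resolution."""
    n = max(1, int(math.floor(math.log2(min(reference.shape) / base))))
    chans = [channel.astype(np.float64)]
    refs = [reference.astype(np.float64)]
    for _ in range(n - 1):
        chans.append(_downsample2(chans[-1]))
        refs.append(_downsample2(refs[-1]))
    dx, dy = 0, 0
    # chans[i] holds level i + 1; walk from the coarsest level down to level 1.
    for i in range(n - 1, -1, -1):
        # Double the coarser estimate to express it in this level's pixels.
        center = (2 * dx, 2 * dy) if i < n - 1 else (0, 0)
        dx, dy = _search(chans[i], refs[i], center, window)
    return dx, dy
```

Each level still costs 961 SSD evaluations, but only the coarsest level searches from scratch. The finer levels just refine the estimate, so the total reachable displacement grows roughly with $2^{N}$ while the per-level window stays fixed.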

Here are the results on the provided images.

3.2 Extra tif images from LOC

4.1 Bells & Whistles: Better features

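Instead of using raw pixel intensities, we first blur each channel with a Gaussian kernel and then apply a Sobel filter to extract its gradients. We then align the channels on these gradients, which are sparser than raw intensities and, thanks to the blur, more robust to noise. Comparing edges rather than raw brightness also avoids penalizing regions whose intensity legitimately differs between color channels.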
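A NumPy-only sketch of the blur-then-Sobel feature extraction. The write-up does not give the kernel width, so $\sigma = 1$ is an assumed default. Taking the gradient magnitude $\sqrt{g_x^2 + g_y^2}$ as the feature is also an assumption, and the helper names are hypothetical. The convolutions are written as separable 1-D passes with edge padding so the block has no dependencies beyond NumPy. `scipy.ndimage.gaussian_filter` and `scipy.ndimage.sobel` are faster drop-in alternatives.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel with radius 3*sigma (at least 1)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def convolve_separable(img, kx, ky):
    """Convolve rows with kx, then columns with ky (odd-length kernels, edge padding)."""
    def conv1d(v, k):
        return np.convolve(np.pad(v, len(k) // 2, mode="edge"), k, mode="valid")

    out = np.apply_along_axis(conv1d, 1, img, kx)  # along x (width)
    return np.apply_along_axis(conv1d, 0, out, ky)  # along y (height)


def gradient_magnitude(channel, sigma=1.0):
    """Gaussian blur followed by Sobel; returns the per-pixel gradient magnitude."""
    img = channel.astype(np.float64)
    g = gaussian_kernel(sigma)
    blurred = convolve_separable(img, g, g)
    smooth = np.array([1.0, 2.0, 1.0])
    deriv = np.array([1.0, 0.0, -1.0])
    gx = convolve_separable(blurred, deriv, smooth)  # derivative along x, smoothing along y
    gy = convolve_separable(blurred, smooth, deriv)  # smoothing along x, derivative along y
    return np.hypot(gx, gy)
```

The magnitude map has the same shape as the input channel, so it can be fed to the same SSD search in place of the raw intensities.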

4.2 Bells & Whistles: Better transformations

We add rotation to the search space as a third component of the displacement vector, dr. For the rotation component, we search three angles, $dr \in \{-0.1, 0, 0.1\}$, to detect small rotational misalignments between channels. In the multi-level pyramid alignment, the rotational displacement found at one level is passed to the next level as its initial guess, just like the translation.
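A sketch of the (dx, dy, dr) search in NumPy. The write-up does not state the unit of the 0.1 step, so this sketch assumes degrees. It rotates with a hypothetical nearest-neighbour helper, `rotate_nn`, and `scipy.ndimage.rotate(img, angle, reshape=False)` would be a smoother alternative. Following the description above, the angles are tried as offsets around an initial guess `cr`, so a coarser level's result can be passed in through `center`. The names are hypothetical.

```python
import numpy as np


def rotate_nn(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate img about its center (nearest neighbour, out-of-range samples clamped)."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    cos, sin = np.cos(theta), np.sin(theta)
    # Inverse mapping: for every output pixel, look up the source pixel it comes from.
    src_x = cos * (xs - xc) + sin * (ys - yc) + xc
    src_y = -sin * (xs - xc) + cos * (ys - yc) + yc
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return img[src_y, src_x]


def align_with_rotation(channel, reference, center=(0, 0, 0.0), window=15,
                        angles=(-0.1, 0.0, 0.1)):
    """Return the (dx, dy, dr) with the lowest SSD.

    Translations span +/-window around (cx, cy); rotations are cr + each offset in `angles`.
    """
    cx, cy, cr = center
    ref = reference.astype(np.float64)
    best, best_score = center, np.inf
    for da in angles:
        # Rotate once per candidate angle, then try every translation.
        rotated = rotate_nn(channel.astype(np.float64), cr + da)
        for dy in range(cy - window, cy + window + 1):
            for dx in range(cx - window, cx + window + 1):
                diff = np.roll(rotated, shift=(dy, dx), axis=(0, 1)) - ref
                score = float(np.sum(diff * diff))
                if score < best_score:
                    best, best_score = (dx, dy, cr + da), score
    return best
```

Adding rotation triples the number of SSD evaluations per level. Rotating once per angle, outside the translation loops, keeps the extra image warps down to three per level. On small images a 0.1-degree nearest-neighbour rotation may leave every pixel in place, and ties then go to the first angle tried.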

Here are some results with rotation added to the search space, which improve on our vanilla solution.