**Colorizing the Prokudin-Gorskii Photo Collection**
*CMU 16-726 Spring 2021 Assignment #1*
Name: Juyong Kim
(#) Introduction
In this assignment, we aim to align three channels of digitized Prokudin-Gorskii glass plate images to produce a single color image using basic image processing techniques.
In 1918, [Sergei Mikhailovich Prokudin-Gorskii](http://en.wikipedia.org/wiki/Prokudin-Gorskii) left behind photographs on glass plates, each of which contains red-, green-, and blue-filtered images of the same scene.
The Library of Congress has recently digitized these images, putting all channels in one image side by side.
To produce a color image, we need to automatically crop and align these RGB channel images.
First, we establish a basic alignment criterion, normalized cross-correlation (NCC), and try different search methods for the alignment.
Since the original input images are huge, brute-force search is infeasible, whereas iterative search and pyramid search remain practical.
Second, we try using Sobel gradients instead of the raw channel images, and we crop the borders automatically.
Both additions improve the quality of the final products.
**************************************************************************************
*.----------------. .----------. .---------------. .------. .---------------.*
*| Single channel |___| Sobel |___| NCC Align w. |___| Auto |___| RGB color |*
*| images (input) | | Gradient | | Image Pyramid | | Crop | | image (output)|*
*'----------------' '----------' '---------------' '------' '---------------'*
**************************************************************************************
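To make the pipeline concrete, the sketch below shows a hypothetical end-to-end driver; `pyramid_align` and `sobel_magnitude` are our own names for the components described in the Method and Bells & Whistles sections below, and the sign convention (a positive shift applied with `np.roll`) matches the sketches there.

```python
import numpy as np
from skimage import io, img_as_float

def colorize(path):
    """Split the stacked plate into B, G, R; align G and R to B on
    Sobel gradients; stack into one color image (cropping follows)."""
    plate = img_as_float(io.imread(path))  # handles 8-bit jpg and 16-bit tif
    H = plate.shape[0] // 3
    b, g, r = plate[:H], plate[H:2 * H], plate[2 * H:3 * H]
    # Helper functions are sketched in the Method / Bells & Whistles sections.
    ug, vg = pyramid_align(sobel_magnitude(g), sobel_magnitude(b))
    ur, vr = pyramid_align(sobel_magnitude(r), sobel_magnitude(b))
    return np.dstack([np.roll(r, (ur, vr), axis=(0, 1)),
                      np.roll(g, (ug, vg), axis=(0, 1)), b])
```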
(#) Method
(##) Matching Criteria
As suggested in the instructions, we align the R and G channels to the B channel, _i.e._, we find the vertical and horizontal shifts of R and G relative to B.
We choose normalized cross-correlation (NCC) as the matching criterion.
Given two single-channel images $f(x, y)$ and $g(x, y)$ of size $H \times W$ and a displacement $(u, v)$ with $-D \le u, v \le D$, $\mathrm{NCC}(u, v)$ measures how well the two images align when the first image is shifted by $(u, v)$:
$$
\mathrm{NCC}(u, v) = \frac{\sum_{(x,y)\in N} \left[\{f(x+u, y+v) - \overline{f}\} \{g(x, y) - \overline{g}\}\right]}{ \sqrt{\sum_{(x,y)\in N} \{f(x+u, y+v) - \overline{f}\}^2}\sqrt{\sum_{(x,y)\in N} \{ g(x, y) - \overline{g} \}^2} },
$$
where $\overline{f}$ and $\overline{g}$ are the average intensities of $f$ and $g$, respectively, and $N$ is the set of pixels over which the two images are compared.
A natural choice of $N$ is the set of all overlapping pixels, but then the size of $N$ differs across values of $(u, v)$.
Instead, we choose $N = [D, H-D] \times [D, W-D]$, the region where the two images overlap for every displacement under consideration.
Now the problem of finding the best alignment is to find the adjustment $(u, v)$ that gives the highest value of NCC:
$$
(u^*, v^*) = \underset{(u,v) \in [-D,D]^2}{\mathrm{argmax}} ~\mathrm{NCC}(u, v)
$$
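Below is a minimal NumPy sketch of this criterion. The function name and the sign convention (a positive $(u, v)$ shifts $f$ down and to the right via `np.roll`) are our own choices, and the means are taken over the overlap region $N$:

```python
import numpy as np

def ncc(f, g, u, v, D):
    """NCC between f shifted by (u, v) and g, computed over the fixed
    overlap region N = [D, H-D] x [D, W-D]."""
    H, W = g.shape
    # Shift f; the margin D keeps np.roll's wrapped-around border
    # pixels (valid for |u|, |v| <= D) out of the comparison region.
    fs = np.roll(f, (u, v), axis=(0, 1))[D:H - D, D:W - D]
    gs = g[D:H - D, D:W - D]
    fz = fs - fs.mean()
    gz = gs - gs.mean()
    return (fz * gz).sum() / np.sqrt((fz ** 2).sum() * (gz ** 2).sum())
```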
(##) Search Methods
To find the best alignment, we try three methods of search: 1) brute-force, 2) iterative, and 3) image pyramid.
For all methods, we choose $D=\lfloor H/10 \rfloor$.
**Brute-force** The brute-force method simply computes the NCC for all possible values of $(u, v)$.
It finds the exact solution to the problem, but it works only for small images; for large images (width and height both over 3,000 pixels) it becomes intractable.
Computing $\mathrm{NCC}(u, v)$ costs $O(HW)$ for each value of $(u, v)$, and the size of the search space, $O(D^2)$, also grows with the image size.
Assuming $D, W = O(H)$, the time complexity of the brute-force search is $O(H^4)$, which makes the search intractable for large images.
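A sketch of the exhaustive search over $[-D, D]^2$, reusing the `ncc` helper above:

```python
def brute_force_align(f, g, D):
    """Exhaustively score every shift (u, v) in [-D, D]^2 and
    return the one with the highest NCC."""
    best_score, best_uv = -np.inf, (0, 0)
    for u in range(-D, D + 1):
        for v in range(-D, D + 1):
            score = ncc(f, g, u, v, D)
            if score > best_score:
                best_score, best_uv = score, (u, v)
    return best_uv
```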
**Iterative search** Instead of the brute-force search, we can iteratively search small regions of the search space, moving the region to the point $(u, v)$ that yields the highest NCC value.
We call this method "iterative search".
This method is inspired by gradient descent and implicitly assumes that the NCC is unimodal over the search space.
We choose the per-step search region to be $\pm 5$ pixels around the current maximum-NCC point, and the search stops when that point does not change after a step.
Assuming the number of search steps is linear in the image size, i.e., $O(H)$, the time complexity of the method is $O(H^3)$.
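A sketch of this hill-climbing procedure; the window radius of 5 follows the text, while the tie-breaking by `max` over `(score, shift)` tuples is an implementation detail of ours:

```python
def iterative_align(f, g, D, radius=5):
    """Repeatedly score shifts within +/- radius of the current best
    point; stop when the best point no longer moves."""
    u, v = 0, 0
    while True:
        candidates = [(ncc(f, g, u + du, v + dv, D), (u + du, v + dv))
                      for du in range(-radius, radius + 1)
                      for dv in range(-radius, radius + 1)
                      if abs(u + du) <= D and abs(v + dv) <= D]
        _, (bu, bv) = max(candidates)
        if (bu, bv) == (u, v):
            return u, v  # converged: current point is the local maximum
        u, v = bu, bv
```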
**Image pyramid** The final search method uses an image pyramid, a multi-scale representation of the image in which successive levels differ in resolution by a factor of 2.
The method recursively calls itself on the half-scale images, then searches NCC values only in a small window around the (upscaled) point returned by the recursion.
The recursion falls back to the brute-force search when either side of the images becomes smaller than 200 pixels.
The time complexity of the method is $O(H^2)$, most of which is taken by the first (outermost) recursion.
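A sketch of the coarse-to-fine recursion. We downsample by plain decimation for brevity (a production version would low-pass filter first), and the refinement window radius is our assumption:

```python
def pyramid_align(f, g, min_size=200, radius=5):
    """Recurse on 2x-downsampled images, double the returned shift,
    then refine it within a small local window at full resolution."""
    H, W = g.shape
    D = H // 10
    if min(H, W) < min_size:
        return brute_force_align(f, g, D)  # coarsest level: exact search
    # Decimate by 2 (assumed; a careful version would anti-alias first).
    u, v = pyramid_align(f[::2, ::2], g[::2, ::2], min_size, radius)
    u, v = 2 * u, 2 * v  # upscale the coarse estimate
    _, best_uv = max((ncc(f, g, u + du, v + dv, D), (u + du, v + dv))
                     for du in range(-radius, radius + 1)
                     for dv in range(-radius, radius + 1))
    return best_uv
```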
(#) Bells & Whistles
In this section, we describe two attempts to improve the results.
(##) 1. Better Representation - Gradient
Instead of the pixel values of each channel, we try using the gradients to align the channels.
We use the Sobel operator to extract the gradient magnitude of each channel; the rest of the alignment procedure remains the same.
Because the gradient magnitude is non-negative, alignment across color channels works more reliably than with raw values:
an intensity edge in the scene produces positive responses in every channel's gradient map, whereas the corresponding raw pixel changes can differ in sign and strength from channel to channel.
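A sketch of the gradient extraction; we use `scipy.ndimage.sobel` here, though the report does not pin down a library:

```python
from scipy import ndimage

def sobel_magnitude(channel):
    """Gradient magnitude from horizontal and vertical Sobel filters.
    These maps replace the raw channels in the NCC search."""
    gx = ndimage.sobel(channel, axis=1)  # horizontal derivative
    gy = ndimage.sobel(channel, axis=0)  # vertical derivative
    return np.sqrt(gx ** 2 + gy ** 2)
```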
(##) 2. Automatic Cropping
The glass plates have black borders, and the digitized images also have a white background that produces outer white borders.
To remove them, we first detect the locations of the black (inner) borders and then crop each side of the image at the tightest border, taking the alignments into account.
We apply a simple rule to detect the black borders.
For each row and column of a single channel image, we count the number of black pixels (pixels with intensity < 0.2).
We then consider a row or column to be a black line if more than 80 percent of its pixels are black.
For each side of the image, we choose the innermost black line to be the border of the image.
This rule is effective but relies on two assumptions: the image is axis-aligned, and the inner borders consist of dark pixels.
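A sketch of the horizontal-border detection under this rule. Restricting candidate lines to the outer tenth of the image is our own assumption, added so that dark stripes inside the scene are not mistaken for borders; transposing the image gives the vertical borders:

```python
def find_borders(channel, black_thresh=0.2, line_frac=0.8):
    """Innermost black rows near the top and bottom edges; returns
    (top, bottom) indices for cropping channel[top:bottom]."""
    H, _ = channel.shape
    is_black_line = (channel < black_thresh).mean(axis=1) > line_frac
    rows = np.flatnonzero(is_black_line)
    top_cands = rows[rows < H // 10]       # assumed border search margin
    bot_cands = rows[rows > 9 * H // 10]
    top = top_cands.max() + 1 if top_cands.size else 0
    bottom = bot_cands.min() if bot_cands.size else H
    return top, bottom
```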
(#) Results and Discussion
(Click images to see larger)
(##) 1) The Basic Method
**Brute-force search** With the brute-force method, we can convert only the small cathedral image.
Figure [cathedral] shows the color image obtained by simply stacking the channels without alignment, and Figure [cathedral_bt] shows the result of the brute-force method on the same image.
![Figure [cathedral]: Input (cathedral)](images/cathedral.jpg width=310px) ![Figure [cathedral_bt]: Output of the brute-force method (CPU time: 24.26s)](images/cathedral_bt.jpg width=310px)
**Iterative and image pyramid search** For larger images, we can run either iterative search or image pyramid search.
Figure [train_iter] and Figure [train_pyramid] show the outputs of the two methods on a large input image.
We can see that both methods align the channels well.
![Figure [train]: Input (train)](images/train.jpg width=215px) ![Figure [train_iter]: Iterative search (CPU time: 731.13s)](images/train_image_iterative.jpg width=215px) ![Figure [train_pyramid]: Image Pyramid (CPU time: 43.85s)](images/train_image_pyramid.jpg width=215px)
We compare the running time of search methods on a small and a large image.
We measure CPU time by running the `time` command and adding the `user` and `sys` times from its output.
Note that the actual wall-clock time is smaller in a multi-core setting, since the Python libraries utilize multiple cores for array/image operations.
For example, the iterative search and image pyramid search on the large image (`emir.tif`) take only 159.79s and 13.60s, respectively, when run on a 6-core CPU (2018 MacBook Pro 15).
Table [elapsed_time] shows the measured CPU times of the different methods.
On the small image, the iterative and image pyramid searches take much less time than the brute-force method.
It is also notable that the iterative search runs faster than the pyramid search on the small image but much slower on the large image.
We suspect this is because the image pyramid search has a fixed computational overhead (image resizing and the fallback brute-force search) but a smaller time complexity than the iterative search.
Search Method | Small Image (`cathedral.jpg`) | Big Image (`emir.tif`)
-------------------|--------|---------
**Brute-force** | 24.26s | N/A
**Iterative** | 2.99s | 892.91s
**Image Pyramid** | 3.72s | 42.23s
[Table [elapsed_time]: CPU time for different search methods (measured with the `time` command).]
**Failure case** The figures below show a failure case of the basic method.
For the input in Figure [emir] (emir), both the iterative and image pyramid searches fail to align the red channel correctly, which motivates the need for a better representation.
![Figure [emir]: Input (emir)](images/emir.jpg width=215px) ![Figure [emir_iter]: Iterative search (CPU time: 884.45s)](images/emir_image_iterative.jpg width=215px) ![Figure [emir_pyramid]: Image Pyramid (CPU time: 39.62s)](images/emir_image_pyramid.jpg width=215px)
(##) 2) Bells & Whistles: Better Representation
Figure [emir_image] and Figure [emir_gradient] show the alignment results when raw images and gradient images are used, respectively.
With the NCC values of gradient images, the channels are aligned correctly on the failure case described above.
![Figure [emir_image]: Result with raw images](images/emir_image_pyramid.jpg width=310px) ![Figure [emir_gradient]: Result with gradient images](images/emir_gradient_pyramid.jpg width=310px)
(##) 3) Bells & Whistles: Automatic Cropping
The figures below show the results of automatic cropping performed on the aligned color image.
The inputs to the cropping come from the best configuration (image pyramid search on gradient images).
A few artifacts remain in the output (see the right edge of Figure [lady_crop]); we believe these occur when a border is too corrupted to be detected.
![Figure [lady_nocrop]: Before automatic cropping](images/lady_gradient_pyramid_nocrop.jpg width=310px) ![Figure [lady_crop]: After automatic cropping](images/output_gradient_crop/lady.jpg width=310px)
![Figure [portrait_nocrop]: Before automatic cropping](images/self_portrait_gradient_pyramid_nocrop.jpg width=310px) ![Figure [portrait_crop]: After automatic cropping](images/output_gradient_crop/self_portrait.jpg width=310px)
(#) Final Outputs
Below are the final products of our method on all the input images, annotated with the offsets of the green and red channels (click the images to see larger).
![input: `cathedral.jpg`, green: (5, 2), red: (12, 3)](images/output_gradient_crop/cathedral.jpg) ![input: `emir.tif`, green: (49, 24), red: (107, 40)](images/output_gradient_crop/emir.jpg)
![input: `harvesters.tif`, green: (60, 17), red: (124, 14)](images/output_gradient_crop/harvesters.jpg) ![input: `icon.tif`, green: (42, 17), red: (90, 23)](images/output_gradient_crop/icon.jpg)
![input: `lady.tif`, green: (56, 9), red: (120, 13)](images/output_gradient_crop/lady.jpg) ![input: `self_portrait.tif`, green: (78, 29), red: (176, 37)](images/output_gradient_crop/self_portrait.jpg)
![input: `three_generations.tif`, green: (54, 12), red: (111, 9)](images/output_gradient_crop/three_generations.jpg) ![input: `train.tif`, green: (41, 2), red: (85, 29)](images/output_gradient_crop/train.jpg)
![input: `turkmen.tif`, green: (57, 22), red: (117, 28)](images/output_gradient_crop/turkmen.jpg) ![input: `village.tif`, green: (64, 10), red: (137, 21)](images/output_gradient_crop/village.jpg)