Assignment #1 - Colorizing the Prokudin-Gorskii Photo Collection

by Zijie Li (zijieli@andrew.cmu.edu)

Overview

The main objective of this project was to take the black-and-white glass plate images captured by the Russian photographer Sergei Mikhailovich Prokudin-Gorskii and align and merge them into color images. In the main part, I illustrate how I align the images and show some results. In Bells and Whistles, I describe the techniques I employed to enhance image quality.

Unprocessed BGR Image Sample

cathedral

Methods

For images of relatively small size, we can align the channels with a brute-force strategy: check every possible translation vector [dy, dx], where dy and dx are integers between -20 and 20 (-15 to 15 is not precise enough for some images, such as self_portrait.tif). Two simple metrics can be used to decide whether a translation vector is optimal: the sum of squared differences (SSD) and the normalized cross-correlation (NCC).

SSD:$$ \sum_i\sum_j\left((I^{(1)}_{i,j}-I_{mean}^{(1)})- (I^{(2)}_{i,j}-I_{mean}^{(2)})\right)^2 $$ NCC:$$\frac{\sum_i\sum_j(I^{(1)}_{i,j}-I_{mean}^{(1)})\cdot (I^{(2)}_{i,j}-I_{mean}^{(2)})} {\sqrt{\sum_i\sum_j(I^{(1)}_{i,j}-I_{mean}^{(1)})^2}\cdot \sqrt{\sum_i\sum_j(I^{(2)}_{i,j}-I_{mean}^{(2)})^2}} $$
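Below is a minimal sketch of the brute-force search scored with NCC (the helper names and the use of np.roll for circular shifting are my own choices; the border cropping described next is omitted for brevity):

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation between two same-sized images.
        a = a - a.mean()
        b = b - b.mean()
        return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

    def align_brute_force(channel, ref, max_shift=20):
        # Try every integer shift [dy, dx] in [-max_shift, max_shift]
        # and keep the one that maximizes NCC against the reference.
        best_score, best_shift = -np.inf, (0, 0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(channel, (dy, dx), axis=(0, 1))
                score = ncc(shifted, ref)
                if score > best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift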

In addition, before comparing the images to be aligned, I manually crop a fixed fraction (10%) off each border, i.e., only the interior of the image is taken into consideration. This noticeably improves performance, as it rules out most of the influence of the image borders.
For large images, I recursively resize the image to half its width and height, stack the resized copies into a pyramid, and search for the optimal translation vector from the top (smallest image) to the bottom (largest). After obtaining the optimal translation vector [dy, dx] at the n-th layer, at the (n+1)-th layer we only need to search over [2*dy-3, 2*dy+3] for the new optimal dy, and similarly for dx. This significantly reduces the computational cost. Here, I set the height of the pyramid to 4; a sketch of the search follows. Below are some successfully colorized cases. The translation vector of each channel is given as [dy, dx], where dy is the displacement along the vertical axis and dx along the horizontal axis.
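The coarse-to-fine search can be sketched as follows, reusing the ncc and align_brute_force helpers above (the naive [::2, ::2] subsampling stands in for proper anti-aliased resizing):

    def align_pyramid(channel, ref, levels=4):
        # Recursively align at half resolution, then refine the doubled
        # shift within a +/-3 pixel window at the current level.
        if levels == 1:
            return align_brute_force(channel, ref, max_shift=20)
        dy, dx = align_pyramid(channel[::2, ::2], ref[::2, ::2], levels - 1)
        dy, dx = 2 * dy, 2 * dx
        best_score, best_shift = -np.inf, (dy, dx)
        for ddy in range(-3, 4):
            for ddx in range(-3, 4):
                shifted = np.roll(channel, (dy + ddy, dx + ddx), axis=(0, 1))
                score = ncc(shifted, ref)
                if score > best_score:
                    best_score, best_shift = score, (dy + ddy, dx + ddx)
        return best_shift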

Cathedral, using NCC (G:[5,2], R:[12,3])

Cathedral, using SSD (G:[5,2], R:[12,3])

Icon, using NCC (G:[41,17], R:[89,23])

Icon, using SSD (G:[41,17], R:[89,23])

Lady, using NCC (G:[51, 9], R:[112,12])

Lady, using SSD (G:[51,9], R:[112, 12])

Below are the results for the other images provided in the data folder, using NCC as the metric.

Gallery

Harvesters (G:[59, 16], R:[123, 13])

Self portrait (G:[78, 29], R:[176, 37])

Three generations (G:[53, 14], R:[111, 11])

Train (G:[42, 5], R:[87, 32])

Turkmen (G:[56, 21], R:[116, 28])

Village (G:[64, 12], R:[137, 22])

Emir (G:[49, 24], R:[71, 41])

My own choices

Camp on the mountain (G:[40, -5], R:[108, -32])

Archway (G:[59, -16], R:[130, -33])

Sunset (G:[75, -41], R:[113, -67])

Melon vendor (G:[81, 10], R:[178, 13])

In general, either SSD or NCC performs well on most images, and the two give similar results. However, for images like Emir.tif, the pixels in different channels have very different intensity distributions (different brightness), so SSD and NCC do not work well in those cases. Rather than resorting to more advanced alignment methods such as keypoint matching with a feature descriptor, a straightforward fix is to measure the gradient difference between the two images, defined as: $$\sum_i\sum_j\left[(I^{(1)}_x[i,j]-I^{(2)}_x[i,j])^2+ (I^{(1)}_y[i,j]-I^{(2)}_y[i,j])^2\right]$$
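A sketch of this metric, using np.gradient as a stand-in for whatever derivative filter one prefers (note that, unlike NCC, this score is minimized):

    import numpy as np

    def grad_diff(a, b):
        # np.gradient returns derivatives along axis 0 (y) and axis 1 (x).
        ay, ax = np.gradient(a)
        by, bx = np.gradient(b)
        return ((ax - bx) ** 2).sum() + ((ay - by) ** 2).sum()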

Images using sum of squared gradient difference as metric

Emir, using NCC as alignment metric (G:[49,24],R:[71,41])

Emir, using gradient difference as alignment metric (G:[49,24], R:[105,41])

As the comparison shows, switching to the gradient difference metric significantly improves the alignment.

Bells and Whistles

1. Auto Cropping

In the main part, I manually defined the fraction of the image to consider in order to rule out the influence of the image border. Instead of cropping manually, and noticing that the borders of most images have pixel intensities quite different from the main content, we can devise an automatic cropping strategy. To this end, I designed the auto-cropping pipeline below (a code sketch follows the list):
1. Set all the white pixels on the image boundary to black (I[I > 250] = 0). This makes the image border look more consistent (mostly black).
2. Apply Otsu thresholding [1] to the image. The intuition behind Otsu's method is to split the pixels into two classes by maximizing the inter-class variance. Since the noisy material on the border is very different from the image content, this thresholding helps identify the region to crop.
3. After thresholding, apply binary erosion to the image, which filters out tiny blobs and noisy points.
4. Find the leftmost column that contains a noticeable number of pixels assigned to the second (content) class and mark it as the left boundary of the cropped image (i.e., I_new = I[:, leftmost:]). Apply the same procedure to the remaining three boundaries (right, top, bottom). I also limit the auto-cropping function to remove at most a 0.4 fraction of the original image, which prevents failures in some extreme cases.
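A sketch of the four steps on a single channel, assuming scipy and skimage are available; the min_count heuristic for a "noticeable" number of content pixels and the per-side reading of the 0.4 limit are my own guesses:

    import numpy as np
    from scipy.ndimage import binary_erosion
    from skimage.filters import threshold_otsu

    def auto_crop(img, max_ratio=0.4):
        work = img.copy()
        work[work > 250] = 0                       # 1. blank near-white border pixels
        mask = work > threshold_otsu(work)         # 2. Otsu: content vs. border junk
        mask = binary_erosion(mask, iterations=3)  # 3. remove tiny blobs and noise
        h, w = mask.shape
        min_count = int(0.5 * min(h, w))           # "noticeable" pixel count (a guess)
        cols, rows = mask.sum(axis=0), mask.sum(axis=1)
        # 4. first/last row and column with enough content pixels,
        #    clamped so no side loses more than max_ratio of the image.
        left = min(int(np.argmax(cols > min_count)), int(max_ratio * w))
        right = max(w - int(np.argmax(cols[::-1] > min_count)), int((1 - max_ratio) * w))
        top = min(int(np.argmax(rows > min_count)), int(max_ratio * h))
        bottom = max(h - int(np.argmax(rows[::-1] > min_count)), int((1 - max_ratio) * h))
        return img[top:bottom, left:right]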

Original, w/o cropping

After cropping

Original, w/o cropping

After cropping

Cathedral, after applying thresholding

Emir, after applying thresholding

2. Auto contrasting

Contrast generally refers to how evenly the pixel intensities are distributed. Hence, to increase an image's contrast, we can stretch out its pixel intensity distribution. A straightforward way is histogram equalization [2]: given an image, it improves contrast by stretching the original histogram so that the intensities span all available levels (0 to 255).
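A minimal NumPy sketch of per-channel histogram equalization (OpenCV's cv2.equalizeHist implements the same mapping [2]):

    import numpy as np

    def equalize_hist(channel):
        # Remap intensities through the normalized CDF so the output
        # histogram spreads over the full 0-255 range (channel is uint8).
        hist = np.bincount(channel.ravel(), minlength=256)
        cdf = hist.cumsum().astype(np.float64)
        cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
        lut = np.round(255 * cdf).astype(np.uint8)
        return lut[channel]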

w/o auto contrast

with auto contrast

w/o auto contrast

with auto contrast

w/o auto contrast

with auto contrast

3. Auto white balancing

In this project, I tested two main approaches to auto white balancing.

Simple white balancing: The first method is quite simple. First, we convert the RGB image to grayscale and pick the brightest pixel [ix, iy]; we then compute its difference from 255 in each of the R, G, and B channels, [dR, dG, dB], and add [dR, dG, dB] to the whole image. After the addition, we rescale the image so that it again ranges from 0 to 255.
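A sketch of this white-patch shift, assuming a float RGB image scaled to [0, 255] (the plain channel mean stands in for a proper grayscale conversion):

    import numpy as np

    def simple_white_balance(img):
        # Shift so the brightest pixel becomes pure white, then
        # rescale the result back into [0, 255].
        gray = img.mean(axis=2)
        iy, ix = np.unravel_index(gray.argmax(), gray.shape)
        shifted = img + (255.0 - img[iy, ix])      # add [dR, dG, dB]
        return 255.0 * (shifted - shifted.min()) / (shifted.max() - shifted.min())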

Robust white balancing: The second method is more sophisticated and was proposed by Huo et al. [3]. The algorithm estimates the illuminant from pixels' YUV coordinates, i.e., it finds pixels that fall under a gray threshold T: \((|U|+|V|)/Y < T\). It then iteratively adapts the gains on the R and B channels given the estimated illuminant; once the U and V values of these gray points are close enough to neutral gray, the search stops.
(Note that noisy material on the boundary severely harms the performance of white balancing, so the images must be cropped first.)
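The iterative loop can be sketched as below, in the spirit of [3, 4]; the BT.601 RGB-to-YUV matrix is standard, but the threshold T, the gain step, and the stopping tolerance are placeholder values of my own:

    import numpy as np

    RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                        [-0.147, -0.289,  0.436],
                        [ 0.615, -0.515, -0.100]])  # ITU-R BT.601

    def robust_white_balance(img, T=0.3, step=0.01, max_iter=100):
        # img is float RGB in [0, 255]; adjust R/B gains until the
        # gray points' mean U and V are close to neutral.
        gain = np.ones(3)
        for _ in range(max_iter):
            yuv = (img * gain) @ RGB2YUV.T
            Y, U, V = yuv[..., 0], yuv[..., 1], yuv[..., 2]
            gray = (np.abs(U) + np.abs(V)) / np.maximum(Y, 1e-6) < T
            if not gray.any():
                break
            u, v = U[gray].mean(), V[gray].mean()
            if max(abs(u), abs(v)) < 0.5:           # close enough to gray
                break
            if abs(u) > abs(v):
                gain[2] -= step * np.sign(u)        # U tracks blue: tune B gain
            else:
                gain[0] -= step * np.sign(v)        # V tracks red: tune R gain
        return np.clip(img * gain, 0, 255)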

Emir. Simple white balancing

Emir. No white balancing

Emir. Robust white balancing

Self portrait. Simple white balancing

Self portrait. No white balancing

Self portrait. Robust white balancing

Village. Simple white balancing

Village. No white balancing

Village. Robust white balancing

In general, we can observe that after white balancing the images become colder and closer to neutral gray. The simple white balancing strategy works in some cases (e.g., self_portrait) but generally performs worse than the robust white balancing algorithm.

References

[1] https://en.wikipedia.org/wiki/Otsu%27s_method

[2] https://docs.opencv.org/3.4/d4/d1b/tutorial_histogram_equalization.html

[3] J. Huo, Y. Chang, J. Wang, and X. Wei, "Robust Automatic White Balance Algorithm using Gray Color Points in Images," IEEE Transactions on Consumer Electronics, vol. 52, no. 2, pp. 541-546, 2006.

[4] http://web.stanford.edu/~sujason/ColorBalancing/robustawb.html