Assignment #1 - Colorizing the Prokudin-Gorskii Photo Collection

Tarasha Khurana (Andrew ID: tkhurana)

Overview

The objective of the assignment is to take the glass plate images from the Prokudin-Gorskii Photo Collection and using image processing techniques, align the three separate color channels in the provided images to produce a single color image with as few alignment artifacts as possible.

In the standard case, where the size of these image collections is small, a simple "find-displacement" alignment model is expected to work. But for the same task for high-resolution images, we need to design a scalable and efficient alignment approach.

Approach

I considered the first, second and third segments in the glass plate images as the B, G and R channels. Next, I computed the alignment offsets for the R and G channels so that they were aligned to the B channel after cropping out the borders of the image (where border = 15% of the image dimension).

For the low resolution image, I did this by simply traversing through a [-15, 15] window of displacement offsets in the x and y directions. For each set of x and y offsets, I computed the sum of squared pixel differences (SSD) on the RGB candidate images. I stored the image with the lowest value of this error and report the displacements for this image.

For the high resolution images, I used the protocol designed above recursively. First, I created an image pyramid with $max\_levels = min(log(h, 2), \ log(w, 2)) - 5$ , as this gave me a satisfactory trade-off between the speed and accuracy of alignment. At each level, I decreased size of the image by a factor of 2. At each level starting from the lowest sized image, I searched in a [-15, 15] window of x and y displacements. At the next pyramid level, I doubled the displacement found from the previous level, shifted the R/G channel by this much and then again computed the displacement offset in the [-15, 15] window and so on.

Roadblocks Unfortunately, I spent a considerable number of hours in fixing a 1-letter bug in the code. Apart from this, the emir.tif image did not align properly because of the drastic difference between the pixel intensities of different channels. Owing to this, I implemented the Bells & Whistles.

Results

G[24, 48], R[-864, 0] (check Bells and Whistles)

Results on other images from the collection

Bells and Whistles

Better features Instead of aligning all images with the SSD metric on pixel intensity values, I computed the edge images using the Canny Edge detector in OpenCV and used the same alignment protocol. So the image pyramid was now of edge images instead of pixel intensities. The one example emir.tif that could not be aligned properly using the normal alignment protocol above now aligns properly. I'm omitting other results as they look similar.