HW1

Jiaheng Hu (jiahengh)

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. To do this, we extract the three color channel images, stack them on top of each other, and align them so that they form a single RGB color image. We implemented two approaches that achieve this goal. In approach one, we exhaustively search over a window of possible displacements of [-15, 15] pixels, score each one using the sum of squared differences (SSD), and take the displacement with the best score. We ran this algorithm on cathedral.jpg and obtained the following result (for all of the images displayed, "ar offset" is the offset of the red channel relative to the blue channel, and "ag offset" is the offset of the green channel relative to the blue channel):


ar offset is (-12, -3)
ag offset is (-5, -2)
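The exhaustive search described above can be sketched as follows. This is a minimal NumPy illustration under my own naming, not the report's actual code:

```python
import numpy as np

def align_exhaustive(channel, reference, search_range=15):
    """Try every (dy, dx) shift in [-search_range, search_range]^2 and
    keep the one with the lowest sum of squared differences (SSD)
    against the reference channel."""
    best_offset, best_score = (0, 0), np.inf
    c = search_range  # fixed crop so every candidate is scored on the same region
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Crop the borders so pixels wrapped around by np.roll
            # do not pollute the score.
            score = np.sum((shifted[c:-c, c:-c] - reference[c:-c, c:-c]) ** 2)
            if score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset
```

Cropping the border before scoring matters: np.roll wraps pixels around the image edge, and without the crop those wrapped pixels would bias the SSD.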

In approach two, we build an image pyramid to speed up the alignment process. I set the number of pyramid levels to 4, the search range at the coarsest level to [-20, 20], and the search range at each subsequent level to [-4, 4]. This greatly speeds up the alignment: it takes only about 10 seconds per image. The results are shown below:


ar offset is [-174 -3]
ag offset is [-77 1]

ar offset is [-137 -3]
ag offset is [-41 2]

ar offset is [-111 -9]
ag offset is [-52 -5]

ar offset is [-126 -13]
ag offset is [-58 -10]

ar offset is [-137 -3]
ag offset is [-64 3]

ar offset is [-119 0]
ag offset is [-55 3]

ar offset is [-90 -23]
ag offset is [-41 -16]

ar offset is [-114 -26]
ag offset is [-56 -5]

ar offset is [-113 -21]
ag offset is [-27 -8]
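The coarse-to-fine scheme described above can be sketched as follows. This is an illustrative reimplementation under assumed names (`downscale`, `ssd_search`, `align_pyramid` are my own), not the report's code; it uses a simple 2x2 block average for downscaling:

```python
import numpy as np

def downscale(img):
    """Halve resolution by averaging 2x2 blocks (a simple stand-in
    for a proper anti-aliased resize)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ssd_search(channel, reference, center, radius):
    """SSD search over a square window of the given radius around `center`.
    A fixed border crop (large enough for the biggest candidate shift)
    keeps every candidate scored on the same region."""
    c = max(abs(center[0]) + radius, abs(center[1]) + radius, 1)
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[c:-c, c:-c] - reference[c:-c, c:-c]) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, levels=4, coarse_radius=20, fine_radius=4):
    """Coarse-to-fine alignment: solve at the smallest scale with a wide
    window, then double the offset and refine with a narrow window.
    `levels` should be chosen so the coarsest image is still larger
    than about twice `coarse_radius`."""
    if levels == 1:
        return ssd_search(channel, reference, (0, 0), coarse_radius)
    coarse = align_pyramid(downscale(channel), downscale(reference),
                           levels - 1, coarse_radius, fine_radius)
    center = (2 * coarse[0], 2 * coarse[1])
    return ssd_search(channel, reference, center, fine_radius)
```

The speedup comes from the search budget: instead of scoring every shift in a wide window at full resolution, the wide window is only searched on the tiny coarsest image, and each finer level only scores (2·4+1)² = 81 candidates.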

On most of the images, our algorithm works decently. However, on emir.tif and train.tif, the results could definitely improve. One likely reason involves an important hyperparameter of the image pyramid: how many times we down-scale the image. If this number is too small, we do not get much of a speedup; if it is too big, the low-resolution alignment may be incorrect because the image becomes too blurred, and any error made at the coarsest level is doubled at every subsequent level. It may be that I down-scaled the image too many times. Another potential reason is that SSD is not a good metric for comparing alignment across different channels, since the channels do not actually have the same brightness values; a metric that is robust to brightness differences might handle this better.
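One commonly used brightness-robust metric (a sketch of the general technique, not something implemented for this report) is normalized cross-correlation (NCC), which mean-centers and normalizes each patch before comparing, so overall intensity differences between channels stop dominating the score:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two same-sized patches.
    Returns a value in [-1, 1]; higher means better alignment."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Because NCC is invariant to affine brightness changes (comparing a patch against a scaled-and-shifted copy of itself still scores 1.0), two channels that depict the same structure at different exposures are no longer penalized for their intensity gap the way they are under SSD.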

As for a few examples of my own choosing, downloaded from the Prokudin-Gorskii collection, I present two: one that aligned well and one that did not align as well:


ar offset is [-135 15]
ag offset is [-90 15]

ar offset is [-63 -6]
ag offset is [-22 -4]