Overview

The goal of this project is to automatically align the Prokudin-Gorskii glass plate images using image processing techniques. I first implement single-scale alignment based on the Sum of Squared Differences (SSD) distance for the low-resolution image. Because exhaustive search becomes prohibitively expensive when the pixel displacement is large, I then implement multi-scale pyramid alignment to address the efficiency problem. Finally, I try a different feature for alignment: I process the images with a Sobel edge filter and align the filtered output.

Single-scale Implementation

I implement single-scale alignment based on the Sum of Squared Differences (SSD) distance for the low-resolution image (cathedral.jpg). See the function align() in main_hw1.py for implementation details.
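The exhaustive SSD search can be sketched roughly as follows. This is a minimal illustration, not the code in align(): the names ssd and align_single_scale, the search radius max_shift, the border crop, and the use of wrap-around shifts via np.roll are all assumptions made for the example, as is the (row, column) ordering of the returned offset.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    return float(np.sum((a - b) ** 2))

def align_single_scale(channel, reference, max_shift=15, border=0.1):
    """Try every integer shift in [-max_shift, max_shift]^2 and keep the best.

    Hypothetical helper: returns the (dy, dx) shift that, applied to
    `channel` with np.roll, minimises the SSD against `reference`.
    """
    h, w = reference.shape
    # Ignore a margin on each side so plate borders and wrapped-around
    # pixels do not dominate the score (an assumed design choice).
    mh, mw = int(h * border), int(w * border)
    ref_crop = reference[mh:h - mh, mw:w - mw]

    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted[mh:h - mh, mw:w - mw], ref_crop)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The cost grows with the square of the search radius, since (2 * max_shift + 1)^2 candidate shifts are scored on the full image.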

Aligned Image of cathedral.jpg

time cost: 0.69 s
Offset: G(1, -1), R(7, -1)
The alignment quality is very good, but this approach takes too long on the high-resolution images, so I implement a multi-scale version.

Multi-scale Pyramid Implementation

Exhaustive search becomes prohibitively expensive when the pixel displacement is large, so I implement multi-scale pyramid alignment to address the efficiency problem. See the function align_multiscale() in main_hw1.py for implementation details. The aligned results for all the given images are shown below.
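As a rough illustration of the coarse-to-fine idea (not the code in align_multiscale()), the sketch below halves the images until they are small, runs an exhaustive search at the coarsest level, and then only refines the doubled estimate within a small window at each finer level. The names downsample2, best_shift and align_pyramid, the 2x2 block averaging, and the values of min_size, coarse_radius and refine_radius are assumptions chosen for the example.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging 2x2 blocks (an odd trailing row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def best_shift(channel, reference, center, radius, border=0.1):
    """Exhaustive SSD search over shifts within `radius` of `center` (dy, dx)."""
    h, w = reference.shape
    mh, mw = int(h * border), int(w * border)
    ref_crop = reference[mh:h - mh, mw:w - mw]
    best_score, best = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[mh:h - mh, mw:w - mw] - ref_crop) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def align_pyramid(channel, reference, min_size=64, coarse_radius=8, refine_radius=2):
    """Recursive coarse-to-fine alignment; returns the (dy, dx) shift for `channel`."""
    if min(reference.shape) <= min_size:
        # Coarsest level: a full search is cheap because the image is small.
        return best_shift(channel, reference, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downsample2(channel), downsample2(reference),
                           min_size, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels here.
    return best_shift(channel, reference, (2 * dy, 2 * dx), refine_radius)
```

With these assumed settings, each full-resolution level scores only (2 * 2 + 1)^2 = 25 candidate shifts, whereas a single-scale search covering an offset as large as 176 pixels would need (2 * 176 + 1)^2 = 124,609 full-resolution comparisons.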

Aligned Image of harvesters.tif

time cost: 17.59 s
Offset: G(60, 16), R(124, 13)

Aligned Image of icon.tif

time cost: 15.36 s
Offset: G(40, 17), R(89, 23)

Aligned Image of lady.tif

time cost: 16.25 s
Offset: G(53, 8), R(117, 10)

Aligned Image of self_portrait.tif

time cost: 15.81 s
Offset: G(78, 28), R(176, 36)

Aligned Image of three_generations.tif

time cost: 12.01 s
Offset: G(54, 11), R(112, 9)

Aligned Image of train.tif

time cost: 14.93 s
Offset: G(43, 5), R(87, 31)

Aligned Image of turkmen.tif

time cost: 13.96 s
Offset: G(56, 19), R(115, 26)

Aligned Image of village.tif

time cost: 16.32 s
Offset: G(64, 11), R(137, 21)

Aligned Image of emir.tif

time cost: 15.66 s
Offset: G(49, 24), R(0, -200)
All the given images are aligned well except emir.tif. This failure occurs because the person in the image is wearing a blue coat, so the blue and red channels differ greatly in those areas. SSD cannot handle this when it uses only the raw RGB values. To address the problem, we can use other features such as edges, which is implemented in the following Bells and Whistles section.

Bells and Whistles

Aligning the images using only SSD on raw RGB values is not robust. Instead, we can first extract more robust features from the original images and then align the processed images. In this section, I process the images with a Sobel edge filter and then run the alignment on the filtered output. The extracted edge features and the aligned images are shown below.
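A minimal sketch of the edge feature, assuming a gradient-magnitude Sobel response computed with NumPy only. The name sobel_magnitude and the edge-replicated padding are choices made for this example and may differ from the filter used in main_hw1.py.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated padding)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1 over rows.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1 over columns.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

Because the magnitude discards the sign of the gradient, a region that is bright in one channel and dark in another (for an image with values in [0, 1], img versus 1 - img) produces the same edge map. That is one plausible reason edge features tolerate the kind of channel mismatch seen in emir.tif. The edge maps can then be passed to the same SSD search in place of the raw channels.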

Extracted edges using Sobel filter

Left: result using the raw RGB values. Right: result using the edge features.

time cost: 14.81 s
Offset: G(49, 23), R(107, 40)
Using edge features, the alignment succeeds on the case that failed in the previous section with only the raw RGB values.