16726 - Learning Based Image Synthesis - Spring 2020

Tarang Shah (Andrew ID: tarangs)

Homework 1 - Aligning Separately Captured R,G, and B channels

This homework is related to aligning separate channels of an image to create a final color image. We use images from the Prokudin-Gorskii Collection which were captured using different colored glass plates between 1905 and 1915. All images from the original negatives have been digitized by the US Library of Congress and are available here.

teaser

Approach and Methodology

We use an approach that involves a metric to quantify the quality of the match, searching across different displacements to calculate offsets and then finally using these offsets to create a final RGB image.

The metric we use to quantify the quality of the match is Normalized Cross Correlation. Ideally we would probably do the NCC calculation on the full image, but this would create a lot of computations, especially when the maximum image resolutions are in the 1000s. Also the search space can also quickly get really large, which would make the whole process really slow.

We carry our certain optimizations to ensure the alignment process happens efficiently and gives results really fast.

The optimizations I implemented include

Cropping out parts from the edge for the NCC calculation - In this case, I use a patch at the center of the image to compare across the channels The patch is roughly 80% of the total width of the image. This can be further reduced as well, but not too much as it may affect the NCC comparison(as we remove information from the part of the image used to compare the alignment) and might result in a suboptimal alignment.
Using a multi scale pyramid structure to speed up the matching logic - As recommended in the homework writeup, I implemented a multiscale pyramid. In this, we downsample the image multiple times until we reach a feasable resolution. Once we reach a feasable resolution, we run the basic alignment process and calculate the offset. After this, we use these offsets as a starting point for the higher scaled images. For each layer of the pyramid, the resolution is halved until the resolution reaches below 300pixels.