16-726 Learning-Based Image Synthesis

Project 1: Colorizing the Prokudin-Gorskii Photo Collection

Trung Nguyen

Objective

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible.

Approach

To align the RGB channels, I first aligned the Red channel to the Blue channel, and then aligned the Green channel to the Blue channel. The aligned Red and Green channels were finally stacked with the original Blue channel to form the output image.
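The stacking step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `compose_rgb`, the (dy, dx) shift convention, and the use of `np.roll` (which wraps pixels around the border) are all assumptions made for the example.

```python
import numpy as np

def compose_rgb(red, green, blue, red_shift, green_shift):
    """Shift red and green onto blue and stack the result as an H x W x 3 image.

    red_shift and green_shift are hypothetical (dy, dx) displacements, assumed
    to come from an alignment search against the blue channel.
    """
    # np.roll wraps pixels around the border; a real pipeline might crop instead.
    red_aligned = np.roll(red, red_shift, axis=(0, 1))
    green_aligned = np.roll(green, green_shift, axis=(0, 1))
    # Blue is the reference channel, so it is used unchanged.
    return np.dstack([red_aligned, green_aligned, blue])
```

Blue acts as the fixed reference here, so only two alignment searches are needed per image.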

To align two given channels, I first searched over all displacements in [-15, 15] in both the X and Y directions, for a total of 31*31=961 candidate displacements per channel pair. For each candidate, I used the Sum of Squared Differences (SSD) to measure the difference between the two channels at that alignment. The alignment with the lowest SSD is the best alignment under this metric. To score only the informative part of the image, I discarded a 10% margin on all four sides before computing the metric, which yielded better results. This approach only works for channels whose best alignment lies within [-15, 15] and fails otherwise. Evaluating 961 displacements also makes it extremely slow on higher-resolution images.
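A minimal sketch of this exhaustive search is shown below, assuming grayscale channels stored as 2-D NumPy arrays. The helper names (`crop_margin`, `ssd`, `align_single_scale`) are hypothetical, and `np.roll` is used for shifting as a simplification; its wrap-around pixels mostly fall inside the discarded margin.

```python
import numpy as np

def crop_margin(img, frac=0.10):
    """Drop a fractional margin from all four sides of a 2-D image."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ssd(a, b):
    """Sum of squared differences, computed in float to avoid integer overflow."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def align_single_scale(channel, reference, radius=15):
    """Return the (dy, dx) shift of `channel` that minimizes SSD against `reference`."""
    ref_inner = crop_margin(reference)
    best_score, best_shift = np.inf, (0, 0)
    # (2 * radius + 1) ** 2 candidates, i.e. 961 for radius = 15.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(crop_margin(shifted), ref_inner)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Every candidate costs one full pass over the cropped image, so the runtime grows with both the image area and the square of the search radius.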

To make the search more computationally efficient and to handle images whose shifts are much larger than the [-15, 15] range, I conducted the search on multiple scales, with a factor of 2 between consecutive levels. At each level, I only needed to search displacements in [-1, 1] in both the X and Y directions. The reasoning is that, for a given level i, level (i + 1) has already been searched at half the size in both directions, and its [-1, 1] search corresponds to displacements of -2, 0, and 2 at level i. After doubling the best alignment from the coarser level, the current level therefore only needs to check offsets in [-1, 1] around it. This allows the total shift to reach roughly [-2^n, 2^n] in both the X and Y directions, where n is the number of levels (more precisely, up to 1 + 2 + ... + 2^(n-1) = 2^n - 1 pixels). Each level only needs to check 3*3=9 alignments. With 10 levels, which allows shifts of roughly [-1024, 1024] at the original scale, only 90 alignments are evaluated instead of 961, most of them on much smaller images, so this approach is at least 10 times faster than the single-scale search while covering a much wider search space.
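The coarse-to-fine search can be sketched recursively as below. This is an illustrative version under stated assumptions rather than the original implementation: the 2x2 box-average downsampling, the minimum-size guard, and the names `ssd_inner`, `downsample2`, and `align_pyramid` are all choices made for the example.

```python
import numpy as np

def ssd_inner(a, b, frac=0.10):
    """SSD between two 2-D images after discarding a fractional margin on each side."""
    h, w = a.shape
    dh, dw = int(h * frac), int(w * frac)
    diff = (a[dh:h - dh, dw:w - dw].astype(np.float64)
            - b[dh:h - dh, dw:w - dw].astype(np.float64))
    return float(np.sum(diff ** 2))

def downsample2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed choice of filter)."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, levels=10):
    """Return the (dy, dx) shift of `channel` onto `reference`, found coarse to fine."""
    if levels > 1 and min(channel.shape) >= 16:
        # Solve the half-resolution problem first, then double its answer.
        cy, cx = align_pyramid(downsample2(channel), downsample2(reference), levels - 1)
        base = (2 * cy, 2 * cx)
    else:
        base = (0, 0)  # coarsest level: start from no shift
    best_score, best_shift = np.inf, base
    # Refine by at most one pixel in each direction: 3 * 3 = 9 candidates per level.
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            shift = (base[0] + dy, base[1] + dx)
            score = ssd_inner(np.roll(channel, shift, axis=(0, 1)), reference)
            if score < best_score:
                best_score, best_shift = score, shift
    return best_shift
```

The size guard stops the recursion early on small images, so the effective number of levels may be lower than the requested one.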

Extra credits

Edge features: Computing the SSD on raw pixel values gives good results on many images, but does not work well on some of them because raw channel intensities are sensitive to noise. Therefore, I used the Sobel filter to extract edge features from each channel and computed the alignment score between two channels on those features instead.
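One way to realize this is sketched below with a NumPy-only Sobel filter. The write-up does not say whether the gradient magnitude or the separate X and Y responses were used, so scoring on the gradient magnitude is an assumption, as are the edge-replicating border handling and the names `sobel_magnitude` and `edge_ssd`.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, rows weighted [1, 2, 1].
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    # Vertical gradient: bottom row minus top row, columns weighted [1, 2, 1].
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)

def edge_ssd(a, b):
    """SSD computed on Sobel edge maps instead of raw intensities."""
    return float(np.sum((sobel_magnitude(a) - sobel_magnitude(b)) ** 2))
```

Because the Sobel kernels sum to zero, a constant brightness offset between two channels does not change this score, whereas it would inflate an SSD on raw pixel values.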