Assignment #1 - Colorizing the Prokudin-Gorskii Photo Collection - 16-726 Learning-Based Image Synthesis / Spring 2021

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) [Сергей Михайлович Прокудин-Горский, to his Russian friends] was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tsar’s special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges… thousands of color pictures! His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later: he envisioned special projectors to be installed in “multimedia” classrooms all across Russia, where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return. Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available online.

Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. To do this, the three color channel images are extracted, placed on top of each other, and aligned so that they form a single RGB color image. A cool explanation of how the Library of Congress created the color images on their site is available here.
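As a rough illustration of the extraction and stacking step, the sketch below splits a scanned plate into three equal-height strips and stacks them into an RGB array. It assumes the exposures are stacked vertically in blue, green, red order from top to bottom. The function names `split_plate` and `stack_rgb` are hypothetical and are not taken from the assignment code.

```python
import numpy as np

def split_plate(plate):
    """Split a stacked glass-plate scan into (blue, green, red) channels.

    Assumption: the three exposures are stacked vertically in B, G, R order
    (top to bottom) with equal height; any leftover rows are dropped.
    """
    height = plate.shape[0] // 3
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def stack_rgb(red, green, blue):
    """Stack three (already aligned) channels into an H x W x 3 RGB array."""
    return np.dstack([red, green, blue])

# Tiny synthetic "plate": 6 rows and 2 columns, so each channel is 2 x 2.
plate = np.arange(12, dtype=np.float64).reshape(6, 2)
b, g, r = split_plate(plate)
rgb = stack_rgb(r, g, b)
```

In practice the green and red channels would be shifted into register with the blue channel before stacking.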

Implementation

The implementation began with a very simple approach: an exhaustive search over displacements in the range [-30, 30] pixels along each axis. The image metric was the Sum of Squared Differences (SSD), i.e. the squared L2 norm of the difference between the two channels. This worked on the small (low-res) example image, as shown below. However, it took too long to run on the large, high-res images, so a multi-scale approach was necessary to make the search fast enough.

Total shift to align green channel with blue channel: X: 2 pixels, Y: 5 pixels

Total shift to align red channel with blue channel: X: 3 pixels, Y: 12 pixels
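The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the code used for the results on this page. It assumes shifts are applied circularly with `np.roll` and that the score is computed only on an interior region, so that wrapped-around borders are ignored. The names `align_exhaustive`, `radius` and `margin` are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))

def align_exhaustive(moving, reference, radius=30, margin=0.1):
    """Return the (dx, dy) shift of `moving` that minimizes SSD to `reference`.

    Every integer shift in [-radius, radius] along both axes is tried.
    `margin` is the fraction of each border excluded from the score
    (an assumption made here to reduce the effect of plate borders).
    """
    h, w = reference.shape
    mh, mw = int(h * margin), int(w * margin)
    ref_core = reference[mh:h - mh, mw:w - mw]
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Circular shift: dy moves rows (Y), dx moves columns (X).
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(shifted[mh:h - mh, mw:w - mw], ref_core)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

# Synthetic check: displace a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
moving = np.roll(reference, (-5, -2), axis=(0, 1))
shift = align_exhaustive(moving, reference, radius=6)
```

The cost grows with the square of the search radius times the number of pixels, which is why this approach becomes impractical on the high-resolution scans.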

The multi-scale approach produced results like the following image. For the pyramid, I repeatedly scaled the image down until its height was below 300 pixels. I used a large search window of ±30 pixels at the coarsest level, where the search runs quickly, and a much smaller window of ±3 pixels at each finer level. Because each finer level starts from the estimate of the level above it and is therefore already nearly aligned, a small corrective shift at each stage is sufficient.

Total shift to align green channel with blue channel: X: 9 pixels, Y: 57 pixels

Total shift to align red channel with blue channel: X: 12 pixels, Y: 120 pixels
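The coarse-to-fine scheme described above can be sketched as a short recursion. This is an illustrative sketch under stated assumptions: each pyramid level halves the resolution by averaging 2x2 blocks (the scaling factor and filter actually used are not stated on this page), shifts are circular, and the score is plain SSD over the whole image. The names `downsample2`, `search` and `align_pyramid` are hypothetical.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging 2x2 blocks (odd trailing row/column dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def search(moving, reference, center, radius):
    """Exhaustive SSD search over shifts within `radius` of `center` = (dx, dy)."""
    best_score, best_shift = np.inf, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def align_pyramid(moving, reference, min_height=300,
                  coarse_radius=30, fine_radius=3):
    """Return the (dx, dy) shift of `moving`, estimated coarse to fine."""
    if reference.shape[0] < min_height:
        # Coarsest level: wide search around zero displacement.
        return search(moving, reference, (0, 0), coarse_radius)
    dx, dy = align_pyramid(downsample2(moving), downsample2(reference),
                           min_height, coarse_radius, fine_radius)
    # A shift found at half resolution doubles at this level; refine locally.
    return search(moving, reference, (2 * dx, 2 * dy), fine_radius)

# Synthetic check on a small image (thresholds scaled down accordingly).
rng = np.random.default_rng(1)
reference = rng.random((64, 64))
moving = np.roll(reference, (-8, -4), axis=(0, 1))
shift = align_pyramid(moving, reference, min_height=40, coarse_radius=6)
```

With a ±3 refinement at each level, the work per level is a fixed 49 comparisons, so the total cost is dominated by the image size rather than by the magnitude of the displacement.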

However, this method alone was not sufficient for the purpose of aligning the Emir's image, as shown below.

The best way to fix this was to implement the extra-credit suggestion of using better features. Instead of comparing raw channel intensities, I compared absolute-gradient images of the channels, which respond to edges rather than to each channel's overall brightness. I also changed the scoring metric to normalized cross-correlation (NCC), which is simply the dot product of the two images after each is flattened to a vector and scaled to unit length: (image1./||image1||) · (image2./||image2||). This provided a much better alignment result, as shown below.

Total shift to align green channel with blue channel: X: 24 pixels, Y: 49 pixels

Total shift to align red channel with blue channel: X: 41 pixels, Y: 106 pixels
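A minimal sketch of NCC scoring on gradient features, as described above, is given below. It assumes the feature is the sum of the absolute horizontal and vertical finite differences computed with `np.gradient`; the exact gradient operator used for the results on this page is not stated. The NCC follows the description in the text (no mean subtraction). The function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Absolute-gradient feature: |dI/dx| + |dI/dy| from finite differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.abs(gx) + np.abs(gy)

def ncc(a, b, eps=1e-12):
    """Dot product of the two images after scaling each to unit length."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def align_ncc_gradients(moving, reference, radius=15):
    """Return the (dx, dy) shift of `moving` that maximizes NCC of gradient features."""
    feat_m = gradient_magnitude(moving)
    feat_r = gradient_magnitude(reference)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(feat_m, (dy, dx), axis=(0, 1))
            score = ncc(shifted, feat_r)
            if score > best_score:  # higher NCC means more similar
                best_score, best_shift = score, (dx, dy)
    return best_shift

# Synthetic check: the moving channel is displaced AND has a different
# contrast and brightness, which gradient features with NCC tolerate.
rng = np.random.default_rng(2)
reference = rng.random((64, 64))
moving = 0.5 * np.roll(reference, (-7, -3), axis=(0, 1)) + 0.2
shift = align_ncc_gradients(moving, reference, radius=8)
```

The constant brightness offset disappears when differences are taken, and the contrast scale cancels in the normalization, so the score depends mainly on where the edges are.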

Results

The results of the algorithm on the cathedral (cathedral.jpg), the lady (lady.tif) and the Emir (emir.tif) are shown above. The remaining results are shown below.

Harvesters

Total shift to align green channel with blue channel: X: 16 pixels, Y: 60 pixels

Total shift to align red channel with blue channel: X: 11 pixels, Y: 124 pixels

Icon

Total shift to align green channel with blue channel: X: 17 pixels, Y: 42 pixels

Total shift to align red channel with blue channel: X: 23 pixels, Y: 90 pixels

Self-Portrait

Total shift to align green channel with blue channel: X: 29 pixels, Y: 78 pixels

Total shift to align red channel with blue channel: X: 37 pixels, Y: 175 pixels

Three Generations

Total shift to align green channel with blue channel: X: 13 pixels, Y: 54 pixels

Total shift to align red channel with blue channel: X: 8 pixels, Y: 111 pixels

Train

Total shift to align green channel with blue channel: X: -1 pixels, Y: 41 pixels

Total shift to align red channel with blue channel: X: 29 pixels, Y: 85 pixels

Turkmen

Total shift to align green channel with blue channel: X: 21 pixels, Y: 56 pixels

Total shift to align red channel with blue channel: X: 28 pixels, Y: 117 pixels

Village

Total shift to align green channel with blue channel: X: 10 pixels, Y: 64 pixels

Total shift to align red channel with blue channel: X: 21 pixels, Y: 137 pixels

Spires: new photo from the archives

Total shift to align green channel with blue channel: X: 22 pixels, Y: 20 pixels

Total shift to align red channel with blue channel: X: 30 pixels, Y: 45 pixels

Bells & Whistles (Extra Credit)

For extra credit, I implemented the following:

Automatic cropping. The algorithm converts the image to grayscale, detects edges with the Canny edge detector, extracts very long line segments with the Hough line transform, keeps only the lines that are parallel or perpendicular to one another, and finally crops at the lines closest to the center of the image. This works reasonably well, and the result on the Emir's portrait is shown below. However, it requires heavy tuning and can fail on images that contain a large number of straight lines.
Better features. Instead of aligning on RGB similarity, I aligned on gradient (edge) features. This was implemented successfully and was used to align the Emir's portrait, as shown earlier on this page.
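The last two steps of the automatic cropping procedure (filtering lines by orientation and cropping at the lines closest to the center) can be sketched as follows. The sketch assumes the line segments have already been produced by an edge detector and a Hough transform, and are given as `(x1, y1, x2, y2)` tuples. The function name `crop_box_from_lines`, the angle tolerance and the minimum-length fraction are hypothetical choices, not the tuned values used for the result above.

```python
import math

def crop_box_from_lines(lines, width, height,
                        angle_tol_deg=2.0, min_length_frac=0.5):
    """Pick a crop box (left, top, right, bottom) from detected line segments.

    Only long, near-horizontal or near-vertical segments are kept. On each
    side of the image center, the kept line closest to the center wins.
    """
    cx, cy = width / 2.0, height / 2.0
    min_length = min_length_frac * min(width, height)
    left, top, right, bottom = 0.0, 0.0, float(width), float(height)
    for x1, y1, x2, y2 in lines:
        if math.hypot(x2 - x1, y2 - y1) < min_length:
            continue  # too short to be a plate border
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1))) % 180.0
        if min(angle, 180.0 - angle) <= angle_tol_deg:   # near-horizontal
            y = (y1 + y2) / 2.0
            if y < cy:
                top = max(top, y)
            else:
                bottom = min(bottom, y)
        elif abs(angle - 90.0) <= angle_tol_deg:          # near-vertical
            x = (x1 + x2) / 2.0
            if x < cx:
                left = max(left, x)
            else:
                right = min(right, x)
    return (math.ceil(left), math.ceil(top),
            math.floor(right), math.floor(bottom))

# Hand-made segments for a 100 x 100 image.
lines = [
    (0, 10, 100, 10), (0, 12, 100, 13),  # two candidate top borders
    (0, 90, 100, 90),                    # bottom border
    (8, 0, 8, 100), (95, 0, 94, 100),    # left and right borders
    (0, 0, 100, 100),                    # diagonal: rejected by angle
    (40, 50, 60, 50),                    # short: rejected by length
]
box = crop_box_from_lines(lines, 100, 100)
```

Choosing the innermost line on each side is also what makes the method fragile: a long straight edge inside the scene, such as a building facade, can be mistaken for a border and cause over-cropping.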