Z 16726

Project 1 - Colorizing the Prokudin-Gorskii Photo Collection

Raw Prokudin-Gorskii

Naive Channel Combine

Combined and Processed

The Prokudin-Gorskii Photo Collection comprises three intensity images per scene, each corresponding to one color channel: red, green, or blue. The channels are not aligned, so we must align them with one another to create a good-looking RGB image. First, aligning the channels is challenging because the images are high resolution, so we implement a pyramid alignment algorithm. Second, the images have borders, so we implement an auto-cropping algorithm that crops them intelligently. Finally, the naively combined colors are often washed out and not true to life, so we explore color normalization techniques to increase the contrast and realism of the images.

Pyramid Image Alignment

To align the three R, G, and B channels, we perform an exhaustive search within an offset range, calculating the error between channels at each offset. We match the G and B channels onto R independently. The process is as follows: given two channels, an x_range, and a y_range, we loop through all pairs of (x, y) offsets and compare the pixelwise error between the two channels. As the metric we use normalized correlation, $$corr = \frac{channel_1}{\lVert channel_1 \rVert} \cdot \frac{channel_2}{\lVert channel_2 \rVert}$$ which is the dot product between the two channels after each is scaled to unit norm. We want to find the offset (x, y) that maximizes this correlation. Such an offset gives us a good alignment under the assumption that the normalized intensities of the R, G, and B channels are similar.
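The exhaustive search can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function names are hypothetical, shifts are applied with `np.roll` (which wraps pixels around the border, a simplification), and the convention that x shifts columns and y shifts rows is an assumption.

```python
import numpy as np

def normalized_correlation(a, b):
    """Dot product of two channels after scaling each to unit L2 norm."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    # The small constant guards against division by zero on all-black inputs.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def exhaustive_align(moving, reference, x_range, y_range):
    """Return the (x, y) shift of `moving` that maximizes the correlation.

    Assumed convention: x shifts columns, y shifts rows. np.roll wraps
    around the image border, which is a simplification.
    """
    best_score, best_offset = -np.inf, (0, 0)
    for dx in x_range:
        for dy in y_range:
            shifted = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            score = normalized_correlation(shifted, reference)
            if score > best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset
```

For example, `exhaustive_align(green, red, range(-10, 11), range(-10, 11))` would search a 21 by 21 window of candidate offsets for matching a green channel onto red.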

However, for an image that is 3209 by 3702 pixels, the optimal offset could be several hundred pixels in one direction. Exhaustively searching all pairs in $\{(x,y) \mid x, y \in [-n, n]\}$ when the max offset $n$ is several hundred would be computationally impractical, taking hours. Therefore, we use an image pyramid to find the offsets. We use either 4 or 5 layers for the pyramid, depending on image size. Given an input image at its original resolution, we downsize it by a factor of $2^{layer}$ and then perform exhaustive alignment on a much smaller search space, where the max offset is $n=10$ or $n=5$. After finding the optimal offset at the coarsest layer, we move up through the layers, scaling the offset accordingly and performing another exhaustive alignment around it at each layer. Finally, we perform exhaustive alignment on the full-resolution image, but with a very small search space. This significantly speeds up the alignment process for high-resolution images.
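A coarse-to-fine search of this kind could look like the sketch below. It is illustrative only: the helper names, the 2x2 block-average downsampling, and the refinement radius of 2 pixels per layer are assumptions, and shifts again use `np.roll` with x as columns and y as rows.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation: dot product of unit-norm channels."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def search(moving, reference, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around `center`."""
    cx, cy = center
    best_score, best = -np.inf, center
    for dx in range(cx - radius, cx + radius + 1):
        for dy in range(cy - radius, cy + radius + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def downsample2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(moving, reference, layers=4, coarse_radius=10, fine_radius=2):
    """Estimate the (x, y) shift coarse-to-fine over `layers` halvings."""
    pyramid = [(moving.astype(np.float64), reference.astype(np.float64))]
    for _ in range(layers):
        m, r = pyramid[-1]
        pyramid.append((downsample2(m), downsample2(r)))
    # Wide search at the coarsest layer only.
    m, r = pyramid[-1]
    offset = search(m, r, (0, 0), coarse_radius)
    # Move up: double the offset, then refine within a small window.
    for m, r in reversed(pyramid[:-1]):
        offset = (2 * offset[0], 2 * offset[1])
        offset = search(m, r, offset, fine_radius)
    return offset
```

With 4 layers and a coarse radius of 10, the coarsest search alone covers shifts of up to 160 pixels at full resolution, while each layer evaluates only a few hundred candidates at most.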

Gradient Alignment

Instead of computing normalized correlation on the intensity channels, we can also compare the x and y gradients of the channels independently. However, in our experiments the optimal offsets found this way are similar to those found by matching image intensity directly; therefore, we chose matching image intensity as the best option.
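For reference, a gradient-based score might be computed as in this sketch. The use of `np.gradient` (central differences) and the choice to sum the x and y correlations into one score are assumptions; the write-up does not specify how the two gradient comparisons are combined.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation: dot product of unit-norm arrays."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def gradient_score(a, b):
    """Sum of the normalized correlations of the y and x gradients.

    np.gradient returns one array per axis: rows (y) first, then columns (x).
    Summing the two correlations is an assumed way to combine them.
    """
    ay, ax = np.gradient(a.astype(np.float64))
    by, bx = np.gradient(b.astype(np.float64))
    return ncc(ax, bx) + ncc(ay, by)
```

This score can replace the intensity correlation inside the same offset search; identical channels score close to 2.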

Figure. Below are sample aligned images from the Prokudin-Gorskii collection. Offsets are listed as offsets(green_x, green_y, blue_x, blue_y).

Aligned Cathedral
offsets(-7 -1 -12 -3)

Aligned Lady
offsets(-62 -4 -117 -12)

Aligned Self_portrait
offsets(-98 -8 -176 -37)

Automatic Cropping

You may have noticed the odd-looking borders on the aligned images above. These border artifacts arise when the channels do not line up at the borders, so the edges of some color channels appear before those of others. Wherever information from all three color channels is not available near the edges, border artifacts result. We could naively crop 10% into the image to remove the artifacts, but we instead propose an intelligent automatic cropping algorithm. We define a cross-channel error map that responds to misaligned borders and ignores correctly aligned regions: $$error = \begin{cases} (r-g)^2 + (r-b)^2 + (g-b)^2 & \min(r,g,b) \geq 0.05 \\ 1 & \min(r,g,b) < 0.05 \end{cases}$$ This piecewise function penalizes disagreement between each pair of color channels, and also penalizes dark edges where color channel information is missing. The dark-pixel term is crucial: when all channels are 0, the pixelwise L2 error is also 0 and cannot capture the edge artifact. Using the pairwise color channel error works better than using the row or column variance of the color channels to find "similar colored lines", because a white, over-exposed sky is a common failure case when cropping based on row or column variance. We crop a few more rows and columns beyond the detected edge to further eliminate artifacts. This additional crop may be avoidable by tuning the threshold.

After constructing the error map, we loop through the rows and columns inward from all 4 edges and filter by a threshold of 0.5. If the mean error of a row or column is greater than the threshold, we count that row or column as a border artifact and crop it out. We only search the outermost $n$ rows and columns on each side, where $n$ is ten percent of the image size.
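The error map and border search could be sketched as follows, assuming an H x W x 3 float image with values in [0, 1]. The function names are hypothetical, and cropping up to the innermost flagged row or column within the ten percent band is one possible reading of the procedure; the extra safety margin mentioned earlier is omitted.

```python
import numpy as np

def error_map(rgb, dark=0.05):
    """Pairwise squared channel differences, or 1 where any channel is dark."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    pairwise = (r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2
    return np.where(rgb.min(axis=-1) < dark, 1.0, pairwise)

def crop_bounds(rgb, threshold=0.5, max_fraction=0.10):
    """Return (top, bottom, left, right) so that rgb[top:bottom, left:right]
    excludes border lines whose mean error exceeds `threshold`."""
    err = error_map(rgb)
    row_err, col_err = err.mean(axis=1), err.mean(axis=0)

    def border(means):
        # Only the outermost `max_fraction` of lines is searched.
        limit = int(len(means) * max_fraction)
        bad = np.flatnonzero(means[:limit] > threshold)
        # Assumption: crop through the innermost flagged line in the band.
        return int(bad[-1]) + 1 if bad.size else 0

    h, w = err.shape
    top, bottom = border(row_err), border(row_err[::-1])
    left, right = border(col_err), border(col_err[::-1])
    return top, h - bottom, left, w - right
```

The cropped image is then `rgb[top:bottom, left:right]`.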

Figure. Below are visualizations of automatic cropping on harvesters.

Aligned Harvesters
offsets(-65 3 -124 -13)

Cropping Edges Visualization
rectangle(67, 136, height-158, width-201)

Cropped Harvesters

Color Correction

Finally, we fix the odd colors that come from naively combining the color channels. Since we want the colors to look more true to life and the contrast to be higher, we use two different color normalization methods: 0-1 normalization to increase contrast, and ImageNet normalization to recreate lifelike colors. The best normalization method depends on the composition of the image. We find that images with a lot of background benefit from 0-1 normalization, while images with more colors benefit from ImageNet normalization. We formally define 0-1 normalization as $img = \frac{img-\min(img)}{\max(img)-\min(img)}$. We define ImageNet normalization as rescaling each color channel so that its mean and variance match the corresponding channel mean and variance of ImageNet.
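Both normalizations are short to express in NumPy. The sketch below is an assumption-laden illustration: it uses the commonly quoted ImageNet per-channel statistics (mean 0.485, 0.456, 0.406 and standard deviation 0.229, 0.224, 0.225 for R, G, B on a [0, 1] scale), applies 0-1 normalization over the whole image rather than per channel, and clips the ImageNet-matched result to [0, 1]. The write-up does not confirm any of these choices.

```python
import numpy as np

# Commonly quoted ImageNet channel statistics (RGB, [0, 1] scale).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def minmax_normalize(img):
    """Stretch intensities so the image spans exactly [0, 1]."""
    img = img.astype(np.float64)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)

def imagenet_normalize(img):
    """Match each channel's mean and std to the ImageNet statistics."""
    img = img.astype(np.float64)
    mean = img.mean(axis=(0, 1))
    std = img.std(axis=(0, 1)) + 1e-12  # avoid division by zero
    matched = (img - mean) / std * IMAGENET_STD + IMAGENET_MEAN
    # Clipping to the displayable range is an assumption.
    return np.clip(matched, 0.0, 1.0)
```

Both functions expect an H x W x 3 array; `minmax_normalize` also works on a single channel.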

Figure. Below are visualizations of color normalization. We see that ImageNet normalization reproduces more lifelike colors, especially skin tones.