Colorizing the Prokudin-Gorskii Photo Collection

Carnegie Mellon University, 16-726 Learning-Based Image Synthesis, Spring 2021

Null Reaper Logo
Null Reaper (Clive Gomes)

Task

For this project, we are given 3 color channel images, and the objective is to place them on top of each other and align them so that they form a single RGB color image. The original assignment can be found here.

Preliminary Discussion

For the purpose of this project report, we shall focus on the 'emir.tif' image, shown below. As can be seen, the image comprises three separate color channels (red, green and blue from bottom to top).

Emir Glass Plate Image
Figure 1: Emir Original Glass Plate Image

Before we get to the main portion of this project, we first split the 3 color channel images into separate PyTorch tensors. This was done by dividing the glass plate image into 3 equal parts vertically. The result is the 3 images shown below.
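The splitting step can be sketched as follows (a minimal sketch; the function name is ours, and the channel order assumes blue on top as in Figure 1):

```python
import torch

def split_channels(plate: torch.Tensor):
    """Split a vertically stacked glass-plate image (blue on top, then
    green, then red) into three equal-height channel tensors."""
    h = plate.shape[0] // 3          # height of one channel
    blue  = plate[0:h, :]
    green = plate[h:2 * h, :]
    red   = plate[2 * h:3 * h, :]
    return red, green, blue
```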

Emir Red Channel Emir Green Channel Emir Blue Channel
Figure 2: Emir Image 3 Channels (red, green and blue from left to right)

Differing Brightness Issue

One thing that may be apparent at this point is that the 3 color channel images do not have the same level of brightness. This is clear from Figure 2, where the clothing in the red channel image looks dark whereas the clothing in the blue channel image looks light. Accordingly, if we simply try to align the color channels as is (using a metric like the Sum of Squared Differences, or SSD, distance), we would not expect to get the best alignment. In fact, Figure 3 below shows the result of such an operation. Clearly, we need a smarter feature than raw pixel values.
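For reference, the SSD metric itself is simple to state; a minimal sketch (the helper name is ours):

```python
import torch

def ssd(a: torch.Tensor, b: torch.Tensor) -> float:
    """Sum of squared differences between two equally sized images.
    Lower values indicate a closer pixel-wise match."""
    return torch.sum((a - b) ** 2).item()
```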

Emir Misaligned Image
Figure 3: Aligned Emir Channels using SSD on Raw Pixel Values

Edge Detection

To resolve the differing brightness issue, we shall first perform edge detection on the 3 color channels and then align these edges instead of the pixel values in the original images. We start by applying a Gaussian blur to the 3 channel images; this reduces the number of minor edges in the image, making it easier to highlight the more relevant ones (this idea was taken from here). The 3x3 Gaussian filter shown below (Figure 4) was convolved with the 3 channel images to obtain their blurred versions.

Gaussian Kernel Image
Figure 4: 3x3 Gaussian Kernel

A set of custom convolution functions was implemented to perform this operation. In these functions, the 3 color channel images were first stacked together and then vectorized so that the windows overlapping with the kernel lie in separate columns. This 3D tensor was then multiplied with the Gaussian kernel, and the columns were summed up to get the values of the convolved image. Finally, the 3 color channels were separated and devectorized back into 2D images. (Note: This method was used because the initial implementation of the convolution function using "for" loops was too slow.) The result of the blurring operation is shown below (Figure 5).
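A sketch of this vectorized (im2col-style) convolution, here using `torch.nn.functional.unfold`; the submitted functions may be organized differently, and the 3x3 kernel below is the standard binomial Gaussian approximation assumed from Figure 4:

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 Gaussian kernel (Figure 4): outer product of [1, 2, 1] / 4.
g1d = torch.tensor([1., 2., 1.]) / 4.0
GAUSSIAN_3x3 = torch.outer(g1d, g1d)             # entries sum to 1

def conv2d_im2col(channels: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Convolve a stack of channel images (C, H, W) with one 2D kernel by
    unfolding each window into a column, multiplying, and summing.
    The kernel is symmetric, so correlation equals convolution here."""
    c, h, w = channels.shape
    k = kernel.shape[0]
    pad = k // 2
    # (1, C, H, W) -> (1, C*k*k, H*W): one column per output pixel
    cols = F.unfold(channels.unsqueeze(0), kernel_size=k, padding=pad)
    cols = cols.view(c, k * k, h * w)            # separate the channels again
    out = (kernel.reshape(1, k * k, 1) * cols).sum(dim=1)
    return out.view(c, h, w)                     # devectorize back to images
```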

Emir Red Channel Blurred Emir Green Channel Blurred Emir Blue Channel Blurred
Figure 5: Emir Image 3 Channels w/ Gaussian Blur (red, green and blue from left to right)

The effect of the blurring may not be obvious in the Emir images, since the image is over 3000 pixels wide while the kernel is only 3 pixels wide. The image below (Figure 6) shows a more noticeable effect of this blurring operation.

Cathedral Red Channel Original Cathedral Red Channel Blurred
Figure 6: Cathedral Red Channel Original Image (left) vs Blurred Image (right)

Now, we get to the edge detection task. We use the following 3x3 Sobel filters (Figure 7) for this (one for detecting horizontal edges and the other for vertical edges).

Sobel Horizontal Edge Detection Filter Sobel Vertical Edge Detection Filter
Figure 7: Sobel Edge Detection Filters — Horizontal (left) and Vertical (right)

We use the same convolution functions mentioned in the Gaussian blurring section; however, here we perform 2 convolutions (in the horizontal and vertical directions) and then take the RMS value for each pixel of the two convolved images. The resulting edge-detected images for the 3 color channels are shown below (Figure 8).
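The Sobel step can be sketched as follows; the kernel values are the standard Sobel filters (whether Figure 7 uses the same scaling is an assumption), and we combine the two responses with a per-pixel RMS as described above:

```python
import torch
import torch.nn.functional as F

# Standard 3x3 Sobel kernels, one per gradient direction (assumed to match Figure 7)
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()

def edge_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Per-pixel RMS of the two Sobel responses for a single (H, W) image."""
    x = img.unsqueeze(0).unsqueeze(0)            # (1, 1, H, W) for conv2d
    gx = F.conv2d(x, SOBEL_X.view(1, 1, 3, 3), padding=1)
    gy = F.conv2d(x, SOBEL_Y.view(1, 1, 3, 3), padding=1)
    return torch.sqrt((gx ** 2 + gy ** 2) / 2).squeeze()
```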

Emir Red Channel Edges Emir Green Channel Edges Emir Blue Channel Edges
Figure 8: Emir Image 3 Channels w/ Edge Detection (red, green and blue from left to right)

Alignment

Finally, we can now align the edges in the 3 color channel images; here, we will align the red and green channel images to the blue channel image (using SSD). We could do an exhaustive search within a fixed window (e.g., -15 to +15 pixels), but this would be an extremely slow operation, especially if the window is large (as we will see later, the best y-alignment for the red channel of the Emir image is around 107, which would require a very large window to search exhaustively). Accordingly, a pyramid search was performed to speed up the process. Here, we first downscale the original channel images by factors of 2, 4, 8 and so on (Figure 9), and start by aligning at the smallest resolution.

Emir 1x Resolution Emir 2x Resolution Emir 4x Resolution Emir 8x Resolution Emir 16x Resolution
Figure 9: Emir Edge Detected Red Channel at Different Resolutions (from left to right 1x, 2x, 4x, 8x and 16x)

Once we are back up to the original resolution, we add up the smaller alignments (by multiplying them with the appropriate factor of 2) to obtain the best alignment for each of the 3 color channel images. With these, we shift the original red and green color channel images (not the edge detected ones) by the obtained displacements. The final result for the Emir image is given in Figure 10.
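The coarse-to-fine procedure above can be sketched as follows. This is a simplified version under assumptions of ours: shifts are applied with `torch.roll` (so wrap-around pixels are counted rather than cropped), downscaling uses 2x average pooling, and each finer level only refines the doubled coarse estimate by ±1 pixel:

```python
import torch
import torch.nn.functional as F

def ssd_shift(moving, fixed, dx, dy):
    """SSD after shifting `moving` by (dy, dx); wrap-around is kept for simplicity."""
    shifted = torch.roll(moving, shifts=(dy, dx), dims=(0, 1))
    return torch.sum((shifted - fixed) ** 2).item()

def align_single(moving, fixed, window=15):
    """Exhaustive search over shifts in [-window, window] at one scale."""
    best, best_ssd = (0, 0), float('inf')
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            s = ssd_shift(moving, fixed, dx, dy)
            if s < best_ssd:
                best_ssd, best = s, (dx, dy)
    return best

def align_pyramid(moving, fixed, levels=4, window=15):
    """Coarse-to-fine alignment: estimate at the smallest scale first,
    then double the shift and refine it at each finer level."""
    if levels == 0 or min(moving.shape) < 2 * window:
        return align_single(moving, fixed, window)
    small_m = F.avg_pool2d(moving[None, None], 2)[0, 0]   # downscale by 2
    small_f = F.avg_pool2d(fixed[None, None], 2)[0, 0]
    dx, dy = align_pyramid(small_m, small_f, levels - 1, window)
    dx, dy = 2 * dx, 2 * dy                               # scale coarse shift up
    best, best_ssd = (dx, dy), float('inf')
    for ddy in (-1, 0, 1):                                # local refinement
        for ddx in (-1, 0, 1):
            s = ssd_shift(moving, fixed, dx + ddx, dy + ddy)
            if s < best_ssd:
                best_ssd, best = s, (dx + ddx, dy + ddy)
    return best
```

The returned (dx, dy) can then be applied to the original (non-edge-detected) red and green channels, as described above.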

Emir Aligned Image
Figure 10: Aligned Emir Channels using Pyramid Search w/ SSD on Edge Detected Images — Green Channel Displacement → X = 23, Y = 49 —— Red Channel Displacement → X = 40, Y = 107 —

Results for All Images within Project

cathedral.jpg aligned image
cathedral.jpg — Green Channel Displacement → X = -1, Y = 6 —— Red Channel Displacement → X = -1, Y = 13 —
emir.tif aligned image
emir.tif — Green Channel Displacement → X = 23, Y = 49 —— Red Channel Displacement → X = 40, Y = 107 —
harvesters.tif aligned image
harvesters.tif — Green Channel Displacement → X = -6, Y = 70 —— Red Channel Displacement → X = 4, Y = 123 —
icon.tif aligned image
icon.tif — Green Channel Displacement → X = 17, Y = 42 —— Red Channel Displacement → X = 23, Y = 90 —
lady.tif aligned image
lady.tif — Green Channel Displacement → X = -10, Y = -22 —— Red Channel Displacement → X = -21, Y = -10 —
self_portrait.tif aligned image
self_portrait.tif — Green Channel Displacement → X = -3, Y = 79 —— Red Channel Displacement → X = -2, Y = 117 —
three_generations.tif aligned image
three_generations.tif — Green Channel Displacement → X = 0, Y = 54 —— Red Channel Displacement → X = 7, Y = 107 —
train.tif aligned image
train.tif — Green Channel Displacement → X = 1, Y = 41 —— Red Channel Displacement → X = 29, Y = 85 —
turkmen.tif aligned image
turkmen.tif — Green Channel Displacement → X = 21, Y = 56 —— Red Channel Displacement → X = 27, Y = 117 —
village.tif aligned image
village.tif — Green Channel Displacement → X = -9, Y = 63 —— Red Channel Displacement → X = -15, Y = 115 —

Discussion on Results

As can be seen in the previous section, most of the output images look quite good; the exceptions are "lady.tif" and "self_portrait.tif." Upon further inspection, it was discovered that the problem lay in the edge detection step. Take a look at the edge-detected color channels for the "lady.tif" image (Figure 11).

Lady Red Channel Edges Lady Green Channel Edges Lady Blue Channel Edges
Figure 11: Lady Image 3 Channels w/ Edge Detection (red, green and blue from left to right)

One may notice that there are hardly any edges in the above images compared to the ones in the "Edge Detection" section earlier (this may be because the image doesn't have many sharp edges). The same was noted in the edge-detected color channels of 'self_portrait.tif.' To resolve this issue, the Gaussian blurring step was skipped (so that the code does not reduce the number of edges in the image any further). The results are given below (Figures 12 & 13).

Lady Image Original Alignment Lady Image Alignment w/o Gaussian Blurring
Figure 12: Lady Image Original Alignment (left) vs Alignment w/o Gaussian Blurring (right)
Self Portrait Image Original Alignment Self Portrait Image Alignment w/o Gaussian Blurring
Figure 13: Self Portrait Image Original Alignment (left) vs Alignment w/o Gaussian Blurring (right)

We see that the "lady.tif" image (Figure 12, right image) looks much better than earlier (though not perfect). However, the "self_portrait.tif" image hasn't changed much. Fixing this may require a more sophisticated edge-detection implementation. Instead, we completely bypass the edge-detection step and directly perform pyramid alignment on raw pixel values. The result now is much better than earlier (see Figure 14).

Self Portrait Image Alignment w/ Raw Pixel Values
Figure 14: Self Portrait Image Alignment w/ Raw Pixel Values

Results for Example Online Images

01100-01122a.tif aligned image
01100-01122a.tif — Green Channel Displacement → X = 48, Y = 47 —— Red Channel Displacement → X = 95, Y = 100 —
01300-01363a.tif aligned image
01300-01363a.tif — Green Channel Displacement → X = 54, Y = 33 —— Red Channel Displacement → X = 92, Y = 90 —
01800-01806a.tif aligned image
01800-01806a.tif — Green Channel Displacement → X = 17, Y = 65 —— Red Channel Displacement → X = 38, Y = 135 —
01800-01864a.tif aligned image
01800-01864a.tif — Green Channel Displacement → X = 7, Y = 52 —— Red Channel Displacement → X = 13, Y = 111 —

Bells & Whistles

→ Pytorch Implementation

For the implementation of this project, PyTorch tensors were used instead of NumPy arrays/Python lists. This didn't affect the alignment of the images (apart from perhaps a slight difference in runtime), so the use of PyTorch tensors can be verified by looking at the source code submitted along with the project.

→ Better Features (Gradients/Edges)

As discussed in the prior sections of this report, our implementation allows for the use of both raw pixel values (see "Differing Brightness Issue" section) and edges for the alignment task (see "Edge Detection" section). It has also been shown that using the edge features provided much better results, except in rare cases (see "Discussion on Results" section).