Task
For this project, we are given 3 color channel images and the objective is to place them on top of each other, and align them so that they form a single RGB color image. The original assignment can be found here.
For the purpose of this project report, we shall focus on the 'emir.tif' image which is shown below. As can be seen, the image comprises three separate color channels (red, green and blue from bottom to top).
Before we get to the main portion of this project, we first split the 3 color channel images into separate PyTorch tensors. This was done by dividing the image into 3 equal parts vertically. The result is the 3 images shown below.
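The split can be sketched as follows. This is a sketch rather than the submitted code; the function name is hypothetical, and the blue/green/red top-to-bottom ordering follows the channel ordering described above.

```python
import torch

def split_channels(glass_plate: torch.Tensor):
    """Split a stacked glass-plate scan (blue on top, green in the middle,
    red at the bottom) into three equal-height channel tensors.
    Hypothetical helper name; the report only states that the image is
    divided into 3 equal parts vertically."""
    h = glass_plate.shape[0] // 3   # height of one channel
    b = glass_plate[:h]
    g = glass_plate[h:2 * h]
    r = glass_plate[2 * h:3 * h]
    return b, g, r
```

Any rows left over when the height is not divisible by 3 are simply dropped, which only discards a sliver at the bottom border.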
One thing that may be apparent at this point is that the 3 color channel images do not have the same level of brightness. This is clear from Figure 2, where the clothing in the red channel image looks dark whereas the clothing in the blue channel image looks light. Accordingly, if we simply try to align the color channels as is (using a metric like the Sum of Squared Differences, i.e., SSD, distance), we would not expect to get the best alignment. In fact, Figure 3 below shows the result of such an operation. Clearly, we need a smarter feature than raw pixel values.
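A small illustration of why raw-pixel SSD is sensitive to brightness: two channels with identical structure but a constant brightness offset still produce a large SSD, so the metric can prefer a wrong alignment that happens to match intensities. The function name here is hypothetical.

```python
import torch

def ssd(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Sum of Squared Differences between two equally sized channels."""
    return ((a - b) ** 2).sum()

base = torch.rand(4, 4)
print(ssd(base, base).item())        # perfect structural match -> 0
print(ssd(base, base + 0.5).item())  # same structure, offset brightness -> large
```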
To resolve the differing brightness issue, we shall first perform edge detection on the 3 color channels and then align these edges instead of the pixel values in the original images. We will start by applying a Gaussian blur on the 3 channel images. This was done to reduce the number of edges in the image, thus making it easier to highlight the more relevant edges (this idea was taken from here). The 3x3 Gaussian filter shown below (Figure 4) was convolved with the 3 channel images to obtain the blurred versions of these images.
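A normalized 3x3 Gaussian kernel like the one in Figure 4 can be constructed as follows; the sigma value is an assumption, since the report does not state the one used.

```python
import torch

def gaussian_kernel_3x3(sigma: float = 1.0) -> torch.Tensor:
    """Build a normalized 3x3 Gaussian kernel.
    sigma = 1.0 is an assumed value, not taken from the report."""
    ax = torch.arange(-1.0, 2.0)  # offsets -1, 0, 1 from the center
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # normalize so the blur preserves overall brightness
```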
A set of custom convolution functions were implemented to perform this operation. In these functions, the 3 color channel images were first stacked together and then vectorized to have the windows overlapping with the kernel in separate columns. This 3D tensor was then multiplied with the Gaussian kernel and the columns were summed up to get the values for the convolved image. Finally, the 3 color channels were separated and devectorized into 2D images. (Note: This method was used because the initial implementation of the convolution function using "for" loops was too slow.) The result of the blurring operation is shown below (Figure 5).
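The vectorized convolution described above can be sketched with PyTorch's im2col helper, `F.unfold`: the windows overlapping the kernel end up in separate columns, which are multiplied by the flattened kernel and summed. The function name is an assumption, and strictly speaking this computes cross-correlation, which coincides with convolution for symmetric kernels like the Gaussian.

```python
import torch
import torch.nn.functional as F

def conv2d_unfold(imgs: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """'Same'-size filtering of a stack of single-channel images
    (shape: channels x H x W) via im2col, mirroring the report's approach:
    unfold windows into columns, multiply by the kernel, sum the columns."""
    c, h, w = imgs.shape
    kh, kw = kernel.shape
    pad = kh // 2
    # (c, kh*kw, h*w): each column holds one window's pixels
    cols = F.unfold(imgs.unsqueeze(1), (kh, kw), padding=pad)
    out = (kernel.reshape(1, -1, 1) * cols).sum(dim=1)  # (c, h*w)
    return out.reshape(c, h, w)  # devectorize back into 2D images
```

Because the multiply-and-sum runs over all windows at once, this avoids the slow Python "for" loops mentioned above.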
The effect of the blurring may not be clear in the Emir images since the image resolution is over 3000 pixels while the kernel is only 3 pixels wide. The image below (Figure 6) shows a more noticeable effect of this blurring operation.
Now, we get to the edge detection task. We use the following 3x3 Sobel filters (Figure 7) for this (one for detecting horizontal edges and the other for vertical edges).
We use the same convolution functions mentioned in the Gaussian blurring section; however, here we perform 2 convolutions (in the horizontal and vertical directions) and then take the RMS value for each pixel in the convolved images. The resulting edge-detected images for the 3 color channels are shown below (Figure 8).
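The two Sobel passes and the per-pixel combination can be sketched as below. The standard Sobel coefficients are assumed to match Figure 7, and the combination shown is the gradient magnitude sqrt(gx^2 + gy^2), which differs from the RMS of the two responses only by a constant factor of sqrt(2) and therefore does not change the alignment.

```python
import torch
import torch.nn.functional as F

# Standard 3x3 Sobel filters (assumed to match Figure 7)
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()

def edge_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient magnitude of one channel image,
    with zero padding to keep the output the same size."""
    x = img.unsqueeze(0).unsqueeze(0)  # (1, 1, H, W) for conv2d
    gx = F.conv2d(x, SOBEL_X.reshape(1, 1, 3, 3), padding=1)
    gy = F.conv2d(x, SOBEL_Y.reshape(1, 1, 3, 3), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2).squeeze()
```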
Finally, we can now align the edges in the 3 color channel images; here, we will align the red and green channel images to the blue channel image (using SSD). We could do an exhaustive search within a fixed window (e.g., -15 to +15 pixels), but this would be an extremely slow operation, especially if the window is large (as we will see later, the best y-alignment for the red channel of the Emir image is around 107, which would require a very large window to search exhaustively). Accordingly, a pyramid search was performed to speed up the process. Here, we first downscale the original channel images by factors of 2, 4, 8 and so on (Figure 9), and start by aligning at the smallest resolution.
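The coarse-to-fine search can be sketched as follows: recurse on half-resolution copies, double the displacement found there, and refine it with a small exhaustive SSD search at each level. The window size, the number of levels, and the stopping condition are all assumptions, not values taken from the report.

```python
import torch
import torch.nn.functional as F

def ssd_shift_score(ref: torch.Tensor, mov: torch.Tensor,
                    dy: int, dx: int) -> torch.Tensor:
    """SSD between `mov` shifted by (dy, dx) and `ref`."""
    shifted = torch.roll(mov, shifts=(dy, dx), dims=(0, 1))
    return ((shifted - ref) ** 2).sum()

def align_pyramid(ref: torch.Tensor, mov: torch.Tensor,
                  window: int = 2, levels: int = 4):
    """Find (dy, dx) that best aligns `mov` to `ref` by SSD, coarse to fine.
    window and levels are assumed values for this sketch."""
    if levels > 0 and min(ref.shape) > 32:
        # Recurse at half resolution, then double the coarse estimate.
        half = lambda t: F.avg_pool2d(t[None, None], 2)[0, 0]
        dy, dx = align_pyramid(half(ref), half(mov), window, levels - 1)
        dy, dx = 2 * dy, 2 * dx
    else:
        dy, dx = 0, 0
    # Small exhaustive refinement around the current estimate.
    best = (float("inf"), dy, dx)
    for ddy in range(-window, window + 1):
        for ddx in range(-window, window + 1):
            s = ssd_shift_score(ref, mov, dy + ddy, dx + ddx).item()
            if s < best[0]:
                best = (s, dy + ddy, dx + ddx)
    return best[1], best[2]
```

Because each level only searches a small fixed window, even a displacement of around 107 pixels at full resolution is reachable: it shrinks to a few pixels at the coarsest level and is doubled back up through the pyramid.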
Once we are back up to the original resolution, we add up the alignments found at each level (multiplying each by the appropriate power of 2) to obtain the best overall displacement for each of the 3 color channel images. With these, we shift the original red and green color channel images (not the edge-detected ones) by the obtained displacements. The final result for the Emir image is given in Figure 10.
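Applying the final displacements to the original (non-edge-detected) channels can be sketched with torch.roll, which wraps shifted pixels around the border; the function name and channel layout are assumptions for this sketch.

```python
import torch

def compose_rgb(r: torch.Tensor, g: torch.Tensor, b: torch.Tensor,
                r_disp: tuple, g_disp: tuple) -> torch.Tensor:
    """Shift the red and green channels by their (dy, dx) displacements
    (torch.roll wraps pixels around the border) and stack the three
    channels into an H x W x 3 RGB image."""
    r = torch.roll(r, shifts=r_disp, dims=(0, 1))
    g = torch.roll(g, shifts=g_disp, dims=(0, 1))
    return torch.stack([r, g, b], dim=-1)
```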
As can be seen in the previous section, most of the output images look pretty good; the exceptions are "lady.tif" and "self_portrait.tif." Upon further inspection, it was discovered that the problem lay in the edge detection step. Take a look at the edge-detected color channels for the "lady.tif" image (Figure 11).
One may notice that there are hardly any edges in the above images compared to the ones in the "Edge Detection" section earlier (this may be because the image doesn't have many sharp edges). The same was noted in the edge-detected color channels of "self_portrait.tif." To resolve this issue, the Gaussian blurring step was skipped (so that the code no longer reduces the number of edges in the image). The results are given below (Figures 12 & 13).
We see that the "lady.tif" image (Figure 12, right image) looks much better than earlier (though not perfect). However, the "self_portrait.tif" image hasn't changed much. Fixing this may require a more sophisticated edge-detection implementation. Instead, we completely bypass the edge-detection step and directly perform pyramid alignment on raw pixel values. The result now is much better than earlier (see Figure 14).
For the implementation of this project, PyTorch tensors were used instead of NumPy arrays/Python lists. This didn't affect the alignment of the images (beyond perhaps a slight difference in runtime), so the use of PyTorch tensors can be verified by looking at the source code submitted along with the project.
As discussed in the prior sections of this report, our implementation allows for the use of both raw pixel values (see "Differing Brightness Issue" section) and edges for the alignment task (see "Edge Detection" section). It has also been shown that using the edge features provided much better results, except in rare cases (see "Discussion on Results" section).