16-726 Assignment 1

Jason Zhang (jasonyzhang@cmu.edu)

Alignment

I began by exhaustively searching over possible displacements and scoring each possible displacement using normalized cross-correlation (NCC).
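The exhaustive search can be sketched roughly as follows. This is a minimal illustration rather than the exact code used for the assignment: the function names, the `max_shift` window of 15 pixels, and the use of `np.roll` (which wraps pixels around the image border) are all assumptions.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized arrays."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_exhaustive(channel, reference, max_shift=15):
    """Return the (dy, dx) shift of `channel` that best matches `reference`.

    Hypothetical helper: tries every shift in a square window and keeps
    the one with the highest NCC score.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The cost grows with the square of `max_shift` times the number of pixels, which is why this approach does not scale to the larger images.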

For larger images, this quickly became computationally infeasible, so I implemented an image pyramid to check over a smaller window at a variety of scales. This significantly improved runtime.
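A coarse-to-fine version of the search might look like the sketch below. It is written under several assumptions that are not taken from the assignment code: images are halved by averaging 2x2 blocks, the recursion stops once the shorter side is at most `min_size` pixels, and each finer level only searches within `radius` pixels of the doubled coarse estimate.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a stand-in for a proper resize)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Best (dy, dx) within `radius` of `center`, scored by NCC."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, min_size=32, radius=2, coarse_radius=8):
    """Estimate the shift at the coarsest level, then refine it level by level."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    if min(channel.shape) <= min_size:
        # Coarsest level: a wider search is cheap because the image is tiny.
        return search(channel, reference, (0, 0), coarse_radius)
    coarse = align_pyramid(downsample(channel), downsample(reference),
                           min_size, radius, coarse_radius)
    # A shift of s pixels at the coarse level is about 2s pixels at this level.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, reference, center, radius)
```

Each level only evaluates a small window, so the total work is dominated by a handful of NCC evaluations at full resolution instead of a full exhaustive search.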

For better matching, I tried the following edge detection methods:

Finite Differencing filter: convolve the image with the simplest possible kernel for finding edges in the x direction and the y direction. Then, compute the magnitude of the differences at each pixel location for each channel.
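As a rough illustration, the finite-difference edge map for one channel could be computed as below. Forward differences between neighboring pixels are assumed here; the exact kernel and border handling used in the assignment are not stated, and the function name is hypothetical.

```python
import numpy as np

def finite_difference_edges(channel):
    """Gradient magnitude of one channel from forward differences."""
    channel = channel.astype(np.float64)
    dx = np.zeros_like(channel)
    dy = np.zeros_like(channel)
    # Difference between each pixel and its right / lower neighbor.
    # The last column of dx and the last row of dy are left at zero.
    dx[:, :-1] = channel[:, 1:] - channel[:, :-1]
    dy[:-1, :] = channel[1:, :] - channel[:-1, :]
    return np.hypot(dx, dy)
```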

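Sobel filter: same as above, but with a 3x3 kernel that also smooths along the edge direction, which makes it less sensitive to noise. The kernel for the x direction is: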

$$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$$
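A small NumPy sketch of applying this kernel and its transpose is given below. The helper names and the edge-replicating border are assumptions made for illustration; the kernel is applied as a cross-correlation, which differs from a true convolution only in the sign of the response and so does not affect the magnitude.

```python
import numpy as np

SOBEL_X = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # same kernel rotated to respond to horizontal edges

def filter3x3(img, kernel):
    """Cross-correlate `img` with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(channel):
    """Gradient magnitude of one channel from the two Sobel responses."""
    gx = filter3x3(channel, SOBEL_X)
    gy = filter3x3(channel, SOBEL_Y)
    return np.hypot(gx, gy)
```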

Canny Edge Detector: smooths the image with a Gaussian filter and then applies a difference filter. To reduce noise, the algorithm uses two thresholds: it keeps pixels that are above the higher threshold, and keeps pixels that are above the lower threshold only if they are also connected to pixels above the higher threshold. I didn't implement this myself; I just used cv2.Canny.
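To make the two-threshold rule concrete, here is a small NumPy sketch of that step alone. It is not how cv2.Canny is implemented and it leaves out the other stages of the detector; the function name, the 8-connected neighborhood, and the strict `>` comparisons are assumptions.

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels connected to a strong one."""
    strong = magnitude > high
    weak = magnitude > low  # includes every strong pixel as well
    keep = strong.copy()
    h, w = keep.shape
    while True:
        # Grow the kept set by one pixel in all 8 directions ...
        padded = np.pad(keep, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(keep)
        for i in range(3):
            for j in range(3):
                grown |= padded[i:i + h, j:j + w]
        # ... but only into pixels that pass the lower threshold.
        grown &= weak
        if np.array_equal(grown, keep):
            return keep
        keep = grown
```

In the test row `[0.9, 0.5, 0.5, 0.0, 0.5]` with thresholds 0.3 and 0.8, the first three pixels survive, while the last one is dropped because the zero in between disconnects it from the strong pixel.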

The Sobel filter and Canny Edge Detector both performed reasonably well.

Below is a visualization of the different levels of the image pyramid. From left to right: resized image, Canny edge detector applied to R, G, and B channels respectively. The edges in the R and G channels were then compared to the edges in the B channel using NCC.

Cropping

To crop the image and remove the borders, I first applied a Sobel filter to the grayscale image to get horizontal and vertical edge detectors.

The borders were picked up relatively strongly by the edge detectors. I averaged the vertical edge detector along the horizontal dimension and the horizontal edge detector along the vertical dimension. The indices with large means thus correspond to the borders of the picture.
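One way to turn these averaged edge responses into crop bounds is sketched below. Several details are assumptions rather than the method actually used: simple pixel differences stand in for the Sobel filter, the profiles are normalized but not smoothed, the threshold is fixed at 0.5, and borders are only searched for within a margin (`margin_frac`) of each side. The left and right borders are found from a profile over columns, and the top and bottom borders from a profile over rows.

```python
import numpy as np

def edge_profiles(gray):
    """Return (row_profile, col_profile) of mean edge strength, scaled to [0, 1]."""
    gray = gray.astype(np.float64)
    gx = np.abs(np.diff(gray, axis=1))  # responds to vertical lines
    gy = np.abs(np.diff(gray, axis=0))  # responds to horizontal lines
    col_profile = gx.mean(axis=0)       # one value per column boundary
    row_profile = gy.mean(axis=1)       # one value per row boundary
    col_profile = col_profile / max(col_profile.max(), 1e-12)
    row_profile = row_profile / max(row_profile.max(), 1e-12)
    return row_profile, col_profile

def crop_bounds(profile, threshold=0.5, margin_frac=0.2):
    """Return (start, stop) pixel indices to keep along one axis.

    Entry i of `profile` is the edge between pixels i and i + 1, so the
    kept region starts just after the innermost strong edge in the leading
    margin and stops just before the innermost one in the trailing margin.
    """
    n = len(profile)
    margin = max(1, int(n * margin_frac))
    lead = np.flatnonzero(profile[:margin] > threshold)
    trail = np.flatnonzero(profile[n - margin:] > threshold)
    start = int(lead.max()) + 1 if lead.size else 0
    stop = n - margin + int(trail.min()) + 1 if trail.size else n + 1
    return start, stop
```

With `rows = crop_bounds(row_profile)` and `cols = crop_bounds(col_profile)`, the cropped image would be `img[rows[0]:rows[1], cols[0]:cols[1]]`.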

Here are some visualizations. The blue lines on the margins are the mean values from the edge detectors (after smoothing and normalizing). The red lines are the thresholds to crop.

Failure Modes

This method assumes the borders are axis-aligned and produce a sharp gradient.

For the most part, the borders were axis-aligned, but the grayscale versions of the images did not always have sharp edges. For example, the cathedral's yellow bar has a very similar luminance to the blue sky.

Perhaps this method would have worked better if applied on each channel individually rather than on the grayscale image.

Recoloring

I began by playing around with the temperature of the color palette by simply rescaling the channels. I expected some improvement, since indoor pictures should have warmer lighting and outdoor pictures should have cooler lighting. Overall, the effect was negligible and in many cases actually looked worse.

Then, I tried a bunch of techniques with varying success:

White World (2nd column): I rescaled the color channels such that the brightest pixel is always (255, 255, 255). There was no really noticeable difference.
Gray World (3rd column): I rescaled the R and B channels such that the average intensity was the same across all three channels. This looked bad.
Histogram equalization (4th column): I converted images to LAB space then used the cdf to spread the intensities of the image. Then I converted the image back to color. This significantly increased the contrast in the image, making colors look more saturated. However, images with both dark regions and light regions look very unrealistic since the global contrast is too high.
CLAHE (5th column): To reduce the effects of using global contrast, adaptive histogram equalization splits the image into tiles and applies equalization to smaller areas of the image. This produces a result with more realistic looking lighting throughout. I just used cv2.createCLAHE for this part.
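The first three techniques in the list above can be sketched in NumPy as follows. These are simplified illustrations under stated assumptions: images are H x W x 3 `uint8` arrays in RGB order, white world scales each channel by its own maximum, gray world matches the R and B means to the G mean, and `equalize` operates on a single 8-bit channel (the conversion to and from LAB is left out, as is CLAHE). The function names are hypothetical.

```python
import numpy as np

def white_world(img):
    """Scale each channel so that its brightest value becomes 255."""
    img = img.astype(np.float64)
    peaks = img.reshape(-1, 3).max(axis=0)
    scaled = img * (255.0 / np.maximum(peaks, 1e-12))
    return np.clip(scaled, 0, 255).round().astype(np.uint8)

def gray_world(img):
    """Scale R and B so that all three channel means match the G mean."""
    img = img.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means[1] / np.maximum(means, 1e-12)  # gain for G is 1
    return np.clip(img * gains, 0, 255).round().astype(np.uint8)

def equalize(channel):
    """Global histogram equalization of one uint8 channel via its CDF."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum() / channel.size
    lut = np.round(255 * cdf).astype(np.uint8)  # intensity -> new intensity
    return lut[channel]
```

Because `equalize` uses a single lookup table for the whole image, it shows why the global version can over-stretch images that mix dark and light regions; the adaptive variant builds a separate mapping per tile.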

Offset

image	disp_r	disp_g
cathedral	(12, -3)	(-5, -2)
emir	(-107, -40)	(-49, -24)
harvesters	(-124, -14)	(-60, -18)
icon	(-90, -23)	(-39, -16)
lady	(-120, -13)	(-56, -10)
monastery	(-3, -2)	(3, -2)
nativity	(-8, 0)	(-3, -1)
self_portrait	(-175, -37)	(-77, -29)
settlers	(-14, 1)	(-7, 0)
three_generations	(-111, -8)	(-58, -17)
train	(-85, -29)	(-43, -5)
turkmen	(-118, -29)	(-57, -22)
village	(-137, -21)	(-65, -11)
chalice	(-4, -2)	(-1, -1)
poles	(-6, -3)	(-2, -2)