16-726 Project 1, Jason Xu

Overview

This project automatically colorizes images from the Prokudin-Gorskii collection by aligning the three separately captured color channels and combining them into a single image.

Approach

My implementation searches for the optimal alignment by minimizing a sum of squared differences (SSD) loss, computed on the raw RGB channel values by summing the squared difference at each pixel. To increase accuracy, I cropped 400 pixels from each image to remove the borders, and discarded the pixels that wrap around to the opposite edge as a result of torch.roll. To speed up the search, I implemented an image pyramid, which operates as follows:

Call the alignment function with a set number of layers
If the number of layers is 0, return a shift of (0,0)
Otherwise:
- Downscale the image by a factor of 2 using bilinear interpolation
- Recursively pass the downscaled image into this function with layers-1 to find the optimal alignment at the coarser layer
- Offset the image by 2 times the shift returned by the coarser layer, and remove the pixels that torch.roll wraps around
- Search a box of [-1,1] around that offset for the best shift, using the image at the current (non-downscaled) resolution
- Return the best shift added to the offset
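
The steps above can be sketched roughly as follows. This is a minimal illustration rather than my actual code: it uses NumPy in place of PyTorch (np.roll behaves like torch.roll), 2x2 averaging as a stand-in for bilinear downscaling by exactly 2, and a fixed margin crop to drop wrapped pixels. The names ssd, downscale2 and pyramid_align are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences over all pixels (computed in float64)."""
    return float(np.sum((a.astype(np.float64) - b) ** 2))

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (assumed stand-in for bilinear)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # drop a trailing row/column if the size is odd
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(ref, img, layers):
    """Return (dy, dx) so that np.roll(img, (dy, dx), axis=(0, 1)) best matches ref."""
    if layers == 0:
        return (0, 0)
    # Solve the coarser problem first, then double its shift.
    cy, cx = pyramid_align(downscale2(ref), downscale2(img), layers - 1)
    by, bx = 2 * cy, 2 * cx
    # Crop a margin wide enough to exclude every row/column that np.roll
    # wraps around, so all nine candidates are scored on the same region.
    m = max(abs(by), abs(bx)) + 1
    h, w = ref.shape
    core = ref[m:h - m, m:w - m]
    best, best_loss = (by, bx), np.inf
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            rolled = np.roll(img, (by + dy, bx + dx), axis=(0, 1))
            loss = ssd(core, rolled[m:h - m, m:w - m])
            if loss < best_loss:
                best, best_loss = (by + dy, bx + dx), loss
    return best
```

In this sketch the reference channel stays fixed and the returned shift is the one to apply to the other channel; the images need to be large enough that the cropped core is non-empty at every layer.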

Using this implementation with 8 layers, I am able to search a box of roughly [-255, 255] (each layer doubles the range of the layer below it and adds 1, giving 2^8 - 1), more than enough for the images in this collection.
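
The reachable range follows directly from the recursion: each layer doubles the shift inherited from the coarser layer and then adds at most one pixel in either direction. A small sketch that tallies this (the helper name max_reach is hypothetical, and a search radius of 1 per layer is assumed from the steps above):

```python
def max_reach(layers, radius=1):
    """Largest absolute shift reachable after `layers` coarse-to-fine steps."""
    reach = 0
    for _ in range(layers):
        # Double the coarser layer's reach, then search +/- radius around it.
        reach = 2 * reach + radius
    return reach

print(max_reach(8))  # 255
```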

PyTorch support (extra credit)

To take advantage of the GPU, I wrote this assignment in PyTorch, using mostly the same functions that I would use in numpy. The only difference is that I send the parsed image to the GPU, then move the result back to the CPU after the operations so that it can be displayed using CV libraries.
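
As a rough illustration of that round trip (not my exact code), the sketch below moves a NumPy channel to the GPU when one is available, applies torch.roll there, and brings the result back as a NumPy array. The function name shift_on_device is hypothetical, and the fallback to the CPU is an assumption added so the snippet runs on any machine.

```python
import numpy as np
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def shift_on_device(channel, dy, dx):
    """Roll a 2-D NumPy channel by (dy, dx) on `device` and return a NumPy array."""
    t = torch.from_numpy(channel).to(device)               # host -> device
    rolled = torch.roll(t, shifts=(dy, dx), dims=(0, 1))   # same semantics as np.roll
    return rolled.cpu().numpy()                            # device -> host for display
```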

Results

The implementation ran quickly and aligned most of the provided images well. The resulting shifts are listed below:

cathedral (low resolution)
Green shift: [2 5] Red shift: [ 3 12]

emir
Green shift: [24 49] Red shift: [-203 97]

harvesters
Green shift: [16 59] Red shift: [ 13 123]

icon
Green shift: [17 41] Red shift: [23 89]

lady
Green shift: [ 8 51] Red shift: [ 12 112]

self_portrait
Green shift: [29 78] Red shift: [ 37 176]

three_generations
Green shift: [14 53] Red shift: [ 11 111]

train
Green shift: [ 6 42] Red shift: [32 87]

turkmen
Green shift: [21 56] Red shift: [ 28 116]

village
Green shift: [12 64] Red shift: [ 22 137]

The simple RGB-based implementation failed to align Emir well: the intricate colors on his clothes make the three channels differ so much in brightness that a misaligned image has a lower SSD loss than the correct solution. This could be mitigated by using more advanced methods such as edge filtering, or by aligning the image manually.
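
To illustrate why edge filtering could help, the sketch below computes a gradient-magnitude map with NumPy. I did not implement this; it is one assumed form of edge filter, and edge_magnitude is a hypothetical name. The point is that a channel and a brightness-inverted copy of it disagree strongly in raw values even when perfectly aligned, yet produce the same edge map, so an SSD computed on edge maps is less sensitive to per-channel brightness differences.

```python
import numpy as np

def edge_magnitude(img):
    """Gradient magnitude of a 2-D image using central differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gy, gx)

# A bright blob and its brightness-inverted copy, perfectly aligned.
yy, xx = np.mgrid[0:16, 0:16]
blob = np.exp(-((yy - 8) ** 2 + (xx - 6) ** 2) / 18.0)
inverted = 1.0 - blob

raw_ssd = np.sum((blob - inverted) ** 2)  # large despite perfect alignment
edge_ssd = np.sum((edge_magnitude(blob) - edge_magnitude(inverted)) ** 2)  # near zero
```

The edge maps could then be passed to the same SSD search in place of the raw channels.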

Own results

Three images from the collection that I managed to align are as follows:

They align well, likely because they are landscapes whose RGB channels look similar.