16-726 Project 1, Jason Xu

Overview

This project aims to automatically colorize images from the Prokudin-Gorskii collection, whose three color channels were photographed separately, by aligning the channels and combining them into a single color image.
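In the digitized collection, each scanned plate holds the three exposures stacked vertically. The minimal NumPy sketch below shows the split and the final stacking step. It assumes the top-to-bottom order blue, green, red, and `split_plate` and `stack_rgb` are hypothetical helper names, not functions from this project.

```python
import numpy as np


def split_plate(plate: np.ndarray):
    """Split a scanned plate into its three exposures.

    Assumes (hypothetically) the exposures are stacked as blue (top),
    green (middle), red (bottom); leftover rows after dividing by 3 are dropped.
    """
    h = plate.shape[0] // 3  # height of one exposure
    blue = plate[:h]
    green = plate[h:2 * h]
    red = plate[2 * h:3 * h]
    return blue, green, red


def stack_rgb(red, green, blue):
    """Combine three aligned single-channel images into an H x W x 3 RGB image."""
    return np.dstack([red, green, blue])
```

Once the green and red channels have been shifted onto the blue one, `stack_rgb` produces the final color image.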

Approach

My implementation searches for the optimal alignment by minimizing the sum of squared differences (SSD), computed by summing the squared per-pixel difference between two channels, using the raw pixel values. To improve accuracy, I cropped 400 pixels from each image to remove the borders, and excluded the pixels that torch.roll wraps around from one edge of the image to the opposite edge. To speed up the search, I implemented an image pyramid search, which works coarse to fine: the shift is first estimated on a heavily downscaled copy of the image and then refined at each successively finer level.

Using this implementation with 8 pyramid levels, I can search shifts within a window of [-256, 256] pixels, which is more than enough for the images in this collection.
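The SSD search with border cropping and the coarse-to-fine pyramid described above can be sketched as follows. This is a minimal NumPy sketch, not the project's PyTorch code: `np.roll` stands in for `torch.roll`, 2x2 block averaging is an assumed downscaling method, and the function names and defaults (`levels`, `radius`, `border_frac`) are hypothetical. The border is taken as a fraction of the image size rather than a fixed 400 pixels so that one rule works at every pyramid level.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def search(moving, ref, center, radius, border_frac):
    """Exhaustive SSD search in a (2*radius+1)^2 window around center = (dy, dx)."""
    cy, cx = center
    h, w = ref.shape
    # One crop for every candidate: a border (hypothetical fraction of the size)
    # plus the largest shift tried, so pixels wrapped by np.roll are never compared.
    c = int(border_frac * min(h, w)) + max(abs(cy), abs(cx)) + radius
    ref_crop = ref[c:h - c, c:w - c]
    best, best_shift = np.inf, (cy, cx)
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ssd(shifted[c:h - c, c:w - c], ref_crop)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift


def pyramid_align(moving, ref, levels=4, radius=2, border_frac=0.1):
    """Coarse-to-fine search.

    Returns (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) best matches ref.
    """
    if levels == 0:
        # Coarsest level: search around zero shift.
        return search(moving, ref, (0, 0), radius, border_frac)
    # Estimate on half-resolution copies, then double the estimate and refine it here.
    cy, cx = pyramid_align(downsample(moving), downsample(ref), levels - 1, radius, border_frac)
    return search(moving, ref, (2 * cy, 2 * cx), radius, border_frac)
```

Each level only tests (2 * radius + 1)^2 candidates, yet the estimates compound: with `levels` levels below full resolution, the largest reachable shift is `radius * (2**(levels + 1) - 1)` pixels, so `levels=3, radius=2` reaches 30 pixels.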

PyTorch support (extra credit)

To take advantage of the GPU, I wrote this assignment in PyTorch, using mostly the same functions I would use in NumPy. The only difference is that I move the loaded image to the GPU and, after the alignment operations, move the result back to the CPU so it can be displayed with CV libraries.

Results

The implementation aligned most of the provided images well and quickly. The results, with the shifts found for the green and red channels, are shown below:

cathedral (low resolution)
Green shift: [2 5] Red shift: [ 3 12]

emir
Green shift: [24 49] Red shift: [-203 97]

harvesters
Green shift: [16 59] Red shift: [ 13 123]

icon
Green shift: [17 41] Red shift: [23 89]

lady
Green shift: [ 8 51] Red shift: [ 12 112]

self_portrait
Green shift: [29 78] Red shift: [ 37 176]

three_generations
Green shift: [14 53] Red shift: [ 11 111]

train
Green shift: [ 6 42] Red shift: [32 87]

turkmen
Green shift: [21 56] Red shift: [ 28 116]

village
Green shift: [12 64] Red shift: [ 22 137]

The simple intensity-based implementation failed to align Emir well (its red shift of [-203 97] stands out from the other results). The intricate colors of his clothing make the channels differ so much in brightness that a misaligned image has a lower SSD loss than the correct alignment. This could be mitigated with more robust methods, such as aligning edge-filtered channels instead of raw intensities, or by aligning the image manually.
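The edge-filtering idea can be sketched by running the same kind of SSD search on gradient-magnitude images instead of raw intensities, since edges tend to line up across channels even when brightness does not. This sketch assumes a simple circular forward-difference gradient as the edge filter (the writeup does not name one) and uses a single-scale search for brevity; `edge_map` and `align_on_edges` are hypothetical names.

```python
import numpy as np


def edge_map(img):
    """Gradient magnitude from circular forward differences (a simple, assumed edge filter)."""
    img = img.astype(np.float64)
    gx = np.roll(img, -1, axis=1) - img  # horizontal difference
    gy = np.roll(img, -1, axis=0) - img  # vertical difference
    return np.hypot(gx, gy)


def align_on_edges(moving, ref, radius=15, border_frac=0.1):
    """Exhaustive SSD search over edge maps.

    Returns (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) best matches ref.
    """
    em, er = edge_map(moving), edge_map(ref)
    h, w = ref.shape
    # Crop a border, the largest shift, and one extra pixel so that neither
    # wrapped pixels nor the spurious wrap-around edges are compared.
    c = int(border_frac * min(h, w)) + radius + 1
    er_crop = er[c:h - c, c:w - c]
    best, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(em, (dy, dx), axis=(0, 1))
            diff = shifted[c:h - c, c:w - c] - er_crop
            score = float(np.sum(diff * diff))
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift
```

Because negating a channel leaves its gradient magnitude unchanged, this search can recover the shift even between channels whose brightness is inverted, a case where raw SSD is likely to fail.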

Own results

I also aligned three additional images from the collection:

Images: 876, 893, 1016

These align well, likely because they are landscapes whose color channels look similar, so SSD on raw intensities is a reliable matching criterion.