This project automatically colorizes images from the Prokudin-Gorskii collection by aligning the three color channels, which were photographed separately, and then combining them into a single image.
My implementation searches for the optimal alignment using a sum of squared differences (SSD) loss, computed by summing the squared per-pixel differences between two channels on their raw intensity values. To increase accuracy, I cropped 400 pixels from each image to remove the borders, and discarded the pixels that wrap around as a result of torch.roll. To speed up the search, I implemented an image pyramid search, which estimates the alignment at a coarse resolution and then refines it at successively finer ones.
With 8 pyramid layers, I am able to search a displacement window of [-256, 256] pixels, which is more than enough for the images in this collection.
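The pieces described above (SSD loss, torch.roll with the wrapped pixels cropped away, and a coarse-to-fine pyramid) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code used for the results below: the function names, the `radius` and `border_frac` parameters, the use of 2x2 average pooling for downsampling, and the proportional border crop (instead of a fixed 400 pixels) are all hypothetical choices made for the example.

```python
import torch
import torch.nn.functional as F


def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D tensors."""
    return ((a - b) ** 2).sum()


def shifted_ssd(ref, mov, dy, dx, border):
    """SSD after rolling `mov` by (dy, dx), scored on the interior only.

    Cropping `border` pixels per side removes the plate borders and, as long
    as |dy| and |dx| do not exceed `border`, the pixels torch.roll wraps around.
    """
    rolled = torch.roll(mov, shifts=(dy, dx), dims=(0, 1))
    h, w = ref.shape
    return ssd(ref[border:h - border, border:w - border],
               rolled[border:h - border, border:w - border])


def search(ref, mov, center, radius, border):
    """Exhaustively test shifts within `radius` of `center`; return the best."""
    best, best_loss = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            loss = shifted_ssd(ref, mov, dy, dx, border).item()
            if loss < best_loss:
                best, best_loss = (dy, dx), loss
    return best


def pyramid_align(ref, mov, levels=4, radius=2, border_frac=0.1):
    """Coarse-to-fine alignment: estimate at low resolution, double, refine."""
    if levels > 1 and min(ref.shape) >= 64:
        # Assumption: halve the resolution with 2x2 average pooling.
        small_ref = F.avg_pool2d(ref[None, None], 2)[0, 0]
        small_mov = F.avg_pool2d(mov[None, None], 2)[0, 0]
        cy, cx = pyramid_align(small_ref, small_mov,
                               levels - 1, radius, border_frac)
        # A shift of one pixel at the coarse level is two pixels here.
        center = (2 * cy, 2 * cx)
    else:
        center = (0, 0)
    border = int(border_frac * min(ref.shape))
    return search(ref, mov, center, radius, border)
```

Calling `pyramid_align(blue, green)` on two single-channel float tensors returns the (row, column) shift to apply to `green` with torch.roll so that it lines up with `blue`. Because each level only searches a small window around the doubled coarse estimate, the total reachable displacement grows roughly geometrically with the number of levels while the cost per level stays constant.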
To take advantage of the GPU, I wrote this assignment in PyTorch, using mostly the same functions that I would use in NumPy. The only difference is that I move the loaded image to the GPU for the computation, then move the result back to the CPU to display it using CV libraries.
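As a rough illustration of that round trip, the sketch below moves a NumPy array to the GPU (falling back to the CPU when CUDA is unavailable), applies torch.roll, and converts the result back. The helper names `to_device` and `to_numpy` and the random stand-in image are assumptions for the example only.

```python
import numpy as np
import torch

# Use the GPU when one is available; otherwise the same code runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def to_device(image: np.ndarray) -> torch.Tensor:
    """Convert a NumPy image to a float tensor on the selected device."""
    return torch.from_numpy(image).float().to(device)


def to_numpy(tensor: torch.Tensor) -> np.ndarray:
    """Bring a tensor back to the CPU as a NumPy array for display."""
    return tensor.detach().cpu().numpy()


# Stand-in for a loaded single-channel image.
plate = np.random.rand(6, 4).astype(np.float32)

# torch.roll mirrors np.roll; only the keyword names differ.
shifted = torch.roll(to_device(plate), shifts=(1, 2), dims=(0, 1))
result = to_numpy(shifted)
```

Here `result` matches `np.roll(plate, shift=(1, 2), axis=(0, 1))`, which is why most of the NumPy logic carries over unchanged.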
The implementation ran quickly and aligned most of the provided images well. The results are shown below:
cathedral (low resolution)
Green shift: [2 5] Red shift: [ 3 12]
emir
Green shift: [24 49] Red shift: [-203 97]
harvesters
Green shift: [16 59] Red shift: [ 13 123]
icon
Green shift: [17 41] Red shift: [23 89]
lady
Green shift: [ 8 51] Red shift: [ 12 112]
self_portrait
Green shift: [29 78] Red shift: [ 37 176]
three_generations
Green shift: [14 53] Red shift: [ 11 111]
train
Green shift: [ 6 42] Red shift: [32 87]
turkmen
Green shift: [21 56] Red shift: [ 28 116]
village
Green shift: [12 64] Red shift: [ 22 137]
The simple intensity-based implementation failed to align emir well. The intricate colors of his clothes make the three channels differ considerably, enough that a misaligned image has a lower SSD loss than the correct alignment. This could be mitigated by using more advanced methods such as edge filtering, or by aligning the image manually.
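To illustrate why edge filtering might help, the sketch below compares two synthetic channels that show the same shape with opposite brightness. This is not part of the implementation described here; the `edge_magnitude` helper and the forward-difference gradient are assumptions chosen for brevity (a Sobel filter would be a common alternative).

```python
import torch


def edge_magnitude(img):
    """Approximate gradient magnitude using forward differences."""
    gy = img[1:, :-1] - img[:-1, :-1]   # vertical change
    gx = img[:-1, 1:] - img[:-1, :-1]   # horizontal change
    return torch.sqrt(gx ** 2 + gy ** 2)


# Two channels with the same square, bright in one and dark in the other.
a = torch.zeros(16, 16)
a[4:12, 4:12] = 1.0
b = 1.0 - a

raw_ssd = ((a - b) ** 2).sum().item()
edge_ssd = ((edge_magnitude(a) - edge_magnitude(b)) ** 2).sum().item()
```

The raw SSD is large even though the two channels are perfectly aligned, while the SSD on edge magnitudes is zero, because the edges sit in the same place regardless of each channel's brightness.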
Three images from the collection that I managed to align are as follows:
They align well, probably because they are landscapes whose red, green, and blue channels look similar.