16-726 Learning-Based Image Synthesis, 2021 Spring
Project 1: Colorizing the Prokudin-Gorskii Photo Collection
Teddy Zhang (wentaiz)
In the digitized Prokudin-Gorskii glass plate images, each photograph is provided as three grayscale images representing the R, G, and B channel information. The three images are not perfectly aligned because they were not taken at the same time. The goal of this project is to design an algorithm that automatically restores a color image with as few visual artifacts as possible.
When the input image is small (max(h, w) < 500), we can directly align the r and g channels to the b channel by searching within a user-defined offset range. The offsets that give the smallest error under the chosen metric are used to generate the final color image. The details of my implementation are:
- Default search range: [-15, 15] in both the h and w directions.
- Loss function: Sum of Squared Differences (SSD), L = ∑_i ∑_j (p_ij − q_ij)², where p and q are pixels of the two images.
- Region: to avoid noisy information from the boundaries, only the central 0.6× region of the image is used to compute the loss.
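The single-scale search above can be sketched in NumPy as follows. The function name, the use of `np.roll` (wrap-around shifting), and the parameter defaults are my illustrative choices, not necessarily the project's actual code:

```python
import numpy as np

def align_single_scale(channel, ref, search_range=15, crop_frac=0.6):
    """Exhaustively search offsets and return the (dy, dx) minimizing SSD."""
    h, w = ref.shape
    # Only the central crop_frac region contributes to the loss,
    # so noisy borders do not bias the alignment.
    y0, y1 = int(h * (1 - crop_frac) / 2), int(h * (1 + crop_frac) / 2)
    x0, x1 = int(w * (1 - crop_frac) / 2), int(w * (1 + crop_frac) / 2)
    best, best_offset = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            ssd = np.sum((shifted[y0:y1, x0:x1] - ref[y0:y1, x0:x1]) ** 2)
            if ssd < best:
                best, best_offset = ssd, (dy, dx)
    return best_offset
```

With the default range this evaluates 31 × 31 candidate offsets per channel, which is cheap for small images but quadratic in the search radius.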
Here are the test results of this algorithm:
[Figure: input r, g, b channels and the aligned color result; offsets g: [5, 2], r: [12, 3]]
And more results on other images downloaded from the same collection:
[Figures: aligned results with offsets g: [3, 1], r: [8, 1]; g: [2, 1], r: [9, 1]; g: [4, 2], r: [7, 4]; g: [4, 1], r: [9, 2]]
We can tell that this method works well on small images.
When the input image is very large, the previous method becomes extremely time-consuming because of the large search range required. Instead, we can hierarchically downsample the image by a factor of 2 at each step to build an image pyramid. We then align the images starting from the coarsest scale and use the resulting offsets as the initial guess for alignment at the next scale. Within each scale, the single-scale method described above is used to find the best offsets. The details of my implementation are:
- No. of scales: int(log2(image_height / 400)). For most of the given images, this equals 3.
- Default search range: [-20, 20] along both axes at the first (coarsest) layer; [2 × init_value − 3, 2 × init_value + 3] along both axes at the other layers, where init_value is the best offset found at the previous scale.
- Loss function: SSD, the same as in the previous method.
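The coarse-to-fine procedure can be sketched as follows. The function names are mine, the 2× downsampling here is naive subsampling rather than proper filtered resizing, and the 400-pixel base size is exposed as a parameter so the sketch also runs on small inputs:

```python
import numpy as np

def _ssd_search(channel, ref, ys, xs):
    # Brute-force SSD search over the given offset ranges (helper; name is mine).
    best, best_off = np.inf, (0, 0)
    for dy in ys:
        for dx in xs:
            ssd = np.sum((np.roll(channel, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if ssd < best:
                best, best_off = ssd, (dy, dx)
    return best_off

def align_pyramid(channel, ref, base=400):
    # Number of extra pyramid levels, following int(log2(h / 400)) from the text.
    levels = max(int(np.log2(channel.shape[0] / base)), 0)
    pyramid = [(channel, ref)]
    for _ in range(levels):
        c, r = pyramid[-1]
        pyramid.append((c[::2, ::2], r[::2, ::2]))  # naive 2x subsampling
    dy = dx = 0
    for i, (c, r) in enumerate(reversed(pyramid)):
        if i == 0:
            ys = xs = range(-20, 21)      # coarsest scale: full search range
        else:
            dy, dx = 2 * dy, 2 * dx      # scale the previous offset up by 2
            ys, xs = range(dy - 3, dy + 4), range(dx - 3, dx + 4)
        dy, dx = _ssd_search(c, r, ys, xs)
    return dy, dx
```

Because each finer scale only refines within ±3 pixels of the doubled coarse estimate, the total number of SSD evaluations stays small even for large plates.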
Here are the test results of this algorithm:
[Figures 1–9: aligned results with offsets 1. g:(55, 8), r:(114, 12); 2. g:(49, 24), r:(61, 43); 3. g:(59, 17), r:(123, 14); 4. g:(40, 17), r:(89, 23); 5. g:(78, 29), r:(126, 38); 6. g:(52, 14), r:(111, 12); 7. g:(43, 6), r:(87, 32); 8. g:(56, 21), r:(116, 28); 9. g:(65, 12), r:(126, 21)]
We can tell from the results above that this method works well for most of the images. But for images 2 and 5, there are evident misalignments. The major cause is that the pixel values within these two images are quite similar across channels, so the SSD on raw pixel values may not reach its minimum at the perfect alignment. Better features that stay invariant as the tone of the image changes should be explored; please refer to Bells & Whistles for better strategies.
And more results on other images downloaded from the same collection:
[Figures 1–3: aligned results with offsets 1. g:(41, 15), r:(91, 25); 2. g:(26, 3), r:(120, 4); 3. g:(52, 4), r:(104, -47)]
For some of the very large input images, the previous method failed to align the channels perfectly. In order to use a more reliable feature than the raw pixel values, I extract the gradient information from each channel and align the gradient images instead. The details are:
- Filter: the Sobel filter, applied in both the x and y directions; the norm of the gradient at each pixel serves as the gradient information.
- Alignment: the Sobel filter is applied to all of the r, g, and b channels, and alignment is done on the gradient images using the pyramid method demonstrated above.
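The gradient feature can be sketched as below. To keep the sketch dependency-free I hand-roll a 3×3 correlation with edge replication instead of calling a library Sobel (e.g. scipy.ndimage.sobel); the sign convention differs from a true convolution, but the magnitude is unaffected:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def _filter3x3(img, kernel):
    # Minimal same-size 3x3 correlation with edge replication (NumPy only).
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + h, j:j + w]
    return out

def gradient_magnitude(channel):
    # Sobel responses along x and y; the per-pixel L2 norm is the feature
    # image that gets aligned in place of raw intensities.
    gx = _filter3x3(channel, SOBEL_X)
    gy = _filter3x3(channel, SOBEL_X.T)
    return np.hypot(gx, gy)
```

Each channel is replaced by its gradient magnitude before running the same pyramid SSD search, so tonal differences between the plates no longer dominate the loss.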
Here are some test results and the comparison with the previous method:
[Figures: Sobel gradient images and alignment comparisons, with zoomed-in crops — before g:(49, 24), r:(61, 43) / after g:(49, 24), r:(107, 40); before g:(78, 29), r:(126, 38) / after g:(78, 29), r:(166, 35); before g:(55, 8), r:(114, 12) / after g:(56, 9), r:(120, 13)]
For the purpose of automatic cropping, we want to identify the color/black borders and then remove them. I used Sobel filters again to extract the gradient information, and then searched for evident boundaries along each direction separately. The details are:
- Region of searching: 10% from both ends of the image along each axis.
- Characteristic: the mean values (m_i) along the axis perpendicular to the axis currently being searched.
- Threshold: T = m̃ + 2 × σ̃, where m̃ is the mean of all the mean values m above and σ̃ is their standard deviation.
- Searching: proceeding from the center of the image toward the boundary, the location of the first value above the threshold is marked as the candidate cropping position. The search is done for all three channels, and the smallest region common to all of them is used as the final cropped image.
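A one-axis version of this border search can be sketched as follows. The function name, the exact scan indices, and the off-by-one conventions are my interpretation of the description, not the project's actual code:

```python
import numpy as np

def crop_bounds_1d(profile, frac=0.10, k=2.0):
    """Return (lo, hi) crop indices for one axis.

    profile: per-row (or per-column) mean of a gradient-magnitude image.
    """
    n = len(profile)
    t = profile.mean() + k * profile.std()   # threshold T = m~ + k * sigma~
    margin = int(n * frac)                   # only the outer 10% is searched
    lo, hi = 0, n
    # Scan each outer band starting from the centre side; the first value
    # above the threshold marks a plate border.
    for i in range(margin - 1, -1, -1):      # toward the top/left edge
        if profile[i] > t:
            lo = i + 1
            break
    for i in range(n - margin, n):           # toward the bottom/right edge
        if profile[i] > t:
            hi = i
            break
    return lo, hi
```

Running this on both axes of all three channels and keeping the tightest bounds (max of the lows, min of the highs) would yield the common cropped region described above.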
Here are some test results comparing without and with the automatic cropping:
[Figures: four before/after pairs for automatic cropping]
In order to obtain proper contrast, we should first find the range of brightness in the original image and then map the pixels to a new range that fills the entire domain. The details are:
- Brightness: the brightness information is obtained by converting the original RGB image to LAB. The L channel contains the brightness of each pixel in the range [0, 100].
- Rescaling: the 1st- and 99th-percentile brightness values are found among all pixels. Then a linear mapping (L′ = aL + b) is applied to all the brightness values, where a and b are obtained by solving
  a L_1% + b = 0,  a L_99% + b = 100.
  The resulting brightness values are clipped to [0, 100] to keep the range valid.
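The percentile stretch can be sketched as below. The RGB-to-LAB conversion is omitted (it could be done with, e.g., skimage.color.rgb2lab); this sketch operates directly on an L channel, and the function name is mine:

```python
import numpy as np

def stretch_lightness(L, lo_pct=1, hi_pct=99):
    """Linearly remap lightness so the 1st/99th percentiles hit 0/100.

    Solving a*L1% + b = 0 and a*L99% + b = 100 gives a and b below.
    """
    l1, l99 = np.percentile(L, [lo_pct, hi_pct])
    a = 100.0 / (l99 - l1)
    b = -a * l1
    # Clip so extreme pixels stay inside the valid LAB lightness range.
    return np.clip(a * L + b, 0.0, 100.0)
```

Using percentiles rather than the min/max makes the mapping robust to a few outlier pixels, at the cost of saturating the darkest and brightest 1%.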
Here are some test results comparing without and with the automatic contrast adjustment:
[Figures: four before/after pairs for automatic contrast adjustment]
| Operation | On small images (s) | On large images (s) |
| --- | --- | --- |
| colorization (raw pixel) | 1.26 ± 0.16 | 17.27 ± 1.60 |
| colorization (gradient) | 1.34 ± 0.14 | 18.90 ± 0.97 |
| colorization (gradient) + cropping | 1.59 ± 0.14 | 19.77 ± 0.87 |
| colorization (gradient) + cropping + contrast | 1.48 ± 0.20 | 26.57 ± 1.05 |