11-747 Learning-based Image Synthesis Manuel Rodriguez Ladron de Guevara
The goal of this assignment is to take the digitized Prokudin-Gorskii
glass plate images and produce a color image using image processing
techniques, aiming to have as few visual artifacts as possible.
The input to the process is a digitized glass plate image consisting of 3 separate channels, ordered from top to bottom as BGR as shown in Figure 1. Our job is to separate the 3 channels and align them as well as possible. The question is how to align them efficiently. The straightforward but inefficient approach is to search over all possible displacements until we find the best alignment. This is prohibitively expensive for large images (those with one dimension larger than 3000px).
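As a concrete illustration, the plate can be split by dividing its height into thirds; the short sketch below assumes the plate has already been loaded as a single grayscale NumPy array (the function name is illustrative, not part of the assignment's starter code).

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked B/G/R glass plate into three channels."""
    h = plate.shape[0] // 3          # height of one channel
    b = plate[:h]                    # top third: blue
    g = plate[h:2 * h]               # middle third: green
    r = plate[2 * h:3 * h]           # bottom third: red
    return b, g, r
```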
Each candidate alignment gets a score using an image matching metric. For this assignment, I explored Sum of Squared Differences (SSD), Normalized Cross Correlation (NCC), and Peak Signal-to-Noise Ratio (PSNR). The objective is to minimize SSD or maximize NCC/PSNR by searching for the displacement that yields the best score.
Sum of Squared Differences SSD is calculated based on the following equation: $$ SSD = \sum_{x,y} (I_{x,y} - H_{x,y})^2 $$ where $I$ and $H$ are the two image arrays and $x,y$ index the height and width dimensions, respectively.
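A minimal NumPy sketch of this metric, assuming both channels are float arrays of the same shape:

```python
import numpy as np

def ssd(img, ref):
    # Sum of squared per-pixel differences; lower is better.
    return np.sum((img - ref) ** 2)
```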
Normalized Cross Correlation NCC is calculated based on the following equation: $$ NCC = \frac{\sum_{x,y} (I_{x,y} - \mu_I) (H_{x,y} - \mu_H) }{\sqrt {\sum_{x,y} (I_{x,y} - \mu_I)^2} \sqrt{\sum_{x,y} (H_{x,y} - \mu_H)^2}} $$ where $\mu_I$ and $\mu_H$ are the mean intensities of the two images.
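A corresponding NumPy sketch, again assuming same-shaped float arrays (the helper name is illustrative):

```python
import numpy as np

def ncc(img, ref):
    # Normalized cross correlation; higher is better.
    i = img - img.mean()
    h = ref - ref.mean()
    return np.sum(i * h) / (np.sqrt(np.sum(i ** 2)) * np.sqrt(np.sum(h ** 2)))
```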
Peak Signal-to-Noise Ratio PSNR is calculated based on the following equation: $$ PSNR = 20 \log_{10} \frac{p_m}{\sqrt{\sigma_{IH}}} $$ where $\sigma_{IH}$ is the mean squared error between the two images and $p_m$ is the maximum pixel intensity.

```python
import numpy as np

def psnr(original, compressed):
    mse = np.mean((original - compressed) ** 2)
    if mse == 0:
        # Zero MSE means the two images are identical,
        # so PSNR is unbounded; return a large sentinel value.
        return 100
    max_pixel = 1  # images are normalized to [0, 1]
    return 20 * np.log10(max_pixel / np.sqrt(mse))
```
To make the algorithm efficient, the assignment suggests using an image pyramid (downsampling the image n times by a factor of 2) and performing the search over the resulting set of lower resolution images. With the image pyramid algorithm as the backbone for this project, I propose two different approaches, in addition to exhaustive search over the whole pixel space.
Exhaustive Search The exhaustive search is the easy way out for small images, since the search space grows as $O(P^2)$ in the displacement range $P$; it consists of iterating over candidate row and column displacements of the image array and computing a metric that compares the two channels at each offset.
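A sketch of this brute-force search, assuming a displacement window of ±15 pixels, circular shifts via np.roll, and a metric such as the ssd helper above where lower is better (for NCC or PSNR one would negate the score); this is a simplification rather than my exact implementation:

```python
import numpy as np

def exhaustive_search(img, ref, metric, window=15):
    """Try every (dy, dx) displacement in [-window, window] and keep the best."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Circularly shift the channel and score it against the reference.
            score = metric(np.roll(img, (dy, dx), axis=(0, 1)), ref)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```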
Image Pyramid Search This is an efficient search algorithm that generally yields good results in under 60 seconds for high resolution images of up to 9000 pixels. As mentioned above, the algorithm scales the image down by a factor of 2 n times, where n is set automatically from the input image resolution. Given an initial search window, the algorithm searches from coarse to fine images within a reduced window, whose size decreases by a factor of 0.8 as the images become larger. The window size is a hyperparameter.
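The sketch below illustrates the coarse-to-fine idea, reusing the exhaustive_search helper above and assuming skimage.transform.rescale for downsampling; the number of levels is a fixed parameter here, whereas in practice it is derived from the image resolution:

```python
import numpy as np
from skimage.transform import rescale

def pyramid_search(img, ref, metric, levels=4, init_window=15, decay=0.8):
    """Coarse-to-fine alignment over an image pyramid."""
    # Build pyramids from fine (level 0) to coarse (level `levels`).
    imgs, refs = [img], [ref]
    for _ in range(levels):
        imgs.append(rescale(imgs[-1], 0.5))
        refs.append(rescale(refs[-1], 0.5))

    dy, dx, window = 0, 0, float(init_window)
    for level in range(levels, -1, -1):        # coarsest level first
        dy, dx = 2 * dy, 2 * dx                # scale the running estimate up
        shifted = np.roll(imgs[level], (dy, dx), axis=(0, 1))
        ddy, ddx = exhaustive_search(shifted, refs[level], metric, window=int(window))
        dy, dx = dy + ddy, dx + ddx
        window = max(2.0, window * decay)      # shrink the window at finer levels
    return dy, dx
```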
Evolutionary Search I propose a novel approach to search for the best alignment, inspired by Darwinian evolution. This black-box optimization algorithm consists of treating the search space as a multivariate Gaussian with varying mean and covariance matrix. The algorithm starts with a population $P$, initial $\mu=0$, and initial covariance $100I$. We select elites by keeping a fraction of $10\%$ of the population at each iteration, and noise of $0.25I$ is added to the covariance at each step. We optimize the different metrics mentioned above. We let the algorithm run for 10 iterations each time and set an initial Gaussian standard deviation of $\sigma = 10$, which decreases by a factor of $0.9$ at each pyramid level. We optimize using the following updates: $$ \mu_{t+1} \leftarrow \frac{1}{el_s}\sum_{i=1}^{el_s} \theta_t^{(i)}, \;\;\; \Sigma_{t+1} \leftarrow \mathrm{Cov}(\theta_t^{(1)}, ..., \theta_t^{(el_s)}) + \epsilon I, $$ where $el_s$ is the elite size, $\theta_t^{(i)}$ denotes the $i$-th best parameters from the previous iteration, and $\epsilon \in \mathbb{R}$ is a small constant.
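A minimal sketch of one pyramid level of this search, written in the style of the cross-entropy method, assuming a generic score callable to be minimized (e.g. SSD of the shifted channel against the reference); the population size and the rounding of sampled shifts to integer pixels are illustrative assumptions, while the 10% elite fraction, 10 iterations, covariance noise of $0.25I$, and initial covariance $100I$ follow the values above:

```python
import numpy as np

def evolutionary_search(score, pop_size=50, elite_frac=0.1, iters=10,
                        init_sigma=10.0, noise=0.25):
    """CEM-style search over 2D displacements treated as a multivariate Gaussian."""
    mu = np.zeros(2)
    cov = (init_sigma ** 2) * np.eye(2)
    elite_size = max(2, int(elite_frac * pop_size))
    for _ in range(iters):
        # Sample candidate (dy, dx) shifts and round them to integer pixels.
        samples = np.random.multivariate_normal(mu, cov, size=pop_size)
        shifts = np.round(samples).astype(int)
        scores = np.array([score(tuple(s)) for s in shifts])
        elites = samples[np.argsort(scores)[:elite_size]]   # keep the best fraction
        # Refit the Gaussian to the elites and add exploration noise.
        mu = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False) + noise * np.eye(2)
    return tuple(np.round(mu).astype(int))
```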
We introduce various improvements over our baselines.
Automated Edge Cropping Glass plates naturally have artifacts along their borders. While for the baselines we do not touch the edges of the images, here we detect border artifacts automatically by applying an intensity threshold to rows and columns near the image limits. To alleviate a potential tilt of the edges, instead of looking at the values of single rows or columns, we look at several at a time and compute the threshold over that band, as sketched below.
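A hedged sketch of this idea, in which the band width, search fraction, and intensity cutoffs are illustrative values rather than the tuned settings used in the report:

```python
import numpy as np

def auto_crop(img, band=10, frac=0.05, dark=0.1, bright=0.9):
    """Trim border rows/columns whose mean intensity over a small band is
    close to pure black or pure white (likely plate artifacts)."""
    h, w = img.shape
    limit_h, limit_w = int(frac * h), int(frac * w)

    def is_border(strip):
        m = strip.mean()
        return m < dark or m > bright

    # Walk inward from each side until the band no longer looks like a border;
    # fall back to no cropping if nothing clean is found within the limit.
    top = next((i for i in range(limit_h) if not is_border(img[i:i + band])), 0)
    bottom = next((i for i in range(limit_h) if not is_border(img[h - i - band:h - i])), 0)
    left = next((j for j in range(limit_w) if not is_border(img[:, j:j + band])), 0)
    right = next((j for j in range(limit_w) if not is_border(img[:, w - j - band:w - j])), 0)
    return img[top:h - bottom, left:w - right]
```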
PyTorch Implementation We re-implement the algorithm in PyTorch, this time making use of functions such as the MSE loss and efficient built-in operations such as convolutions.
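For example, the SSD score for a candidate shift can be expressed with torch.roll and the built-in MSE loss; the sketch below is an assumption about how such a port might look, not a verbatim excerpt of our code:

```python
import torch
import torch.nn.functional as F

def ssd_torch(img, ref, shift):
    # Score a candidate (dy, dx) shift; runs on GPU if the tensors live there.
    shifted = torch.roll(img, shifts=shift, dims=(0, 1))
    return F.mse_loss(shifted, ref, reduction='sum')

# Usage: move both channels to the same device once, then score shifts cheaply.
# img_t = torch.from_numpy(img).float().cuda()
# ref_t = torch.from_numpy(ref).float().cuda()
# score = ssd_torch(img_t, ref_t, (5, -3)).item()
```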
Automatic Contrasting We implement automatic contrast adjustment to improve the perceived quality of the image.
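One common way to do this is percentile-based contrast stretching; the sketch below uses assumed 2nd/98th percentile cutoffs to illustrate the idea:

```python
import numpy as np

def auto_contrast(img, low_pct=2, high_pct=98):
    """Stretch intensities so the chosen percentiles map to 0 and 1."""
    low, high = np.percentile(img, [low_pct, high_pct])
    return np.clip((img - low) / (high - low + 1e-8), 0, 1)
```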
We show here the results obtained by different algorithms and metrics.
The following images are aligned using the pyramid algorithm baseline. Unless otherwise stated, there is no prior automatic cropping; we show how, in some cases, prior cropping helps alignment. Likewise, unless otherwise stated, the alignment is computed using the SSD metric.
The following images are aligned using the evolutionary approach described above, integrated into our pyramid algorithm. This process treats the space of displacements as a Gaussian distribution, and the portion of the population that best fits the objective leaves its offspring through the mean and covariance of the next generation's distribution.
Due to time constraints, we show the not-so-good results obtained with this approach so far; I will continue working on it, since the remaining step is a grid search over hyperparameters.
Results on automatic contrasting
Results on automatic cropping.