Master of Science in Machine Learning, 2023
This project is an attempt to fuse a source foreground image onto a target background image by ensuring the continuity of pixel gradients around a 4-neighborhood.
Often, when we try to fuse two images together, the result looks unnatural because of sharp differences in color, contrast and brightness along the boundary between the two images.
Therefore, to make the resultant image feel more natural, we aim to carry the "style" of the target background over to the source foreground while retaining only the relative structure (the gradients) of the source.
Intuitively, this is achieved by manipulating pixel values in the target image while maintaining a similarity of gradients.
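One common way to write this idea down as a least-squares objective is sketched here. The notation is an assumption introduced for illustration and is not taken from the text above: v holds the unknown blended pixel values, s is the source, t is the target, S is the set of pixels covered by the source, and N_i is the 4-neighborhood of pixel i.

```latex
v^{*} = \arg\min_{v}
  \sum_{i \in S} \sum_{j \in N_i \cap S}
    \bigl( (v_i - v_j) - (s_i - s_j) \bigr)^2
  + \sum_{i \in S} \sum_{j \in N_i \setminus S}
    \bigl( (v_i - t_j) - (s_i - s_j) \bigr)^2
```

The first sum matches gradients between two pixels that both lie inside the source region; the second ties pixels on the boundary of that region to the fixed target values next to them.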
As long as the colors of the target background and the source foreground are not too different, and the orientations of the two images are coherent, gradient domain fusion can produce an image that looks natural.
Before performing gradient domain fusion, a toy image is used to illustrate how to calculate the gradients within a single image and how to reconstruct that image from its gradients and the value of a single starting pixel.
To do so, all constraints are combined to form a linear system of equations, which can then be solved to recover the original image.
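The toy reconstruction can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name is hypothetical, the image is assumed to be a single-channel float array, the top-left pixel is assumed to be the starting point, and a dense NumPy least-squares solve is used only because the example is tiny (a sparse solver would be needed at realistic image sizes).

```python
import numpy as np

def reconstruct_from_gradients(img):
    """Rebuild a 2D float image from its x/y gradients plus one anchor pixel."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)      # (row, col) -> variable index
    n_eq = h * (w - 1) + (h - 1) * w + 1      # x-gradients, y-gradients, anchor
    A = np.zeros((n_eq, h * w))
    b = np.zeros(n_eq)
    e = 0
    # x-gradient constraints: v[y, x+1] - v[y, x] = img[y, x+1] - img[y, x]
    for y in range(h):
        for x in range(w - 1):
            A[e, idx[y, x + 1]] = 1.0
            A[e, idx[y, x]] = -1.0
            b[e] = img[y, x + 1] - img[y, x]
            e += 1
    # y-gradient constraints: v[y+1, x] - v[y, x] = img[y+1, x] - img[y, x]
    for y in range(h - 1):
        for x in range(w):
            A[e, idx[y + 1, x]] = 1.0
            A[e, idx[y, x]] = -1.0
            b[e] = img[y + 1, x] - img[y, x]
            e += 1
    # anchor constraint (assumed starting point): v[0, 0] = img[0, 0]
    A[e, idx[0, 0]] = 1.0
    b[e] = img[0, 0]
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v.reshape(h, w)
```

Without the anchor constraint the gradients only determine the image up to a constant offset, which is why a single known pixel value is enough to pin down the whole reconstruction.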
In order to blend two images effectively, first, a rectangular region of interest that completely covers the source image is selected.
For each pixel in this region of interest, if the source image is not present, the constraint is simply set to v = b, where b is the value of the pixel in the target image.
If the source image is present, four constraints corresponding to the four directions are added into the linear system.
For each of these constraints, the entry of b is first set to the source gradient we would like to mimic, and the entry of A at the pixel's own index is set to 1.
If the neighboring pixel is not in the source image, we simply add its target value to b. If it is, we set the entry of A at the neighbor's index to -1.
In doing so, a system of linear equations is created that is coherent with the minimization required for gradient domain fusion.
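The construction described above can be sketched as follows. This is a hedged illustration rather than the project's implementation: `poisson_blend` and its arguments are hypothetical names, the images are assumed to be aligned single-channel float arrays of the same shape (a color image would be processed one channel at a time), the mask is assumed not to touch the image border, and a dense solve stands in for the sparse solver a full-size image would require.

```python
import numpy as np

def poisson_blend(source, target, mask):
    """Blend `source` into `target` where `mask` is True (2D arrays, same shape).

    Assumes the mask never touches the image border, so every masked
    pixel has all four neighbors inside the image.
    """
    h, w = target.shape
    idx = np.arange(h * w).reshape(h, w)       # (row, col) -> variable index
    n_eq = int(4 * mask.sum() + (~mask).sum())
    A = np.zeros((n_eq, h * w))
    b = np.zeros(n_eq)
    e = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                # source not present: v = target value
                A[e, idx[y, x]] = 1.0
                b[e] = target[y, x]
                e += 1
                continue
            # source present: one constraint per direction
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                A[e, idx[y, x]] = 1.0
                b[e] = source[y, x] - source[ny, nx]   # source gradient
                if mask[ny, nx]:
                    A[e, idx[ny, nx]] = -1.0           # neighbor is unknown
                else:
                    b[e] += target[ny, nx]             # neighbor is known
                e += 1
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v.reshape(h, w)
```

A quick sanity check for a sketch like this: if the source equals the target plus a constant brightness offset, the gradients are identical, so the blend should return the target unchanged.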
This combination works pretty well, most likely due to the original Helicopter image also being in the sky.
This works relatively well, since the bear blends in pretty well with the water in the swimming pool.
However, the waves around the bear might not be coherent with the relatively still water, which results in a small peculiarity.
This is an example of a negative result, since the color of the chicken after blending is completely different from its original color.
This is most likely due to the large difference between the background color of the target and the colors of the source foreground.
While this seems to work relatively well, the appearance of the car seems a little too shiny, which again could perhaps be attributed to the dark patches surrounding the car in the original image.
This works very nicely, other than the patch near the base of the tree.
There are some coloring issues as well, since the mask was not cropped very precisely, which led to the source image capturing some unwanted detail.
Rather than trying to replicate the source image gradients, we could try to replicate the pointwise maximum of the source and target image gradients.
This allows us to mimic the gradients in the source image, while also including information in the target background image that might be semantically significant.
Overall, it is expected that the mixed gradients method will produce images which seem to be more "transparent", since some of the information in the background is retained.
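The gradient selection step can be sketched as below. It assumes that "pointwise maximum" means keeping, for each pixel pair, whichever of the two gradients has the larger magnitude (sign preserved), which is the usual reading of mixed gradients; `mixed_gradient` is a hypothetical helper name, not a function from the project.

```python
import numpy as np

def mixed_gradient(source_grad, target_grad):
    """Elementwise: keep the gradient with the larger absolute value."""
    return np.where(np.abs(source_grad) > np.abs(target_grad),
                    source_grad, target_grad)

# Usage on horizontal gradients of two tiny single-row images
s = np.array([[0.0, 1.0, 1.0, 5.0]])
t = np.array([[0.0, 3.0, 2.0, 3.0]])
guide = mixed_gradient(np.diff(s, axis=1), np.diff(t, axis=1))
```

In a blending system, this selected value would replace the plain source gradient on the right-hand side b, while the matrix A stays the same.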
The normal Poisson blending here doesn't work very well, as it looks like the words were just copied onto the background image.
With mixed blending, we are able to retain some structure from the background, leading to a better result that looks more natural, as if the words were indeed written on the paper.
In order to improve contrast information in the converted grayscale images, we can use the HSV channels of the image as a gradient augmentation layer and apply mixed gradients.
Since the S (saturation) and V (value) channels provide information about the strength and the brightness of a pixel, these two channels are good candidates for gradient augmentation.
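As a rough illustration of where these channels come from, the sketch below computes S and V directly from an RGB array using the standard HSV definitions (V is the largest channel, S is the spread between the largest and smallest channel relative to V). The function name is hypothetical and the input is assumed to be a float array in [0, 1] with the color channels last; how the project then combines these channels with the grayscale gradients is not shown here.

```python
import numpy as np

def saturation_and_value(rgb):
    """Return (S, V) for a float RGB array of shape (..., 3) in [0, 1]."""
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    value = cmax
    # Saturation is defined as 0 for black pixels to avoid dividing by zero
    safe_max = np.where(cmax > 0, cmax, 1.0)
    saturation = np.where(cmax > 0, (cmax - cmin) / safe_max, 0.0)
    return saturation, value
```

Either returned channel could then serve as the second gradient source in a mixed gradients setup, alongside the gradients of the plain grayscale conversion.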
From these two results, we can see that the saturation channel is a better candidate for contrast augmentation.
We can also see that due to the need for pixels to scale towards more extreme values for a higher contrast, the overall image itself is perceived to be darker.