Last Updated: February 13, 2022
Multi-Resolution Image Blending
This article takes a deep dive into seamlessly merging two images in a more natural way than alpha matting. The technique relies on Gaussian and Laplacian pyramids. There will be no deep dive into the technical details of Gaussian or Laplacian filtering, as this article is focused on image blending.
Problem
To begin with, we have two images that we'd like to blend, shown below.
Let's load the images using OpenCV as follows:
import cv2 as cv

A = cv.imread('Hand.png', cv.IMREAD_REDUCED_COLOR_4)
B = cv.imread('Veles-Mask-Template.png', cv.IMREAD_REDUCED_COLOR_4)
M = cv.imread('Mask.png', cv.IMREAD_REDUCED_GRAYSCALE_4)
The 'cv.IMREAD_REDUCED_COLOR_4' and 'cv.IMREAD_REDUCED_GRAYSCALE_4' flags in OpenCV load the image at 1/4 of its original width and height, as a 3-channel colour image and a single-channel grayscale image respectively.
Image B is the source image that we will be transferring onto image A, the hand. We'll use the binary mask M to cut the source region out of B.
A binary mask is not the same as an alpha mask. A binary mask contains only two values (0 or 1, stored as 0 or 255 in an 8-bit image), while an alpha mask can take any value from 0 to 1 (any value from 0 to 255 in an 8-bit image).
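To make the distinction concrete, here is a minimal sketch using NumPy on a made-up 1x4 mask. The pixel values and the threshold of 127 are assumptions chosen for illustration and are not taken from Mask.png.

```python
import numpy as np

# A tiny 1x4 grayscale mask, as it might come out of cv.imread (uint8, 0..255).
mask = np.array([[0, 64, 191, 255]], dtype=np.uint8)

# Binary mask: every pixel is either fully off (0) or fully on (255).
# The threshold of 127 is an arbitrary choice for this illustration.
binary_mask = np.where(mask > 127, 255, 0).astype(np.uint8)

# Alpha mask: intermediate values are kept and rescaled to the range [0, 1].
alpha_mask = mask.astype(np.float32) / 255.0

print(binary_mask)
print(alpha_mask)
```

The binary mask makes a hard in-or-out decision per pixel, while the alpha mask lets a pixel be partially included.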
Creating a Gaussian Pyramid
Inspired by actual pyramids, an image pyramid is simply a collection of images in decreasing order of size, with the largest image at the bottom and the smallest at the top.
There is no consistent definition of an image pyramid: some texts use the term for Gaussian pyramids, others for Laplacian pyramids, and others for a plain stack of downscaled images with no other transformation applied to each layer.
Constructing an Image pyramid
First, we'll construct an image pyramid, specifically a Gaussian image pyramid. OpenCV has a built-in function for producing the next level of a Gaussian pyramid, cv2.pyrDown, which blurs the image and halves its width and height.
We can use this to write a function that returns a pyramid when given an input image and the number of levels in the pyramid (scale).
import cv2

def cv_pyramid(A, scale) -> list:
    # Level 0 is the input image; each further level is a blurred, half-size copy
    gp = [A]
    for i in range(1, scale):
        A = cv2.pyrDown(A)
        gp.append(A)
    return gp
With this function, we can construct Gaussian image pyramids for the 3 images loaded earlier by running:
gpA = cv_pyramid(A.copy(), scale=5)
gpB = cv_pyramid(B.copy(), scale=5)
gpM = cv_pyramid(M.copy(), scale=5)
An illustration of the Gaussian pyramid of the hand image is shown below.
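To get a feel for how quickly the levels shrink, the sketch below lists the size of each level without touching any pixels. It assumes cv2.pyrDown's default output size of ((width + 1) // 2, (height + 1) // 2); the 640x480 input size and the helper name pyramid_sizes are hypothetical.

```python
def pyramid_sizes(width, height, scale):
    """Return the (width, height) of each level of a Gaussian pyramid.

    Assumes the default output size of cv2.pyrDown, which rounds odd sizes up.
    """
    sizes = [(width, height)]
    for _ in range(1, scale):
        width, height = (width + 1) // 2, (height + 1) // 2
        sizes.append((width, height))
    return sizes

# Hypothetical 640x480 image with the same scale=5 used above
print(pyramid_sizes(640, 480, 5))
# [(640, 480), (320, 240), (160, 120), (80, 60), (40, 30)]
```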
Creating a Laplacian pyramid
In terms of frequency, a Laplacian pyramid can be seen as a high-frequency, multi-scale representation of an image, while the Gaussian pyramid can be seen as a low-frequency representation. What does this mean? Think of a Laplacian pyramid as a compression step that captures the "important" information in an image, kind of like an edge detector.
It can also be seen as a stack of difference images, since each level is built by taking the difference between one level of the Gaussian pyramid and the upsampled version of the next, smaller level.
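The idea of a difference image can be illustrated on a tiny example. The sketch below deliberately swaps OpenCV's Gaussian filtering for a plain 2x2 box average (box_down) and pixel repetition (nearest_up), both hypothetical stand-ins for pyrDown and pyrUp, so that the numbers are easy to follow by hand.

```python
import numpy as np

def box_down(img):
    """Halve each dimension by averaging 2x2 blocks (stand-in for cv2.pyrDown)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def nearest_up(img):
    """Double each dimension by repeating pixels (stand-in for cv2.pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

# A 4x4 image with a sharp vertical edge after the first column
g0 = np.array([[10, 200, 200, 200]] * 4, dtype=np.float32)

g1 = box_down(g0)           # low-frequency, half-size version
l0 = g0 - nearest_up(g1)    # difference image: what the smaller level cannot represent

print(l0[0])                # non-zero only next to the edge: -95, 95, 0, 0
print(np.array_equal(nearest_up(g1) + l0, g0))  # True: the difference restores the original
```

The difference image is zero in flat regions and non-zero around the edge, which is why it resembles an edge detector. Note that it contains negative values.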
Constructing a Laplacian pyramid using OpenCV
We'll use the OpenCV functions pyrUp and subtract to create a Laplacian pyramid. A function to perform this is shown below.
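def cv2_same_size(a, b):
    # Resize both images to the larger of the two heights and widths
    maxH = max(a.shape[0], b.shape[0])
    maxW = max(a.shape[1], b.shape[1])
    a = cv2.resize(a, (maxW, maxH))  # cv2.resize expects (width, height)
    b = cv2.resize(b, (maxW, maxH))
    return a, b

def cv_laplacian(gp, scale) -> list:
    # The smallest Gaussian level is kept as the top of the Laplacian pyramid
    lp = [gp[-1].copy()]
    for i in reversed(range(scale-1)):
        # Upsample the next (smaller) Gaussian level to roughly the current size
        gExp = cv2.pyrUp(gp[i+1].copy())
        gpi = gp[i].copy()
        gpi, gExp = cv2_same_size(gpi, gExp)
        # Note: on uint8 images cv2.subtract saturates, so negative differences become 0
        li = cv2.subtract(gpi, gExp)
        lp.insert(0, li)
    return lp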
Sometimes the image sizes differ by a single pixel due to rounding errors, as the pyrUp operation upscales the image by a factor of 2. The cv2_same_size helper function simply ensures both images are of the same size.
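The one-pixel mismatch can be reproduced with plain arithmetic. The sketch below assumes the default output sizes of cv2.pyrDown, ((w + 1) // 2, (h + 1) // 2), and cv2.pyrUp, (2 * w, 2 * h); the 75x101 image size and the two helper names are hypothetical.

```python
def pyr_down_size(w, h):
    # Default output size of cv2.pyrDown: odd sizes are rounded up
    return (w + 1) // 2, (h + 1) // 2

def pyr_up_size(w, h):
    # Default output size of cv2.pyrUp: exactly double
    return 2 * w, 2 * h

original = (75, 101)             # hypothetical (width, height) with odd sides
down = pyr_down_size(*original)  # (38, 51)
up = pyr_up_size(*down)          # (76, 102)

print(original, down, up)
```

Going down and back up turns 75x101 into 76x102, one pixel larger on each side, which is the case the resize helper papers over.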
The input to our cv_laplacian function is the gaussian pyramid created from the previous step and a scale (which can also be inferred from the number of levels in the supplied gaussian pyramid).
The last (smallest) level of the Laplacian pyramid is simply a copy of the smallest level of the Gaussian pyramid. For every other level, the following operations are performed:
- The next (smaller) Gaussian level is upsampled with the pyrUp function to give an image called gExp. In other words, we create the 'difference' for the current level by 'borrowing' the image from the next level.
- The Laplacian for the current level is simply the difference between this level and the upsampled next level, i.e. $L_i = G_i - \mathrm{pyrUp}(G_{i+1})$, where $G_i$ is level $i$ of the Gaussian pyramid.
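One caveat when this difference is taken on 8-bit images: cv2.subtract saturates, so any negative difference is stored as 0. The sketch below emulates that behaviour with NumPy on three made-up pixel values and compares it with a signed difference; it is an illustration of the effect, not a change to the function above.

```python
import numpy as np

g_i = np.array([50, 120, 200], dtype=np.uint8)    # current level (made-up pixels)
g_exp = np.array([80, 120, 150], dtype=np.uint8)  # upsampled next level (made-up pixels)

# Signed difference: keeps negative values
signed = g_i.astype(np.int16) - g_exp.astype(np.int16)

# Saturating uint8 difference, emulated by clipping to [0, 255]
saturated = np.clip(signed, 0, 255).astype(np.uint8)

print(signed)     # [-30   0  50]
print(saturated)  # [ 0  0 50]

# Adding the difference back only restores the original in the signed case
print(np.array_equal(g_exp + signed, g_i))     # True
print(np.array_equal(g_exp + saturated, g_i))  # False
```

If exact reconstruction matters, one option is to convert the pyramid levels to a signed or floating-point type before subtracting.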
Let's visualise what the Laplacian pyramid of the hand image looks like. First, build the Laplacian pyramids by running the following:
lpA = cv_laplacian(gpA, scale=5)
lpB = cv_laplacian(gpB, scale=5)
lpM = cv_laplacian(gpM, scale=5)
The Laplacian images are mostly black, so we'll apply a small visualisation trick to brighten the high-frequency parts of the image.
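# Brighten the Laplacian pyramid for visualisation purposes.
# cv2pil is assumed to be a helper (not defined in this article) that converts an OpenCV image to a PIL image.
apy = [cv2pil((x + 100).astype('uint8')) for x in lpA[:-1]]
apy.append(cv2pil(lpA[-1]))  # the last level is a regular image, so it is left as is
for idx, x in enumerate(apy[:-1]):
    x.save(f'hand_laplacian_level_{idx}.png')
apy[-1].save('hand_laplacian_level_4.png')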
The results are shown below.
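A detail worth knowing about the brightening trick: adding 100 to a uint8 array wraps around at 255, so pixels brighter than 155 come out dark. The sketch below shows this on made-up values, along with a clipped alternative; the alternative is a suggestion, not the method used to produce the images in this article.

```python
import numpy as np

lap = np.array([0, 10, 155, 200], dtype=np.uint8)  # made-up Laplacian pixel values

# The trick used above: uint8 arithmetic wraps around modulo 256
wrapped = (lap + 100).astype('uint8')

# Alternative: add in a wider type, then clip back into the uint8 range
clipped = np.clip(lap.astype(np.int16) + 100, 0, 255).astype(np.uint8)

print(wrapped)  # [100 110 255  44]
print(clipped)  # [100 110 255 255]
```

Since most Laplacian values are close to 0, the wrap-around rarely shows up, but the clipped version avoids stray dark pixels on strong edges.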
Reconstructing the original image from the Laplacian pyramid
The original image can be reconstructed from the Laplacian pyramid using the following function:
def cv_reconstruct_laplacian(pyramid):
    scale = len(pyramid)
    up = pyramid[-1]  # start with the tip, which is the smallest-scale image
    for i in range(scale-1, 0, -1):
        finer = pyramid[i-1].copy()  # the next larger level
        up = cv2.pyrUp(up)
        up, finer = cv2_same_size(up, finer)  # the width/height can be off by a pixel after `cv2.pyrUp`
        up = cv2.add(finer, up)
    return up
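The collapse step can be checked end to end on a small array. The sketch below is a NumPy-only round trip that uses floating-point values and simple stand-ins for pyrDown and pyrUp (a 2x2 box average and pixel repetition); all function names here are hypothetical and the random 8x8 image is made up.

```python
import numpy as np

def box_down(img):
    """Halve each dimension by averaging 2x2 blocks (stand-in for cv2.pyrDown)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def nearest_up(img):
    """Double each dimension by repeating pixels (stand-in for cv2.pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian(img, levels):
    """Build a Laplacian pyramid: difference levels plus the smallest image."""
    gp = [img]
    for _ in range(1, levels):
        gp.append(box_down(gp[-1]))
    lp = [gp[i] - nearest_up(gp[i + 1]) for i in range(levels - 1)]
    lp.append(gp[-1])
    return lp

def collapse(lp):
    """Rebuild the image by upsampling from the tip and adding each level back."""
    up = lp[-1]
    for level in reversed(lp[:-1]):
        up = nearest_up(up) + level
    return up

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
restored = collapse(build_laplacian(image, levels=3))
print(np.allclose(restored, image))  # True
```

With signed floating-point levels the round trip is exact, because each level stores precisely what the upsampled smaller level is missing.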
The Multi-Resolution blending algorithm
import numpy as np

def multiply_nn_mnn(g, rgb):
    # multiply an rgb image by a single channel image,
    # broadcasting over the channel axis without modifying rgb in place
    return rgb * g[:, :, np.newaxis]

def cv_multiresolution_blend(gm, la, lb) -> list:
    # Integer division maps 255 to 1 and every other value to 0,
    # so each mask level is used as a hard 0/1 mask
    gm = [x // 255 for x in gm]
    blended = []
    for i in range(len(gm)):
        gmi, lbi = cv2_same_size(gm[i], lb[i])
        # Take level i of B where the mask is 1 and level i of A elsewhere
        bi = multiply_nn_mnn(gmi, lbi) + multiply_nn_mnn((1-gmi), la[i])
        bi = bi.astype(np.uint8)
        blended.append(bi)
    return blended
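At each level the blend is a per-pixel weighted sum: blended_i = m_i * lb_i + (1 - m_i) * la_i, where m_i is the mask at that level. As a variation, the sketch below keeps the mask as fractional weights in [0, 1] instead of integer-dividing by 255, so that the blurred mask edges produce a gradual transition. The function name blend_level and the pixel values are assumptions for illustration; this is not the function used to produce the result in this article.

```python
import numpy as np

def blend_level(mask, la, lb):
    """Blend one pyramid level: mask weights lb, (1 - mask) weights la.

    mask is a single-channel uint8 image (0..255); la and lb are 3-channel
    arrays with the same height and width as mask.
    """
    m = (mask.astype(np.float32) / 255.0)[:, :, np.newaxis]  # shape (H, W, 1)
    return m * lb.astype(np.float32) + (1.0 - m) * la.astype(np.float32)

# Made-up 1x3 level: A is uniformly 100, B is uniformly 200
la = np.full((1, 3, 3), 100, dtype=np.uint8)
lb = np.full((1, 3, 3), 200, dtype=np.uint8)
mask = np.array([[0, 51, 255]], dtype=np.uint8)  # 51 / 255 = 0.2

blended = blend_level(mask, la, lb)
print(blended[0, :, 0])  # approximately 100, 120, 200
```

The middle pixel takes 20% of B and 80% of A, which a hard 0/1 mask would have assigned entirely to A.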
With the blending function, we can create the blended image as follows
blended_pyramid = cv_multiresolution_blend(gpM, lpA, lpB)
blended_image = cv_reconstruct_laplacian(blended_pyramid)
and the result is shown below.