Created: December 18, 2021
Last Updated: February 13, 2022

Multi-Resolution Image Blending

This article dives deep into seamlessly merging two images in a way that looks more natural than alpha matting. The techniques used are Laplacian and Gaussian blending. There will be no deep dive into the technical details of Laplacian or Gaussian matting, as this article is focused on image blending.

Problem

To begin with, we have two images that we'd like to blend, shown below.

Image A: a hand

Image B: our template, a cute monster

Let's load the images using OpenCV as follows

import cv2

A = cv2.imread('Hand.png', cv2.IMREAD_REDUCED_COLOR_4)
B = cv2.imread('Veles-Mask-Template.png', cv2.IMREAD_REDUCED_COLOR_4)
M = cv2.imread('Mask.png', cv2.IMREAD_REDUCED_GRAYSCALE_4)

The cv2.IMREAD_REDUCED_COLOR_4 and cv2.IMREAD_REDUCED_GRAYSCALE_4 flags make OpenCV load each image at one quarter of its original width and height.

Image B is the source image that we'd be transferring into Image A, the hand. We'd use the binary mask to select the part of the source image to transfer.

A binary mask is not the same as an alpha mask. A binary mask contains only two values (0 and 1, stored as 0 and 255 in an 8-bit image), while an alpha mask can take any value from 0 to 1 (any value from 0 to 255 in an 8-bit image).
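To make the distinction concrete, here is a minimal numpy sketch. The pixel values and the threshold of 127 are made up for illustration and are not taken from Mask.png.

```python
import numpy as np

# A made-up 1x5 strip of 8-bit mask values, from fully outside to fully inside
mask = np.array([[0, 64, 128, 192, 255]], dtype=np.uint8)

# Alpha mask: every value becomes a fractional weight in [0, 1]
alpha = mask.astype(np.float32) / 255.0

# Binary mask: each pixel is either fully out (0) or fully in (1);
# the threshold of 127 is an arbitrary choice for this illustration
binary = (mask > 127).astype(np.uint8)

print(alpha.round(2))
print(binary)
```

The alpha version keeps the in-between values as partial weights, while the binary version forces every pixel to one side or the other.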

Creating a Gaussian Pyramid

Great Pyramid of Giza

Inspired by actual pyramids, an image pyramid is simply a collection of images in decreasing order of size, with the largest image at the bottom and the smallest at the top.

There is no single definition of an image pyramid: some texts use the term for gaussian pyramids, others for laplacian pyramids, and others simply for a stack of downscaled images with no other transformation applied to each layer.

Constructing an Image pyramid

First, we'd construct an image pyramid, specifically a gaussian image pyramid. OpenCV has a built-in function for this called cv2.pyrDown, which blurs an image and then halves its width and height. We can utilise it in a function that returns a pyramid when given an input image and the number of levels in the pyramid (scale).

import cv2

def cv_pyramid(A, scale) -> list:
    gp = [A]
    for i in range(1, scale):
        A = cv2.pyrDown(A)
        gp.append(A)
    return gp

With this function, we can construct gaussian image pyramids for the 3 images previously imported by running

gpA = cv_pyramid(A.copy(), scale=5)
gpB = cv_pyramid(B.copy(), scale=5)
gpM = cv_pyramid(M.copy(), scale=5)

An illustration of the hand image is shown below

Gaussian Image Pyramid illustration

Creating a Laplacian pyramid

In terms of frequency, a laplacian pyramid can be seen as a high-frequency, multi-scale representation of an image, while the gaussian pyramid can be seen as a low-frequency representation. What does this mean? Each gaussian level is a blurred, downscaled copy of the image, whereas each laplacian level keeps only the detail (edges and texture) that the blur removed. Think of a laplacian pyramid as a compression step that captures the "important" information in an image, kind of like an edge detector.

It can also be seen as a set of difference images, since each level is constructed by taking the difference between two consecutive images in the gaussian pyramid, after upsampling the smaller one to match.
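Written out for a pyramid with $n$ levels, this relationship looks as follows. The notation is introduced here for illustration only: $G_i$ is level $i$ of the gaussian pyramid, $L_i$ is level $i$ of the laplacian pyramid, and $\mathrm{expand}$ stands for the upsampling step.

```latex
\begin{aligned}
L_i     &= G_i - \mathrm{expand}(G_{i+1}), \qquad i = 0, \dots, n-2 \\
L_{n-1} &= G_{n-1}
\end{aligned}
```

Rearranging the first line gives $G_i = L_i + \mathrm{expand}(G_{i+1})$, which is why the image can be rebuilt by starting from the smallest level and adding the differences back one level at a time.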

Constructing a laplacian pyramid using OpenCV

We'd use the OpenCV functions pyrUp and subtract to create a laplacian pyramid. A function to perform this is shown below.

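def cv2_same_size(a,b):
    # Resize both images to the larger of the two heights and widths
    maxH = max(a.shape[0], b.shape[0])
    maxW = max(a.shape[1], b.shape[1])
    a = cv2.resize(a, (maxW, maxH))  # cv2.resize expects (width, height)
    b = cv2.resize(b, (maxW, maxH))
    return a,b

def cv_laplacian(gp, scale) -> list:
    # The top of the laplacian pyramid is the smallest gaussian level, unchanged
    lp = [gp[-1].copy()]
    # Walk from the second smallest level down to the full size image
    for i in reversed(range(scale-1)):
        gExp = cv2.pyrUp(gp[i+1].copy())  # upsample the next (smaller) level
        gpi = gp[i].copy()
        gpi, gExp = cv2_same_size(gpi, gExp)
        # cv2.subtract saturates, so for uint8 inputs negative differences are clipped to 0
        li = cv2.subtract(gpi, gExp)
        lp.insert(0, li)
    return lp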

Sometimes the image sizes differ by a single pixel. This happens because pyrDown rounds odd dimensions up when halving, while pyrUp upscales the image by exactly a factor of 2. The cv2_same_size helper function simply ensures both images are of the same size.
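The size mismatch can be predicted with plain arithmetic. The sketch below assumes the default output sizes of cv2.pyrDown, ((cols + 1) // 2, (rows + 1) // 2), and cv2.pyrUp, exactly twice the input. The function names and the 301x400 example size are made up for illustration.

```python
def pyramid_shapes(height, width, levels):
    """Predict the (height, width) of each gaussian pyramid level.

    Assumes pyrDown's default rounding: odd dimensions are rounded up.
    """
    shapes = [(height, width)]
    for _ in range(1, levels):
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes


def upsample_mismatches(shapes):
    """For each level, how many pixels larger the upsampled next level would be.

    Assumes pyrUp's default output size of exactly twice the input.
    """
    diffs = []
    for (h, w), (next_h, next_w) in zip(shapes, shapes[1:]):
        diffs.append((2 * next_h - h, 2 * next_w - w))
    return diffs


shapes = pyramid_shapes(301, 400, levels=5)
print(shapes)
print(upsample_mismatches(shapes))
```

Any level with an odd height or width ends up one pixel short of its upsampled neighbour in that dimension, which is the case the resize in cv2_same_size papers over.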

The input to our cv_laplacian function is the gaussian pyramid created from the previous step and a scale (which can also be inferred from the number of levels in the supplied gaussian pyramid).

The last (smallest) level of the laplacian pyramid is simply a copy of the last level of the gaussian pyramid. For each of the other levels, the following operations are performed:

An upsampled copy of the next (smaller) gaussian level, called gExp, is created using the pyrUp function. In other words, we create the 'difference' for the current level by 'borrowing' the image from the next level.
The laplacian for the current level is simply the difference between this level and the upsampled next level, i.e. $gpi - gExp$

Let's compute the laplacian pyramids by running the following, so that we can view the one for the hand image.

lpA = cv_laplacian(gpA, scale=5)
lpB = cv_laplacian(gpB, scale=5)
lpM = cv_laplacian(gpM, scale=5)

The laplacian images are mostly black, so we'd apply a little visualisation trick to brighten the high frequency aspects of the image.

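import numpy as np
from PIL import Image

def cv2pil(img):
    # Assumed helper (its definition is not given in this article): BGR array -> PIL image
    return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# Brighten the laplacian pyramid for visualisation purposes
# (widen to int16 and clip so that adding 100 cannot wrap around in uint8)
apy = [cv2pil(np.clip(x.astype(np.int16) + 100, 0, 255).astype('uint8')) for x in lpA[:-1]]
apy.append(cv2pil(lpA[-1]))  # the top level is a regular image, so it is left unchanged
for idx, x in enumerate(apy):
    x.save(f'hand_laplacian_level_{idx}.png')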

The results are shown as

Laplacian pyramid image of the hand

Reconstructing the original image from the Laplacian pyramid

The original image can be reconstructed from the laplacian pyramid using the following function

def cv_reconstruct_laplacian(pyramid):
    scale = len(pyramid)
    up = pyramid[-1]  # start with the tip, this is the smallest scale image
    for i in range(scale-1, 0, -1):
        nxt = pyramid[i-1].copy()  # the next larger level
        up = cv2.pyrUp(up)
        up, nxt = cv2_same_size(up, nxt)  # sometimes the width/height can be off by a pixel due to `cv2.pyrUp`
        up = cv2.add(nxt, up)  # saturating add: sums above 255 are clipped for uint8

    return up
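The round trip can be checked on a small array. The sketch below uses only numpy and makes two simplifying assumptions: shrink (2x2 averaging) and expand (pixel repetition) are crude stand-ins for cv2.pyrDown and cv2.pyrUp, and all function names are made up for this illustration. It also compares signed differences against differences clipped at 0, which is what a saturating subtraction on uint8 images produces.

```python
import numpy as np

def shrink(img):
    # Stand-in for cv2.pyrDown: average each 2x2 block (no gaussian blur)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand(img):
    # Stand-in for cv2.pyrUp: repeat every pixel 2x2 (nearest neighbour)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels, signed=True):
    gp = [img.astype(np.float64)]
    for _ in range(1, levels):
        gp.append(shrink(gp[-1]))
    lp = []
    for i in range(levels - 1):
        diff = gp[i] - expand(gp[i + 1])
        # signed=False mimics a saturating uint8 subtraction: negatives become 0
        lp.append(diff if signed else np.clip(diff, 0, 255))
    lp.append(gp[-1])  # the smallest gaussian level is kept as it is
    return lp

def reconstruct(lp):
    up = lp[-1]
    for level in reversed(lp[:-1]):
        up = level + expand(up)
    return up

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16)).astype(np.float64)

exact = reconstruct(laplacian_pyramid(img, levels=3, signed=True))
lossy = reconstruct(laplacian_pyramid(img, levels=3, signed=False))
print(np.abs(exact - img).max())  # error with signed differences
print(np.abs(lossy - img).max())  # error with clipped differences
```

With signed differences the reconstruction matches the input, while clipping the negative differences loses some detail. For a pyramid built with a saturating subtraction on uint8 images, the reconstruction is therefore best read as a close approximation of the original rather than an exact copy.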

The Multi-Resolution blending algorithm

import numpy as np

def multiply_nn_mnn(g, rgb):
    # multiply each channel of an rgb image by a single channel image;
    # broadcasting returns a new array, so the pyramid level passed in is not modified
    return rgb * g[:, :, np.newaxis]

def cv_multiresolution_blend(gm, la, lb) -> list:
    # integer division turns fully white mask pixels (255) into 1 and everything else into 0
    gm = [x // 255 for x in gm]
    blended = []
    for i in range(len(gm)):
        gmi , lbi = cv2_same_size(gm[i], lb[i])
        # where the mask is 1 take image B (lb), elsewhere take image A (la)
        bi = multiply_nn_mnn(gmi, lbi) + multiply_nn_mnn((1-gmi), la[i])
        bi = bi.astype(np.uint8)
        blended.append(bi)
    return blended
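At each level, the function above computes $gm_i \cdot lb_i + (1 - gm_i) \cdot la_i$, with the mask floored to 0 or 1 by the `// 255` step. One possible variation, which is not the method used in this article, is to keep the blurred mask values as fractional weights. A minimal sketch for a single level, assuming the mask and both images already have the same height and width (soft_blend_level is a made-up name):

```python
import numpy as np

def soft_blend_level(mask, la, lb):
    """Blend one pyramid level with fractional weights.

    mask: single-channel uint8 array (0..255), same height/width as la and lb
    la, lb: 3-channel arrays for the background (A) and source (B) levels
    """
    # Keep fractional weights instead of flooring them to 0 or 1
    w = mask.astype(np.float32)[:, :, np.newaxis] / 255.0
    return w * lb.astype(np.float32) + (1.0 - w) * la.astype(np.float32)

# Tiny made-up 1x3 level: the mask is off, roughly half on, and fully on
mask = np.array([[0, 128, 255]], dtype=np.uint8)
la = np.full((1, 3, 3), 10, dtype=np.uint8)
lb = np.full((1, 3, 3), 200, dtype=np.uint8)

out = soft_blend_level(mask, la, lb)
print(out[0, :, 0])
```

The result is a float array, so it would need to be clipped and cast back to uint8 before being passed to functions that expect 8-bit images.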

With the blending function, we can create the blended image as follows

blended_pyramid = cv_multiresolution_blend(gpM, lpA, lpB)
blended_image = cv_reconstruct_laplacian(blended_pyramid)

and the result is shown below

Blended image