Image Warping and Mosaicing - Part A

Shoot images and recover homographies

I used these images throughout my project:

Ihouse Great Hall 1
Ihouse Great Hall 2
Ihouse Elevator 1
Ihouse Elevator 2
Ihouse Stairs 1
Ihouse Stairs 2

The first step of the project was to define correspondences between the images. To do this, I reused the point-selection tool from project 3. I observed that for best results, it was preferable to have a large number of point pairs and a significant overlap between the images.

Once the correspondences are defined, the next task is to recover a transformation such that: \[ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} \] Note that this transformation does not map points from one image to the other directly: there is also a scaling component w that must be divided out (it took me some time to realize this). From here, we expand and rearrange to obtain: \[ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \]

Note that with exactly four point pairs this system has an exact solution (eight equations, eight unknowns). In practice, however, due to inaccuracy in point selection it is better to select more points and solve with least squares.
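As a sketch, the least-squares setup above can be written with NumPy as follows (the function name and point-array layout are illustrative, not my exact implementation):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H such that dst ~ H @ src (homogeneous), via least squares.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, from the expanded system above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    # The bottom-right entry of H is fixed to 1.
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four pairs the system is square and `lstsq` returns the exact solution; with more pairs it minimizes the squared reprojection residual.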

Warp images

Once we have recovered the homography H, we can use it to warp one of the images (which is necessary before mosaicing). To do this, I reused techniques from project 3. Specifically, the warp works by inverse mapping: first compute where the image corners land under H to size the output, then map each output pixel back through the inverse of H and sample from the source image.
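A minimal sketch of this inverse-warping procedure (assuming nearest-neighbor sampling for brevity; bilinear interpolation gives smoother results):

```python
import numpy as np

def warp_image(im, H):
    """Warp im by homography H using inverse mapping.

    Returns the warped image and the (x, y) offset of its top-left corner.
    """
    h, w = im.shape[:2]
    # 1. Map the source corners through H to size the output canvas.
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], float).T
    mapped = H @ corners
    mapped = mapped[:2] / mapped[2]          # divide out the scale w
    x0, y0 = np.floor(mapped.min(axis=1)).astype(int)
    x1, y1 = np.ceil(mapped.max(axis=1)).astype(int)
    # 2. Send every output pixel back through H^-1.
    xs, ys = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    # 3. Keep only the pixels that land inside the source image.
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((y1 - y0, x1 - x0) + im.shape[2:], im.dtype)
    out[ys.ravel()[valid] - y0, xs.ravel()[valid] - x0] = im[sy[valid], sx[valid]]
    return out, (x0, y0)
```

The returned offset is what lets the warped image be placed correctly on the mosaic canvas later.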

For example, here is the first great hall image warped according to its homography with great hall 2:

Ihouse Great Hall 1 original
Ihouse Great Hall 1 warped
Ihouse Great Hall 2

We can get interesting results by warping images that contain rectangular objects seen in perspective so that those objects appear rectangular again. To do this, we simply define a correspondence between the four corners and a rectangle of our choice:

Board Outside my Room
Warped Board
Book on chest
Warped Window

Note that the shapes may still appear non-rectangular; I believe this is a visual effect caused by the distorted perspective of the surrounding scene.
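Since exactly four corner pairs are used here, rectification reduces to an exact 8x8 solve rather than least squares. A sketch (the corner ordering and helper name are illustrative):

```python
import numpy as np

def rectify_homography(corners, width, height):
    """Exact homography sending four clicked corners (ordered TL, TR,
    BR, BL) to an axis-aligned width x height rectangle."""
    dst = [(0, 0), (width, 0), (width, height), (0, height)]
    A, b = [], []
    for (x, y), (xp, yp) in zip(corners, dst):
        # Same two-rows-per-point system as in homography recovery.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

The resulting H is then fed to the same warping routine as before.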

Mosaicing

Once we have recovered the homography matrix, we can blend one image into another by superimposing the warped image onto the original destination image. To do this, we start by calculating the size of the final canvas, which is determined by the sizes of the images and the offset between them. After building the canvas where the result will go, we superimpose the images. To avoid visible edges between the images, we blend them using an alpha mask that ramps from 0 to 1 across the overlap region. We obtain the following results:
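The alpha-ramp idea can be sketched in isolation on an axis-aligned overlap (in the real mosaic the same ramp is applied on the warped canvas; names and the grayscale assumption are illustrative):

```python
import numpy as np

def blend_ramp(left, right, overlap):
    """Blend two equal-height grayscale images whose last/first `overlap`
    columns coincide, using a linear alpha ramp across the overlap."""
    h = left.shape[0]
    w = left.shape[1] + right.shape[1] - overlap
    out = np.zeros((h, w), float)
    # Non-overlapping parts are copied directly.
    out[:, :left.shape[1] - overlap] = left[:, :-overlap]
    out[:, left.shape[1]:] = right[:, overlap:]
    # Alpha ramps from 1 (all left) to 0 (all right) across the overlap,
    # so there is no hard seam between the two images.
    alpha = np.linspace(1, 0, overlap)
    seam = left[:, -overlap:] * alpha + right[:, :overlap] * (1 - alpha)
    out[:, left.shape[1] - overlap:left.shape[1]] = seam
    return out
```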

Leftmost Image
Rightmost image
Mosaic
Here are the alpha masks for the images above.
Leftmost Image
Rightmost image

And here are some other mosaics I created:

Leftmost Image
Rightmost image
Mosaic
Leftmost image
Rightmost image
Mosaic

Note that there is no additional difficulty in doing 3 images instead of 2: we obtain the homography from image 1 to image 3 by multiplying the homography from 2 to 3 with the one from 1 to 2. I made a naive (not alpha blended) 3-photo mosaic:
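The composition step is a single matrix product, but the order matters (the sample matrices below are hypothetical translations, just to show the composition):

```python
import numpy as np

# If H12 maps image-1 coords to image-2 coords and H23 maps image 2 to
# image 3, the composite mapping from 1 to 3 is H23 @ H12 -- applied
# right-to-left, like function composition.
H12 = np.array([[1, 0, 100], [0, 1, 0], [0, 0, 1]], float)   # hypothetical
H23 = np.array([[1, 0, 80], [0, 1, 20], [0, 0, 1]], float)   # hypothetical
H13 = H23 @ H12

p = H13 @ np.array([0, 0, 1.0])
print(p[:2] / p[2])   # image-1 origin lands at (180, 20) in image 3
```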

Leftmost image
Middle image
Rightmost image
Naive mosaic

Part B: Feature Matching for Autostitching

Harris

The goal of this section is to eliminate the need for manual correspondence selection. The first step towards achieving this is corner detection, for which we use the provided Harris starter code. Specifically, the Harris detector computes, around each image location, a smoothed sum of outer products of the image gradients (the structure tensor) and derives a corner response from its eigenvalues. If both eigenvalues are large (response above a threshold), we consider the location a corner. In the provided implementation, the threshold is set to a very low value that identifies a lot of potential corners, but to speed up computation and obtain better results I decided to set the threshold a bit higher from the get-go. This is an example output from the Harris corner detector:
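A compact sketch of the Harris response (this mirrors the standard formulation, not necessarily the starter code's exact constants; `k=0.05` is a typical choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0, k=0.05):
    """Harris corner response: det(S) - k * trace(S)^2, where S is the
    Gaussian-smoothed structure tensor of the image gradients."""
    Iy, Ix = np.gradient(im.astype(float))
    # Structure tensor entries, smoothed over a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    # Large at corners (two big eigenvalues), negative along edges,
    # near zero in flat regions.
    return det - k * trace ** 2
```

Thresholding this response map (and taking local maxima) yields the candidate corners.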

Naive Harris results
Lower threshold for a point
Higher threshold for a point

Adaptive non-maximal suppression

The problem with thresholding directly on the Harris response is that we may end up with many points in one section of an image and fewer in another, yielding suboptimal results. For this reason, we instead keep points that are locally relevant (as opposed to globally significant). I did this by computing, for each point, the distance to the nearest point with a greater activation, and then keeping the k points for which this metric is largest. We end up with an even spread of corners throughout the image. Here is a nice example:
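A sketch of this selection rule (the `c_robust` factor, which requires the neighbor to be *significantly* stronger before it suppresses a point, follows the common convention; the O(n^2) distance matrix is fine at these point counts):

```python
import numpy as np

def anms(coords, strengths, k, c_robust=0.9):
    """Adaptive non-maximal suppression: for each corner, compute the
    distance to the nearest corner with a significantly stronger
    response, then keep the k corners with the largest such radius.

    coords: (N, 2) corner positions; strengths: (N,) Harris responses.
    """
    n = len(coords)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    radii = np.full(n, np.inf)   # the global maximum is never suppressed
    for i in range(n):
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            radii[i] = d[i, stronger].min()
    return np.argsort(radii)[::-1][:k]
```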


Feature extraction and matching

From the paper, we take the idea of comparing blurred, downscaled versions of large windows around the selected corners rather than naively comparing small patches around each point. This makes matching more robust to slight angle changes between pictures and other small errors of that kind. Also, to avoid aliasing, we blur the image before downsampling the features.
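A sketch of the descriptor extraction, using the commonly cited 40x40 window downsampled to 8x8 (the exact sizes in my implementation may differ), with bias/gain normalization at the end:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_feature(im, x, y, window=40, size=8):
    """Describe a corner as a blurred, downsampled, normalized patch:
    take a large window around (x, y), low-pass filter it to avoid
    aliasing, subsample to size x size, then normalize to zero mean and
    unit variance for bias/gain invariance."""
    half = window // 2
    patch = im[y - half:y + half, x - half:x + half].astype(float)
    patch = gaussian_filter(patch, sigma=window / size / 2)
    patch = patch[::window // size, ::window // size]    # 40x40 -> 8x8
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```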

Once the features have been extracted, we can compare them to each other. In this case, we apply a simple L2 norm between features to evaluate the likelihood that two features correspond to the same image location. I then applied Lowe's trick to make outliers less likely: if the best match is significantly better than the second best, it is likely correct.

Another feature I implemented is checking for matches in both directions, applying Lowe's trick both ways. That is to say, for (p1, p2) to be considered a pair, p2 must be a significantly better match than any other for p1, and p1 must be so for p2. I found that, in practice, this eliminates outliers almost completely (I haven't seen any in the images I've tried), largely removing the need for RANSAC. Still, I implemented RANSAC since it's required (more on that later). Here is an example of successful matches between two images:
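The symmetric matching described above might look like this (a sketch; `ratio=0.7` is a typical Lowe threshold, and the feature matrices hold one flattened descriptor per row):

```python
import numpy as np

def match_features(f1, f2, ratio=0.7):
    """Symmetric Lowe matching: (i, j) is kept only if j is i's best
    match in f2 by a clear margin AND i is j's best match in f1 by the
    same margin. f1: (N, D), f2: (M, D) feature matrices."""
    d = np.linalg.norm(f1[:, None] - f2[None, :], axis=2)   # (N, M) L2 distances
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])
        j = order[0]
        # Lowe's ratio test, forward direction.
        if d[i, j] > ratio * d[i, order[1]]:
            continue
        # Same test in the reverse direction.
        back = np.argsort(d[:, j])
        if back[0] == i and d[i, j] <= ratio * d[back[1], j]:
            matches.append((i, j))
    return matches
```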


RANSAC

The final step of the process is to apply RANSAC for robust homography estimation. The goal of RANSAC is to make our homography algorithm robust to outlier matches that survived the steps above.

RANSAC works in the following way: repeatedly sample four random correspondences, compute the exact homography they define, and count how many of the remaining correspondences are mapped to within a small pixel threshold of their match (the inliers). After many iterations, recompute the homography with least squares over the largest inlier set.
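These steps can be sketched as follows (iteration count, threshold, and names are illustrative defaults, not necessarily mine):

```python
import numpy as np

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """4-point RANSAC: repeatedly fit an exact homography to a random
    sample, count inliers, and refit with least squares on the best
    inlier set. src, dst: (N, 2) matched point arrays."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = np.zeros(n, bool)

    def fit(s, d):
        # Same two-rows-per-point least-squares system as in Part A.
        A, b = [], []
        for (x, y), (xp, yp) in zip(s, d):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
        return np.append(h, 1.0).reshape(3, 3)

    src_h = np.column_stack([src, np.ones(n)])
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        H = fit(src[idx], dst[idx])
        proj = (H @ src_h.T).T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:] - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit over all inliers of the best model.
    return fit(src[best_inliers], dst[best_inliers]), best_inliers
```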

By doing this I achieved the following results:

Ihouse Elevator 1
Ihouse Elevator 2
Manually selected points mosaic
Autostitching mosaic

As desired, these images are almost identical. For my other examples, I chose different images than in section A, since I believe I found nicer pictures.

Ihouse Patio 1
Ihouse Patio 2
Patio mosaic - manual
Patio mosaic - autogen
Diwali ground 1
Diwali Ground 2
Diwali Manual mosaic
Diwali mosaic - autogen