Image Warping and Mosaicing - Part A

Shoot images and recover homographies

I used these images throughout my project:

Ihouse Great Hall 1
Ihouse Great Hall 2
Ihouse Elevator 1
Ihouse Elevator 2
Ihouse Stairs 1
Ihouse Stairs 2

The first step of the project was to define correspondences between the images. To do this, I reused the point-selection tool from project 3. I observed that for best results, it was preferable to have a large number of point pairs and a significant overlap between the images.

Once the correspondences are defined, the next task is to recover a transformation such that: \[ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} \] Note that this transformation does not map points from one image to the other directly: there is also a scaling component w that must be divided out (it took me some time to realize this). From here, we expand and rearrange to obtain: \[ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \]

Note that with exactly four point pairs this system has an exact solution (eight equations, eight unknowns). In practice, however, due to inaccuracy in point selection it is better to select more points and solve with least squares.
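As a sketch, the least-squares setup above can be written with NumPy as follows (the function name and point-array layout are illustrative, not my exact implementation):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H such that dst ~ H @ src (homogeneous), via least squares.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, from the expanded system above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    # The bottom-right entry of H is fixed to 1.
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four pairs the system is square and `lstsq` returns the exact solution; with more pairs it minimizes the squared reprojection residual.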

Warp images

Once we have recovered the homography H, we can use it to warp one of the images (which is necessary before mosaicing). To do this, I reused techniques from project 3. Specifically, the warp works by inverse mapping: first compute where the image corners land under H to size the output, then map each output pixel back through the inverse of H and sample from the source image.
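A minimal sketch of this inverse-warping procedure (assuming nearest-neighbor sampling for brevity; bilinear interpolation gives smoother results):

```python
import numpy as np

def warp_image(im, H):
    """Warp im by homography H using inverse mapping.

    Returns the warped image and the (x, y) offset of its top-left corner.
    """
    h, w = im.shape[:2]
    # 1. Map the source corners through H to size the output canvas.
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], float).T
    mapped = H @ corners
    mapped = mapped[:2] / mapped[2]          # divide out the scale w
    x0, y0 = np.floor(mapped.min(axis=1)).astype(int)
    x1, y1 = np.ceil(mapped.max(axis=1)).astype(int)
    # 2. Send every output pixel back through H^-1.
    xs, ys = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    # 3. Keep only the pixels that land inside the source image.
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((y1 - y0, x1 - x0) + im.shape[2:], im.dtype)
    out[ys.ravel()[valid] - y0, xs.ravel()[valid] - x0] = im[sy[valid], sx[valid]]
    return out, (x0, y0)
```

The returned offset is what lets the warped image be placed correctly on the mosaic canvas later.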

For example, here is the first great hall image warped according to its homography with great hall 2:

Ihouse Great Hall 1 original
Ihouse Great Hall 1 warped
Ihouse Great Hall 2

We can get interesting results by warping images that contain rectangular objects seen in perspective so that those objects appear rectangular again. To do this, we simply define a correspondence between the four corners and a rectangle of our choice:

Board Outside my Room
Warped Board
Book on chest
Warped Window

Note that the shapes may still appear non-rectangular; I believe this is a visual effect caused by the distorted perspective of the surrounding scene.
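Since exactly four corner pairs are used here, rectification reduces to an exact 8x8 solve rather than least squares. A sketch (the corner ordering and helper name are illustrative):

```python
import numpy as np

def rectify_homography(corners, width, height):
    """Exact homography sending four clicked corners (ordered TL, TR,
    BR, BL) to an axis-aligned width x height rectangle."""
    dst = [(0, 0), (width, 0), (width, height), (0, height)]
    A, b = [], []
    for (x, y), (xp, yp) in zip(corners, dst):
        # Same two-rows-per-point system as in homography recovery.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

The resulting H is then fed to the same warping routine as before.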

Mosaicing

Once we have recovered the homography matrix, we can blend one image into another by superimposing the warped image onto the original destination image. To do this, we start by calculating the size of the final canvas, which is determined by the sizes of the images and the offset between them. After building the canvas where the result will go, we superimpose the images. To avoid visible edges between the images, we blend them using an alpha mask that ramps from 0 to 1 across the overlap region. We obtain the following results:
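The alpha-ramp idea can be sketched in isolation on an axis-aligned overlap (in the real mosaic the same ramp is applied on the warped canvas; names and the grayscale assumption are illustrative):

```python
import numpy as np

def blend_ramp(left, right, overlap):
    """Blend two equal-height grayscale images whose last/first `overlap`
    columns coincide, using a linear alpha ramp across the overlap."""
    h = left.shape[0]
    w = left.shape[1] + right.shape[1] - overlap
    out = np.zeros((h, w), float)
    # Non-overlapping parts are copied directly.
    out[:, :left.shape[1] - overlap] = left[:, :-overlap]
    out[:, left.shape[1]:] = right[:, overlap:]
    # Alpha ramps from 1 (all left) to 0 (all right) across the overlap,
    # so there is no hard seam between the two images.
    alpha = np.linspace(1, 0, overlap)
    seam = left[:, -overlap:] * alpha + right[:, :overlap] * (1 - alpha)
    out[:, left.shape[1] - overlap:left.shape[1]] = seam
    return out
```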

Leftmost Image
Rightmost image
Mosaic
Here are the alpha masks for the images above.
Leftmost Image
Rightmost image

And here are some other mosaics I created:

Leftmost Image
Rightmost image
Mosaic
Leftmost image
Rightmost image
Mosaic

Note that there is no additional difficulty in doing 3 images instead of 2: we obtain the homography from image 1 to image 3 by multiplying the homography from 2 to 3 with the one from 1 to 2. I made a naive (not alpha blended) 3-photo mosaic:
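The composition step is a single matrix product, but the order matters (the sample matrices below are hypothetical translations, just to show the composition):

```python
import numpy as np

# If H12 maps image-1 coords to image-2 coords and H23 maps image 2 to
# image 3, the composite mapping from 1 to 3 is H23 @ H12 -- applied
# right-to-left, like function composition.
H12 = np.array([[1, 0, 100], [0, 1, 0], [0, 0, 1]], float)   # hypothetical
H23 = np.array([[1, 0, 80], [0, 1, 20], [0, 0, 1]], float)   # hypothetical
H13 = H23 @ H12

p = H13 @ np.array([0, 0, 1.0])
print(p[:2] / p[2])   # image-1 origin lands at (180, 20) in image 3
```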

Leftmost image
Middle image
Rightmost image
Naive mosaic

Part B: Feature Matching for Autostitching

Harris

The goal of this section is to eliminate the need for manual correspondence selection. The first step towards achieving this is corner detection, for which we use the provided Harris starter code. Specifically, the Harris detector computes, around each image location, a smoothed sum of outer products of the image gradients (the structure tensor) and derives a corner response from its eigenvalues. If both eigenvalues are large (response above a threshold), we consider the location a corner. In the provided implementation, the threshold is set to a very low value that identifies a lot of potential corners, but to speed up computation and obtain better results I decided to set the threshold a bit higher from the get-go. This is an example output from the Harris corner detector:
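A compact sketch of the Harris response (this mirrors the standard formulation, not necessarily the starter code's exact constants; `k=0.05` is a typical choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0, k=0.05):
    """Harris corner response: det(S) - k * trace(S)^2, where S is the
    Gaussian-smoothed structure tensor of the image gradients."""
    Iy, Ix = np.gradient(im.astype(float))
    # Structure tensor entries, smoothed over a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    # Large at corners (two big eigenvalues), negative along edges,
    # near zero in flat regions.
    return det - k * trace ** 2
```

Thresholding this response map (and taking local maxima) yields the candidate corners.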

Naive Harris results
Lower threshold for a point
Higher threshold for a point

Adaptive non-maximal suppression

The problem with thresholding directly on the Harris response is that we may end up with many points in one section of an image and fewer in another, yielding suboptimal results. For this reason, we instead keep points that are locally relevant (as opposed to globally significant). I did this by computing, for each point, the distance to the nearest point with a greater activation, and then keeping the k points for which this metric is largest. We end up with an even spread of corners throughout the image. Here is a nice example:
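A sketch of this selection rule (the `c_robust` factor, which requires the neighbor to be *significantly* stronger before it suppresses a point, follows the common convention; the O(n^2) distance matrix is fine at these point counts):

```python
import numpy as np

def anms(coords, strengths, k, c_robust=0.9):
    """Adaptive non-maximal suppression: for each corner, compute the
    distance to the nearest corner with a significantly stronger
    response, then keep the k corners with the largest such radius.

    coords: (N, 2) corner positions; strengths: (N,) Harris responses.
    """
    n = len(coords)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    radii = np.full(n, np.inf)   # the global maximum is never suppressed
    for i in range(n):
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            radii[i] = d[i, stronger].min()
    return np.argsort(radii)[::-1][:k]
```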


Feature extraction and matching

From the paper, we take the idea of comparing blurred, downscaled versions of large windows around the selected corners rather than naively comparing small patches around each point. This makes matching more robust to slight angle changes between pictures and other small errors of that kind. Also, to avoid aliasing, we blur the image before downsampling the features.
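A sketch of the descriptor extraction, using the commonly cited 40x40 window downsampled to 8x8 (the exact sizes in my implementation may differ), with bias/gain normalization at the end:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_feature(im, x, y, window=40, size=8):
    """Describe a corner as a blurred, downsampled, normalized patch:
    take a large window around (x, y), low-pass filter it to avoid
    aliasing, subsample to size x size, then normalize to zero mean and
    unit variance for bias/gain invariance."""
    half = window // 2
    patch = im[y - half:y + half, x - half:x + half].astype(float)
    patch = gaussian_filter(patch, sigma=window / size / 2)
    patch = patch[::window // size, ::window // size]    # 40x40 -> 8x8
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```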

Once the features have been extracted, we can compare them to each other. In this case, we apply a simple L2 norm between features to evaluate the likelihood that two features correspond to the same image location. I then applied Lowe's trick to make outliers less likely: if the best match is significantly better than the second best, it is likely correct.

Another feature I implemented is checking for matches in both directions, applying Lowe's trick both ways. That is to say, for (p1, p2) to be considered a pair, p2 must be a significantly better match than any other for p1, and p1 must be so for p2. I found that, in practice, this eliminates outliers almost completely (I haven't seen any in the images I've tried), largely removing the need for RANSAC. Still, I implemented RANSAC since it's required (more on that later). Here is an example of successful matches between two images:
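The symmetric matching described above might look like this (a sketch; `ratio=0.7` is a typical Lowe threshold, and the feature matrices hold one flattened descriptor per row):

```python
import numpy as np

def match_features(f1, f2, ratio=0.7):
    """Symmetric Lowe matching: (i, j) is kept only if j is i's best
    match in f2 by a clear margin AND i is j's best match in f1 by the
    same margin. f1: (N, D), f2: (M, D) feature matrices."""
    d = np.linalg.norm(f1[:, None] - f2[None, :], axis=2)   # (N, M) L2 distances
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])
        j = order[0]
        # Lowe's ratio test, forward direction.
        if d[i, j] > ratio * d[i, order[1]]:
            continue
        # Same test in the reverse direction.
        back = np.argsort(d[:, j])
        if back[0] == i and d[i, j] <= ratio * d[back[1], j]:
            matches.append((i, j))
    return matches
```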


RANSAC

The final step of the process is to apply RANSAC for robust homography estimation. The goal of RANSAC is to make our homography algorithm robust to outlier matches that survived the steps above.

RANSAC works in the following way: repeatedly sample four random correspondences, compute the exact homography they define, and count how many of the remaining correspondences are mapped to within a small pixel threshold of their match (the inliers). After many iterations, recompute the homography with least squares over the largest inlier set.
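These steps can be sketched as follows (iteration count, threshold, and names are illustrative defaults, not necessarily mine):

```python
import numpy as np

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """4-point RANSAC: repeatedly fit an exact homography to a random
    sample, count inliers, and refit with least squares on the best
    inlier set. src, dst: (N, 2) matched point arrays."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = np.zeros(n, bool)

    def fit(s, d):
        # Same two-rows-per-point least-squares system as in Part A.
        A, b = [], []
        for (x, y), (xp, yp) in zip(s, d):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
        return np.append(h, 1.0).reshape(3, 3)

    src_h = np.column_stack([src, np.ones(n)])
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        H = fit(src[idx], dst[idx])
        proj = (H @ src_h.T).T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:] - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit over all inliers of the best model.
    return fit(src[best_inliers], dst[best_inliers]), best_inliers
```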

By doing this I achieved the following results:

Ihouse Elevator 1
Ihouse Elevator 2
Manually selected points mosaic
Autostitching mosaic

As desired, these images are almost identical. For my other examples, I chose different images than in section A, since I believe I found nicer pictures.

Ihouse Patio 1
Ihouse Patio 2
Patio mosaic - manual
Patio mosaic - autogen
Diwali ground 1
Diwali Ground 2
Diwali Manual mosaic
Diwali mosaic - autogen