Vipul Sharma

Document Scanner using Python + OpenCV

Saturday, January 09, 2016


There's an amazing Android app called CamScanner which lets you use your phone's camera to scan any text document. I've been using the app for a few months, and what I like best about it is its perspective transformation, i.e. it takes an image clicked at an angle (not top-down) and displays it as if it had been captured top-down at 90 degrees. What is worth praising is that the transformed image is quite clear and sharp. Another feature I like is its smart cropping: it automatically detects the document boundary and even lets the user adjust the crop as required.

Being a Computer Vision enthusiast, I thought of building a pretty unsophisticated and rustic implementation of a document scanner using OpenCV and Python.

For all the impatient folks, TL;DR: here is the link to the code: https://github.com/vipul-sharma20/document-scanner

My sincere thanks to the author of this article: http://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/ The site has a really good set of articles on OpenCV, and they are way more informative.

Implementation of Scanner

In layman's terms:

Capture image
Detect edges
Extract desired object / define contours
Apply perspective transformation on extracted object
Threshold text content (if required)

Capture Image

I could've used my webcam, but the images it captures are not readable enough. Therefore, for illustration, I've captured a test image of a document with my phone's camera.

Original Image (Document to scan)

The original image is scaled down, as OpenCV's methods may not perform accurately on very large images. (The image above is the scaled-down version.)

The image is then converted to grayscale and blurred using a Gaussian blur.

Original Image (Grayscaled)

Original Image (Gaussian Blurred)

(notice that this image is smoother than the one above)

Blurring creates a smooth transition from one color to another, which reduces noise and edge content. But we have to be careful with the extent of blur, as we DO want our script to detect the edges of the document.

Edge Detection

Edge detection finds the boundaries of objects in an image by analyzing variations in brightness. Here, it is used for segmenting the image. More precisely, we'll use the Canny edge detection technique.

Edged Image (Canny Edge Detection)

Contour Detection

After performing edge detection, we'll try to extract the document to be scanned from the image. To do that, we'll find the document boundary by tracing contours along the detected edges and choosing the appropriate contour.

Drawing all contours

Looks beautiful, right? :)

Here, we can see that there is a boundary traced along the edges of our document, but there are some other irrelevant contours too. It is also clearly visible that the area within the contour of the document is larger than the area enclosed by any other contour, and we can use this fact to get the right boundary to extract our document.

Let's get rid of the extraneous contours by selecting the contour with the largest area. To get a boundary with only 4 vertices, I have approximated the contour, i.e. replaced it with a simpler shape that has fewer vertices.

Boundary around the document (Contour Approximated)

Perspective Transform

The original image was deliberately captured at an angle rather than perfectly top-down. Even if we crop the image around the contour, the cropped content would not look like a scanned document. A scanned document always looks as if it was captured from directly above.

Therefore, we'll apply a perspective transformation. In a perspective transformation, one arbitrary quadrilateral is mapped to another; hence, a skewed image (quadrilateral) can be transformed into a square/rectangle by defining a new mapping for each pixel.

Some nice discussion regarding the equations involved and what takes place behind the scenes: http://stackoverflow.com/questions/3190483/transform-quadrilateral-into-a-rectangle

Perspective Transformation

This looks better. If someone wants to give it a B/W look and feel, one can always try thresholding!

If we threshold the above image using the Adaptive Gaussian Thresholding method, we can get a B/W document.

Adaptive Gaussian Thresholding

As mentioned earlier, the original image was scaled down before processing. Therefore, the above two images are not as sharp and clear as they could've been, which is one of the issues I am looking forward to fixing. I need to find a better way to get an optimally scaled image.
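One possible approach, offered only as a suggestion and not tested against the repository: keep the scaled-down image for edge and contour detection, then map the four detected corners back to full-resolution coordinates and warp the original image instead. A sketch of the coordinate mapping, where `ratio` is assumed to be resized height divided by original height and the function name is hypothetical:

```python
import numpy as np


def scale_corners_to_original(corners, ratio):
    """Map corner coordinates found on the resized image back to the
    full-resolution image.

    `ratio` is resized_height / original_height (e.g. 0.5 if the image
    was halved), so dividing by it undoes the scaling.
    """
    return np.asarray(corners, dtype=np.float32) / ratio
```

The perspective transform would then be applied to the untouched original using the rescaled corners, so the output keeps the camera's full detail.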

TODO (I would love to hear your suggestions):

Resolve the issue regarding the use of a scaled-down image
Maybe use an image-to-PDF converter to convert the scanned image to PDF
Refactor the code (like an API?) before it wreaks havoc
Test with more images, angles, colors, sizes, backgrounds and optimize .. optimize .. optimize
Add issues here: https://github.com/vipul-sharma20/document-scanner/issues