Document Scanner using Python + OpenCV

(Coldplay origami star; see bottom for link to tutorial)

There's an amazing Android app called CamScanner that lets you use your phone's camera to scan any text document. I've been using the app for a few months, and the feature I like best is its perspective transformation, i.e. taking an image shot at an angle (not top-down) and displaying it as if it had been captured top-down at 90 degrees. What is worth praising is that the transformed image is quite clear and sharp. Another feature I like is its smart cropping: it automatically detects the document boundary and also lets the user adjust the crop as required.

As a Computer Vision enthusiast, I decided to build a fairly unsophisticated, rustic implementation of a document scanner using OpenCV and Python.

For the impatient folks, TL;DR: here is the link to the code: https://github.com/vipul-sharma20/document-scanner

My sincere thanks to the author of this article (http://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/), whose site has a really good set of articles on OpenCV and is way more informative than this post.

Implementation of the Scanner

In layman's terms:

Capture image
Detect edges
Extract the desired object (define contours)
Apply perspective transformation to the extracted object
Threshold the text content (if required)

Capture Image

I could have used my webcam, but it can't capture images that are readable enough. So, for illustration, I captured a test image of a document with my phone's camera.

Original Image (Document to scan)

The original image is scaled down, as OpenCV's methods may not perform accurately on very large images. (The image above is the scaled-down version.)

The image is then converted to grayscale and blurred using the Gaussian Blur technique.

Original Image (Grayscaled)

Original Image (Gaussian Blurred)

(notice that this image is smoother than the one above)

Blurring creates smooth transitions from one color to another and reduces noise and edge content. But we have to be careful with the amount of blur, as we DO want our script to detect the edges of the document.

Edge Detection

Edge detection finds the boundaries of objects in an image by looking for sharp changes in brightness. Here, it is used to segment the image. More precisely, we'll use the Canny Edge Detection technique.

Edged Image (Canny Edge Detection)

Contour Detection

After edge detection, we'll extract the document to be scanned from the image. To do so, we'll find the document boundary by drawing contours around the detected edges and choosing the appropriate contour.

Drawing all contours

Looks beautiful, right? :)

Here, we can see a boundary traced along the edges of our document, but there are some other irrelevant contours too. It is also clear that the area enclosed by the document's contour is larger than the area enclosed by any other contour, and we can use this fact to pick the right boundary for extracting our document.

Let's get rid of the extraneous contours by selecting the contour with the largest area. To get a boundary with only 4 vertices, I approximated the contour, i.e. replaced it with a similar shape that has fewer vertices.

Boundary around the document (Contour Approximated)

Perspective Transform

The original image was deliberately captured at an angle, so it is not a perfectly top-down image. Even if we crop the image around the contour, the cropped content would not look like a scanned document, because a scanned document always looks as if it were captured/scanned exactly from vertically above.

Therefore, we'll apply a perspective transformation. A perspective transformation maps one arbitrary quadrilateral onto another, so a skewed image (a quadrilateral) can be transformed into a square/rectangle by defining a new mapping for each pixel.
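For reference, such a mapping can be written as a 3x3 homography in homogeneous coordinates; the symbols below are my own notation, not taken from the post. Fixing the bottom-right entry of H to 1 leaves eight unknowns, and each of the four corner correspondences (document corner (x, y) to rectangle corner (x', y')) contributes two linear equations, so four corners determine H:

```latex
\begin{pmatrix} w\,x' \\ w\,y' \\ w \end{pmatrix}
= H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
H = \begin{pmatrix}
  h_{11} & h_{12} & h_{13} \\
  h_{21} & h_{22} & h_{23} \\
  h_{31} & h_{32} & 1
\end{pmatrix}

x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1},
\qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}

h_{11}x + h_{12}y + h_{13} - h_{31}x\,x' - h_{32}y\,x' = x'
h_{21}x + h_{22}y + h_{23} - h_{31}x\,y' - h_{32}y\,y' = y'
```

OpenCV's `cv2.getPerspectiveTransform` computes such a matrix from exactly four point pairs, and `cv2.warpPerspective` applies it to every pixel.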

There is a nice discussion of the equations involved and what happens behind the scenes here: http://stackoverflow.com/questions/3190483/transform-quadrilateral-into-a-rectangle

Perspective Transformation

This looks better. If you want to give it a B/W look and feel, you can always try thresholding!

Thresholding the image above with the Adaptive Gaussian Thresholding method gives us a B/W document.

Adaptive Gaussian Thresholding

As mentioned earlier, the original image was scaled down before processing. Therefore, the two images above are not as sharp and clear as they could have been, which is one of the issues I'm looking to fix. I need to find a better way to get an optimally scaled image.

TODO (I would love to hear your suggestions):

Resolve the issue with using the scaled-down image
Maybe use an image-to-PDF converter to convert the scanned image to PDF
Refactor the code (into an API?) before it wreaks havoc
Test with more images, angles, colors, sizes and backgrounds, and optimize... optimize... optimize
Add issues here: https://github.com/vipul-sharma20/document-scanner/issues

GitHub Repository: https://github.com/vipul-sharma20/document-scanner

Learn how to fold the Coldplay origami star: http://cldp.ly/ASFOSpdf :)

Vipul Sharma

Engineering undergraduate (JEC), Pythonista, Open Source Contributor, GSoCer 2015 and a die-hard Coldplay fan :D

12 comments

Lowbar, 19 August 2016 at 15:29
if len(approx) == 4:
^
TabError: inconsistent use of tabs and spaces in indentation
shanker sharma, 20 September 2016 at 23:59
Hey Vipul,

Can you put some ideas on how to modify the code so, that it works for colored images.

thanks
minhminh, 12 September 2017 at 06:40
thank
Unknown, 12 October 2017 at 04:17
Hi Vipul,
Getting following error

AttributeError: 'module' object has no attribute 'rectify'
Using default python 2.7 which comes with ubuntu 14.04
Unknown, 28 July 2018 at 10:54
nice work and excellent clear explanation!!
Unknown, 17 November 2018 at 12:49
How do I execute the code ? I am providing the python file name and image but it gives me an error. What is the correct syntax for executing program?