Extracting text from images ~ TechSoftEng

Tuesday, August 28, 2012


In this blog post we are going to discuss tools and APIs for extracting text from images. Some of them are free and open source; others are commercial, but you can ask the vendor for a free developer version.
Java OCR:
Java OCR is a simple and efficient Java-based open source library for image processing and optical character recognition. It can also be used in Android projects.

If you want to test it, download it and give it a try.
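OCR engines generally work best on a clean black-and-white image, so image preprocessing usually starts with binarization: every pixel becomes either ink or background. The sketch below shows that step using only the JDK's java.awt.image.BufferedImage. It is not Java OCR's own API. The class name Binarize is hypothetical, and the fixed threshold of 128 is an assumption; real libraries often choose the threshold adaptively (for example with Otsu's method).

```java
import java.awt.image.BufferedImage;

public class Binarize {
    // Converts any image to a 1-bit black/white image using a fixed luminance threshold.
    // The threshold is an illustrative assumption, not a value taken from any OCR library.
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Grayscale conversion with ITU-R BT.601 luma weights
                int luma = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                // Dark pixels (likely text) become black, everything else white
                out.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 2x1 test image: one dark "ink" pixel and one light "paper" pixel
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x202020);
        img.setRGB(1, 0, 0xE0E0E0);
        BufferedImage bw = binarize(img, 128);
        System.out.println(Integer.toHexString(bw.getRGB(0, 0))); // ff000000
        System.out.println(Integer.toHexString(bw.getRGB(1, 0))); // ffffffff
    }
}
```

To use it on a real scan, load the file with javax.imageio.ImageIO.read and pass the result to binarize before handing the image to an OCR engine.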
Tesseract-OCR:
Tesseract-OCR is one of the best open source OCR libraries available. It was originally developed at HP Labs and is now supported by Google, and it is widely regarded as one of the most accurate open source OCR engines. It can read a wide variety of image formats and recognize text in many languages: as of now it supports more than 40, and more are being added.
It is written in C++, but if you are a Java developer, don't worry: Java wrappers for it are available.
Tess4J is one of them; it calls the native Tesseract library through JNA (Java Native Access).
Several graphical front ends are also available. If you want to try out Tesseract-OCR's functionality, you can download gImageReader or VietOCR; many other GUI front ends can be found on the web.
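Besides wrappers and GUI front ends, Tesseract also ships a command-line program, and a simple alternative to a native wrapper is to run that program from Java. This is a swapped-in approach, not one the tools above require. The sketch assumes tesseract is installed and on the PATH, and that your version accepts "stdout" as the output base (Tesseract 3.03 and later); older versions write the text to <base>.txt instead. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class TesseractCli {
    // Builds: tesseract <image> stdout -l <languages>
    // "stdout" as the output base makes tesseract print the text instead of writing a file.
    static List<String> buildCommand(String imagePath, String languages) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add("stdout");
        cmd.add("-l");
        cmd.add(languages); // e.g. "eng", or "eng+deu" to combine language models
        return cmd;
    }

    // Runs tesseract and returns the recognized text; throws if the process fails.
    static String ocr(String imagePath, String languages) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(imagePath, languages));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tesseract's diagnostics
        Process p = pb.start();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return buf.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java TesseractCli <image> [languages]");
            return;
        }
        String lang = args.length > 1 ? args[1] : "eng";
        System.out.print(ocr(args[0], lang));
    }
}
```

Shelling out avoids native-library setup inside the JVM at the cost of starting a process per image; for batch work a wrapper such as Tess4J is usually the better fit.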

Asprise OCR for Java:
Asprise Lab provides the Asprise OCR SDK for adding OCR to Java applications, and also offers a web-based OCR solution. The SDK lets you equip Java applications (applets, web applications, standalone applications, J2EE enterprise applications) with optical character recognition (OCR) capability.