Wednesday, December 9, 2009

Urdu OCR - A Digital Dream


Optical Character Recognition (OCR) is an approach for recognizing isolated characters that requires relatively simple calculations yet still gives adequate results. In document image recognition, an additional step is required: detecting the lines of text and the possible set of characters within those lines. There are numerous methods available for character recognition. They range from numerical and statistical approaches to AI-based approaches, in increasing order of recognition accuracy. None of these approaches has a recognition accuracy of 100%. Even humans are not credited with absolute recognition accuracy. The main objective of recognition software is to spare its user the physically tiring and cumbersome work of typing out a whole document. Error correction still rests with the user. Hence, a recognition accuracy of even about 90% gives very satisfactory results. Apart from all this, image quality also plays a very important role in recognition accuracy.
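To make the line-detection step a little more concrete, here is a minimal sketch of one common way to find lines of text: a horizontal projection profile, which counts the ink pixels in each row and treats every unbroken run of non-empty rows as one line. This is only an illustration and not necessarily the method used in the project described below. The function name `find_text_lines`, the `min_ink` parameter, and the tiny sample `page` are all made up for the example, and the image is assumed to be already binarized (1 for ink, 0 for background).

```python
def find_text_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per detected text line.

    `image` is assumed to be a list of rows of 0/1 pixels (1 = ink).
    A row belongs to a text line when it holds at least `min_ink` ink pixels.
    """
    lines = []
    start = None  # first row of the line currently being scanned, if any
    for y, row in enumerate(image):
        ink = sum(row)  # horizontal projection: ink pixels in this row
        if ink >= min_ink and start is None:
            start = y  # a new line of text begins here
        elif ink < min_ink and start is not None:
            lines.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # the last line touches the bottom edge
    return lines


# Hypothetical 7-row page holding two "lines" of ink separated by blank rows.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(find_text_lines(page))  # [(1, 3), (5, 6)]
```

Raising `min_ink` makes the sketch ignore rows that contain only a few stray specks, which is one small way the image quality mentioned above feeds into the result. The same counting idea can be applied to the columns inside a detected line to propose character boundaries, although a connected script is likely to need more than this.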

A research project named Urdu OCR – A Digital Dream, from Usman Institute of Technology, aims to fulfil this need. The team members of this project are Abdul Wahab (me), Shuwair Sardar, and Muhammad Abdul Sammad Khan. The project was the first prize winner of Combat 2008 (Software Competition – PAF Kiet) and of the Software Exhibition (Software Competition – SZABIST), and the second prize winner in NED.


This is the first time an Urdu OCR has been developed; no such product existed before. It is needed in print media such as Urdu newspapers and magazines. It is useful for converting Urdu books into digital format, so that the large amount of useful heritage material in the Urdu language that is at risk of vanishing can be saved digitally. It can also be used to produce electronic books and an online digital Urdu library.




Blog Ref: http://greenwhite.org/blog/2008/12/18/urdu-ocr-a-digital-dream/