MU Digital Repository

Multi-Script Line Identification System for Indian Languages

Acharya, Dinesh U and Gopakumar, Rajesh and Aithal, Prakash K (2010) Multi-Script Line Identification System for Indian Languages. Journal of Computing, 2 (11). pp. 107-111. ISSN 2151-9617

Multi-Script-Line-Identification-System-for-Indian-Languages.pdf - Published Version


Abstract

India is a multilingual, multi-script country with 18 official languages and 12 scripts. For Optical Character Recognition (OCR) of a multilingual document, the script of each text line must be identified before the line is passed to the OCR engine for that script. This paper presents a simple and efficient script identification technique for Kannada, Malayalam, Telugu, Tamil, Gujarati, Hindi and English text lines from a printed document. The proposed system uses the horizontal projection profile, the vertical projection profile and top pitch information to distinguish the seven scripts. The knowledge base of the system is developed from 50 different document images containing about 250 text lines of each script. The proposed system is tested on 50 different document images containing about 250 text lines of each script, and it achieves an overall classification rate of 97.64%.
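
The two projection profiles named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the binary-image representation (1 for ink, 0 for background), the function names, and the use of empty rows to separate text lines are all assumptions made for illustration. The top pitch feature is not defined in the abstract, so it is not sketched here.

```python
from typing import List, Tuple

# Assumed representation: a binarized page as rows of 0/1 values, 1 = ink.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Number of ink pixels in each row of the image."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Number of ink pixels in each column of the image."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Hypothetical helper: split a page into text lines at empty rows.

    Returns (start, end) row ranges with end exclusive.
    """
    profile = horizontal_projection(img)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # first inked row of a line
        elif count == 0 and start is not None:
            lines.append((start, y))       # an empty row closes the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the page bottom
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
    ]
    print(horizontal_projection(page))  # [0, 3, 1, 0, 2]
    print(vertical_projection(page))    # [2, 2, 0, 2]
    print(segment_lines(page))          # [(1, 3), (4, 5)]
```

In a script identifier of this kind, features derived from such profiles for each segmented line would be compared against the knowledge base built from the training lines; how the paper derives and compares those features is not stated in the abstract.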

Item Type: Article
Uncontrolled Keywords: Multilingual Indian Scripts, Script identification, Horizontal projection profile, Vertical projection profile, Top pitch
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Depositing User: MIT Library
Date Deposited: 14 Jun 2011 08:46
Last Modified: 14 Jun 2011 08:46
URI: http://eprints.manipal.edu/id/eprint/232
