MU Digital Repository

Script Identification from Multilingual Indian Documents using Structural Features

Gopakumar, Rajesh and Subbareddy, NV and Makkithaya, Krishnamoorthi and Acharya, Dinesh U (2010) Script Identification from Multilingual Indian Documents using Structural Features. Journal of Computing, 2 (7). pp. 106-111. ISSN 2151-9617

PDF: 34833098-Script-Identification-from-Multilingual-Indian-Documents-using-Structural-Features.pdf - Published Version (220kB)

Abstract

Script identification from a document image is an important step in many computer applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. This paper proposes a zone-based structural feature extraction algorithm for recognizing South Indian scripts along with English and Hindi. Document images are segmented into lines, each line image is divided into zones, and structural features are extracted from those zones. A total of 37 features are extracted at the first level and then reduced to an optimal subset using wrapper and filter feature-selection approaches. K-nearest neighbor (k-NN) and support vector machine (SVM) classifiers are used for classification and recognition. High classification accuracy is achieved with the optimal feature set.
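The pipeline summarized in the abstract (line segmentation, zoning, structural features, nearest-neighbor classification) can be sketched in a few functions. This is a minimal illustration, not the authors' implementation. The zoning rule (rows holding at least half the peak ink count form the middle zone), the two per-zone features (ink density and mean stroke crossings per row), and every function name are hypothetical assumptions. The paper's 37 structural features, its wrapper/filter feature selection, and its SVM classifier are not reproduced.

```python
"""Hypothetical zone-based script-identification pipeline (illustrative only)."""
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_lines(page: Image) -> List[Image]:
    """Split a page into text lines using its horizontal projection profile:
    each maximal run of rows containing ink becomes one line image."""
    lines, start = [], None
    for y, row in enumerate(page):
        if any(row) and start is None:
            start = y                      # first inked row of a new line
        elif not any(row) and start is not None:
            lines.append(page[start:y])    # a blank row closes the current line
            start = None
    if start is not None:                  # page ends inside a line
        lines.append(page[start:])
    return lines


def split_zones(line: Image, ratio: float = 0.5) -> Tuple[Image, Image, Image]:
    """Assumed zoning rule: rows whose ink count is at least `ratio` times the
    peak row form the middle zone; rows above/below are the upper/lower zones."""
    counts = [sum(row) for row in line]
    peak = max(counts)
    core = [y for y, c in enumerate(counts) if c >= ratio * peak]
    top, bottom = core[0], core[-1] + 1
    return line[:top], line[top:bottom], line[bottom:]


def zone_features(zone: Image) -> List[float]:
    """Two hypothetical structural features: ink density and the mean number
    of background-to-ink transitions (stroke crossings) per row."""
    if not zone or not zone[0]:
        return [0.0, 0.0]                  # an empty zone contributes zeros
    density = sum(map(sum, zone)) / (len(zone) * len(zone[0]))
    crossings = sum(
        sum(1 for prev, cur in zip([0] + row[:-1], row) if prev == 0 and cur == 1)
        for row in zone
    ) / len(zone)
    return [density, crossings]


def line_features(line: Image) -> List[float]:
    """Concatenate the features of the upper, middle and lower zones."""
    return [f for zone in split_zones(line) for f in zone_features(zone)]


def knn_predict(train: Sequence[Tuple[List[float], str]],
                x: List[float], k: int = 3) -> str:
    """Plain k-nearest-neighbor majority vote using Euclidean distance."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 1],
    ]
    for line in segment_lines(page):
        print(line_features(line))
    # [0.0, 0.0, 0.5, 1.5, 0.0, 0.0]
    # [0.0, 0.0, 0.75, 1.0, 0.0, 0.0]
```

Computing features per zone rather than over the whole line lets a classifier separate scripts by where ink falls relative to the text body, not only by how much ink there is. In a fuller version, `knn_predict` could be replaced by an SVM from a library such as scikit-learn, and the feature vector pruned by a selection step before training.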

Item Type: Article
Uncontrolled Keywords: Indian Scripts, Multilingual document, Script identification, Zone-based structural features
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Engineering > MIT Manipal > MCA
Depositing User: MIT Library
Date Deposited: 07 Jun 2011 05:56
Last Modified: 09 Jun 2011 06:17
URI: http://eprints.manipal.edu/id/eprint/149