Zone-based Structural feature extraction for Script Identification from Indian Documents

Gopakumar, Rajesh and Subbareddy, NV and Makkithaya, Krishnamoorthi and Acharya, Dinesh U (2010) Zone-based Structural feature extraction for Script Identification from Indian Documents. In: 5th International Conference on Industrial and Information Systems, Jul 29 - Aug 01, 2010.

PDF: 05578668.pdf (Published Version, 2MB)
Restricted to registered users only.
Official URL:


Automatic identification of the script in a document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. In this paper, a zone-based structural feature extraction scheme is proposed for recognizing South Indian scripts along with English and Hindi. Document images are segmented into lines, each line image is divided into zones, and structural features are extracted from these zones. A total of 37 features are extracted at the first level and then reduced to an optimal subset using wrapper and filter feature-selection approaches. k-nearest neighbor (k-NN) and support vector machine (SVM) classifiers are used for classification. A classification accuracy of 100% is achieved with the optimal feature set.
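The abstract does not say how lines are segmented or how the zones are bounded. As a minimal sketch under stated assumptions, the Python below uses the common horizontal projection profile on a binarized page (1 = ink): blank rows separate text lines, and the band of a line's densest rows is taken as its middle zone, with the rows above and below forming the upper and lower zones. The function names, the `threshold_ratio` parameter, and the ink-fraction feature are hypothetical illustrations, not the authors' 37 features.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink pixel, 0 = background


def horizontal_profile(img: Image) -> List[int]:
    """Count the ink pixels in each row of a binary image."""
    return [sum(row) for row in img]


def segment_lines(page: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines at rows that contain no ink.

    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for r, count in enumerate(horizontal_profile(page)):
        if count > 0 and start is None:
            start = r                      # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, r))       # a blank row closes the current line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(page)))
    return lines


def middle_zone(line: Image, threshold_ratio: float = 0.5) -> Tuple[int, int]:
    """Return (top, bottom) rows of the middle zone, bottom exclusive.

    Hypothetical heuristic: the middle zone spans the first to the last row
    whose ink count reaches threshold_ratio * the peak row count.
    """
    profile = horizontal_profile(line)
    cutoff = threshold_ratio * max(profile)
    dense = [r for r, c in enumerate(profile) if c >= cutoff]
    return dense[0], dense[-1] + 1


def zone_ink_fractions(line: Image) -> List[float]:
    """Fraction of the line's ink in the upper, middle and lower zones."""
    profile = horizontal_profile(line)
    top, bottom = middle_zone(line)
    total = sum(profile) or 1              # avoid division by zero on an ink-free line
    return [
        sum(profile[:top]) / total,
        sum(profile[top:bottom]) / total,
        sum(profile[bottom:]) / total,
    ]
```

On a toy page whose first text line has row ink counts 1, 4, 4, 1, `middle_zone` returns rows (1, 3) and `zone_ink_fractions` returns approximately [0.1, 0.8, 0.1]. A projection profile is cheap and suits well-aligned printed lines; skewed scans would need deskewing before this step.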
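The abstract mentions filter and wrapper selection without naming a criterion. A filter approach scores each feature independently of any classifier; the sketch below assumes the Fisher score (between-class scatter over within-class scatter) as that criterion, which is an illustrative choice rather than the one reported in the paper. `fisher_scores` and `select_top_features` are hypothetical names, and feature vectors are assumed to be plain lists of floats with one script label per line image.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[str]) -> List[float]:
    """Score each feature by between-class scatter / within-class scatter."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        between = within = 0.0
        for rows in by_class.values():
            col = [row[j] for row in rows]
            between += len(col) * (mean(col) - overall) ** 2
            within += len(col) * pvariance(col)
        if within > 0:
            scores.append(between / within)
        else:
            # no spread inside any class: perfectly separating only if class means differ
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_features(X, y, n: int) -> List[int]:
    """Indices of the n highest-scoring features (the filter step)."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]
```

On a toy set of four lines with two labels, a feature whose class means differ while each class stays tight scores highest, and a feature whose class means coincide scores 0. Because no classifier is trained, this step is fast, but it ignores interactions between features.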
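A wrapper approach instead judges feature subsets by the accuracy of the classifier itself. Assuming a k-nearest-neighbor classifier with Euclidean distance and leave-one-out evaluation (the abstract gives neither the distance metric, the value of k, nor the validation protocol), greedy forward selection can be sketched as follows. All function names are hypothetical, and the support vector machine classifier also used in the paper is not sketched here.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def knn_predict(train_X, train_y, x, k: int = 1,
                features: Optional[Sequence[int]] = None) -> str:
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    feats = list(features) if features is not None else list(range(len(x)))
    neighbours = sorted(
        (math.dist([row[j] for j in feats], [x[j] for j in feats]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]  # vote ties go to the label of the nearer neighbour


def loo_accuracy(X, y, features: Sequence[int], k: int = 1) -> float:
    """Leave-one-out accuracy of k-NN restricted to the given feature indices."""
    X, y = list(X), list(y)
    correct = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k, features)
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y, k: int = 1) -> Tuple[List[int], float]:
    """Greedy wrapper: add the feature that most improves accuracy; stop when none does."""
    selected: List[int] = []
    best_acc = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [j], k), j) for j in remaining]
        acc, j = max(scored, key=lambda t: t[0])
        if acc <= best_acc:
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc
```

Forward selection stops as soon as no remaining feature raises accuracy, so it tends to return a small subset; being greedy, it can miss feature pairs that only help in combination.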

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Multilingual document; Script identification; Zone-based structural features; Wrapper subset selection; Filter approach; k-Nearest Neighbor classifier; Support Vector Machine
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Engineering > MIT Manipal > MCA
Depositing User: MIT Library
Date Deposited: 08 Jun 2011 05:49
Last Modified: 09 Jun 2011 06:21
