Learning from a Class Imbalanced Public Health Dataset: a Cost-based Comparison of Classifier Performance

Rao, Rohini R and Makkithaya, Krishnamoorthi (2017) Learning from a Class Imbalanced Public Health Dataset: a Cost-based Comparison of Classifier Performance. International Journal of Electrical and Computer Engineering, 7 (4). pp. 2215-2222. ISSN 2088-8708

Abstract

Public health care systems routinely collect health-related data from the population. These data can be analyzed using data mining techniques to find novel, interesting patterns that could help formulate effective public health policies and interventions. Chronic illness is rare in the population, and the effect of this class imbalance on the performance of various classifiers was studied. The objective of this work is to identify the best classifiers for class-imbalanced health datasets through a cost-based comparison of classifier performance. The popular open-source data mining tool WEKA was used to build a variety of core classifiers as well as classifier ensembles and to evaluate the classifiers' performance. The unequal misclassification costs were represented in a cost matrix, and cost-benefit analysis was also performed. In another experiment, various sampling methods such as under-sampling, over-sampling, and SMOTE were applied to balance the class distribution in the dataset, and the costs were compared. The Bayesian classifiers performed well, with high recall and a low number of false negatives, and were not affected by the class imbalance. Results confirm that the total cost of Bayesian classifiers can be further reduced using cost-sensitive learning methods. Classifiers built using the randomly under-sampled dataset showed a dramatic drop in costs and high classification accuracy.
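
The cost-based comparison and random under-sampling described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than WEKA, and everything in it is an assumption made for illustration: the function names, the toy labels, and the cost values (a false negative costing 10 units and a false positive 1 unit) do not come from the paper.

```python
import random
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts


def total_cost(counts, cost_fn, cost_fp):
    """Total misclassification cost; correct predictions are assumed to cost 0."""
    return counts["fn"] * cost_fn + counts["fp"] * cost_fp


def recall(counts):
    """Fraction of actual positives that were predicted positive."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


def random_undersample(labels, seed=0):
    """Return sorted row indices after sampling every class down to the
    size of the smallest class (all minority rows are kept)."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


# Hypothetical costs: missing a chronic-illness case is assumed 10x worse
# than a false alarm. These numbers are NOT taken from the paper.
COST_FN, COST_FP = 10, 1

# Toy imbalanced data: 2 positive cases among 10 records.
y_true = [1, 1] + [0] * 8
always_negative = [0] * 10             # ignores the rare class entirely
cautious = [1, 1, 1, 1] + [0] * 6      # finds both cases, raises 2 false alarms

for name, y_pred in [("always_negative", always_negative), ("cautious", cautious)]:
    c = confusion_counts(y_true, y_pred)
    print(name, "recall:", recall(c), "cost:", total_cost(c, COST_FN, COST_FP))

kept = random_undersample(y_true)
print("balanced class counts:", Counter(y_true[i] for i in kept))
```

Both toy classifiers reach the same accuracy (8 of 10 correct), yet under the assumed costs the one that misses the rare cases costs 20 units and the other only 2, which is why a cost matrix can separate classifiers that accuracy alone cannot.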

Item Type: Article
Uncontrolled Keywords: Class imbalance, Classifier accuracy, Cost benefit analysis, Data mining, Healthcare
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Engineering > MIT Manipal > MCA
Depositing User: MIT Library
Date Deposited: 06 Sep 2017 10:36
Last Modified: 06 Sep 2017 10:36
URI: http://eprints.manipal.edu/id/eprint/149630
