Feature Selection using Submodular Approach for Financial Big Data

Attigeri, Girija V and Pai, Manohara M.M. and Pai, Radhika M (2019) Feature Selection using Submodular Approach for Financial Big Data. Journal of Information Processing Systems. ISSN 1976-913X

8061.pdf - Published Version
Restricted to Registered users only



As the world moves towards digitization, data is generated from various sources at an ever-increasing rate. Its volume is becoming enormous, and it is termed big data. The financial sector is one domain that needs to leverage the big data being generated to identify financial risks, fraudulent activities, and so on. The design of predictive models for such financial big data is imperative for maintaining the health of the country's economy. Financial data has many features, such as transaction history, repayment data, purchase data, and investment data. The main problem in building a predictive model is finding the right subset of representative features from which the model can be constructed for a particular task. This paper proposes a correlation-based method using submodular optimization that selects the optimum number of features, thereby reducing the dimensionality of the data for faster and better prediction. The key proposition is that the optimal feature subset should contain features that are highly correlated with the class label but are not correlated with each other. Experiments are conducted on loan data to understand the effect of the various feature subsets on different classification algorithms. The IBM Bluemix Big Data platform is used for experimentation, along with a Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with the optimal subsets in significantly less execution time. The algorithm is also compared with existing feature selection and extraction algorithms.
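The proposition above (high correlation with the class label, low correlation within the subset) can be illustrated with a small greedy sketch. The abstract does not state the paper's exact objective, so this sketch assumes a hypothetical set function f(S) = sum of |corr(x_i, y)| over i in S, minus lam times the sum of |corr(x_i, x_j)| over pairs i < j in S. Adding feature j to S gains |corr(x_j, y)| - lam * sum over i in S of |corr(x_i, x_j)|. That gain can only shrink as S grows, so f is submodular, and a greedy pick of the best marginal gain is the standard heuristic for it. The function name `greedy_corr_selection`, the weight `lam`, and the stopping rule are illustrative assumptions, not the authors' algorithm. The sketch also runs in memory with NumPy rather than on Spark.

```python
import numpy as np


def greedy_corr_selection(X, y, k, lam=1.0):
    """Hypothetical greedy maximizer of a correlation-based submodular objective.

    Assumed objective (not taken from the paper):
        f(S) = sum_{i in S} r_i - lam * sum_{i<j in S} c_ij
    where r_i = |corr(x_i, y)| is relevance and c_ij = |corr(x_i, x_j)| is redundancy.
    Assumes no column of X (and not y) is constant, so correlations are defined.
    """
    # Absolute Pearson correlations among all features plus the label (last column).
    R = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    r = R[:-1, -1]   # feature-to-label correlations
    C = R[:-1, :-1]  # feature-to-feature correlations

    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        # Marginal gain of each candidate given the current subset.
        gains = [r[j] - lam * sum(C[j, i] for i in selected) for j in remaining]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break  # no remaining feature improves the assumed objective
        selected.append(remaining.pop(best))
    return selected


# Usage on synthetic data: x1 is a near-duplicate of x0, and x3 is pure noise.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = (x0 + x2 > 0).astype(float)
X = np.column_stack([x0, x1, x2, x3])
chosen = greedy_corr_selection(X, y, k=3)
print(chosen)
```

In this sketch, the redundancy penalty stops the near-duplicate pair from both entering the subset, even though each feature is individually well correlated with the label. Raising `lam` favors smaller, less redundant subsets.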

Item Type: Article
Uncontrolled Keywords: Financial big data, Feature subset selection, Correlation, Submodular Optimization, Classification, Support Vector Machine, Logistic regression
Subjects: Engineering > MIT Manipal > Information and Communication Technology
Depositing User: MIT Library
Date Deposited: 25 Sep 2020 06:16
Last Modified: 25 Sep 2020 06:16
URI: http://eprints.manipal.edu/id/eprint/155649
