Secure EMR Classification and Deduplication Using MapReduce

Usharani, A V and Attigeri, Girija V (2022) Secure EMR Classification and Deduplication Using MapReduce. IEEE Access, 10. pp. 34404-34414. ISSN 2169-3536

Abstract

Healthcare providers generate a huge amount of data every day through registration, lab results, prescriptions, and other activities. These data are stored as Electronic Medical Records (EMRs) in a central repository. Medical record data is voluminous and difficult to read and understand. To give professionals insight into the different domains a patient belongs to, it is necessary to obtain pointers to a file before classifying it to a particular department for further analysis. This study presents an EMR processing system that automatically classifies EMRs based on important medical terms using TF-IDF and topic modeling. Automatic classification of EMRs helps healthcare professionals make accurate decisions and provide efficient service. It also reduces the time taken to process large amounts of data and helps organize patient data better. EMRs stored on the cloud may exist as duplicate copies, at file level, across several storage systems, which increases network bandwidth usage and cost and consumes storage space. Hence, a deduplication mechanism is required to avoid or reduce data redundancy. Adopting cloud computing for healthcare systems requires sharing patient data with cloud service providers. This raises security concerns because the data may contain diagnoses, medications, laboratory results, and medical claims. The main aim of this work is to classify EMRs by specialization using the KNN algorithm, optimize storage using deduplication, and protect the data using a DNA encryption algorithm before uploading it to Hadoop. Data redundancy is handled by deduplication based on MD5 hashing. The proposed methodology achieves 90% accuracy for EMR classification and addresses both duplication and security. This, in turn, demonstrates a state-of-the-art approach to healthcare data management.
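The abstract names three concrete steps: TF-IDF term weighting, KNN classification by specialization, and file-level deduplication by MD5 hashing. The sketch below illustrates these steps in plain Python on a single machine, under stated assumptions. It is not the authors' implementation. It omits the topic-modeling step, the DNA encryption step, and the Hadoop/MapReduce distribution. The function names, the smoothed IDF formula, cosine similarity as the KNN similarity measure, k = 3, and the toy records and specialty labels are all hypothetical choices made for illustration.

```python
import hashlib
import math
import re
from collections import Counter

# Hypothetical single-machine sketch. The paper runs on Hadoop/MapReduce and
# also applies DNA encryption before upload; neither step is shown here.


def md5_fingerprint(data: bytes) -> str:
    """Hex MD5 digest of a file's bytes, used as its deduplication key."""
    return hashlib.md5(data).hexdigest()


def deduplicate(files: dict[str, bytes]) -> dict[str, str]:
    """File-level dedup: keep only the first file seen for each MD5 digest.

    Returns {digest: name of the stored copy}.
    """
    store: dict[str, str] = {}
    for name, data in files.items():
        digest = md5_fingerprint(data)
        if digest not in store:  # new content is stored; later duplicates are skipped
            store[digest] = name
    return store


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (assumed; real EMR text needs richer parsing)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1 (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised TF-IDF vector; terms outside the training vocabulary are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_classify(text: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label a record by majority vote among its k most similar training records."""
    idf = fit_idf([doc for doc, _ in train])
    query = tfidf(text, idf)
    scored = sorted(
        ((cosine(query, tfidf(doc, idf)), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training records: (record text, specialization label).
TRAIN = [
    ("chest pain elevated troponin ecg abnormal", "cardiology"),
    ("atrial fibrillation ecg heart rate irregular", "cardiology"),
    ("seizure eeg abnormal migraine headache", "neurology"),
    ("stroke mri brain lesion headache", "neurology"),
]

print(knn_classify("irregular heart rate on ecg", TRAIN))  # cardiology
print(len(deduplicate({"a.txt": b"EMR 1", "b.txt": b"EMR 1", "c.txt": b"EMR 2"})))  # 2
```

MD5 appears here only because the abstract names it. It is fast but no longer collision resistant, so `hashlib.sha256` would be a drop-in replacement where deliberately crafted collisions are a concern.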

Item Type: Article
Uncontrolled Keywords: Classification, clustering, deduplication, DNA encryption, electronic medical records, Hadoop, map reduce
Subjects: Engineering > MIT Manipal > Information and Communication Technology
Depositing User: MIT Library
Date Deposited: 23 Jun 2022 05:50
Last Modified: 23 Jun 2022 05:50
URI: http://eprints.manipal.edu/id/eprint/158860