Data Deduplication Techniques and Analysis

Maddodi, Srivatsa and Attigeri, Girija V and Kotegar, Karunakar A (2010) Data Deduplication Techniques and Analysis. In: Third International Conference on Emerging Trends in Engineering and Technology, 19-21 Nov. 2010, Goa.

Data warehouses are repositories of data collected from several data sources, and they form the backbone of most decision support applications. Because the data sources are independent, they may adopt independent and potentially inconsistent conventions. In data warehousing applications during ETL (Extraction, Transformation and Loading), or even in OLTP (On-Line Transaction Processing) applications, we often encounter duplicate records in a table. Moreover, data entry mistakes at any of these sources introduce more errors. Since high-quality data is essential for gaining the confidence of users of decision support applications, ensuring high data quality is critical to the success of data warehouse implementations. Therefore, a significant amount of time and money is spent on detecting and correcting errors and inconsistencies. The process of cleaning dirty data is often referred to as data cleaning. To make the table data consistent and accurate, we need to get rid of these duplicate records from the table. In this paper we discuss different strategies of deduplication along with their pros and cons, and some of the methods used to prevent duplication in a database. In addition, we present a performance evaluation with Microsoft SQL Server 2008 on the Food Mart and Adventure DB Warehouse.
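
The abstract does not spell out the individual strategies, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes an exact-match definition of "duplicate" (all key columns identical), uses Python's built-in sqlite3 module in place of SQL Server, and works on a made-up `customer` table; the function name `deduplicate` and the sample rows are hypothetical. It first removes existing duplicates, then adds a unique index so that later inserts cannot reintroduce them.

```python
import sqlite3


def deduplicate(conn, table, key_columns):
    """Delete rows repeating the same key_columns values, keeping the earliest row.

    Hypothetical helper for illustration; table and column names are trusted input.
    """
    cols = ", ".join(key_columns)
    # Keep the row with the smallest rowid in each group of identical key values.
    sql = (
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {cols})"
    )
    cur = conn.execute(sql)
    conn.commit()
    return cur.rowcount  # number of rows deleted


# Made-up sample data: the first row is entered three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Asha", "Goa"), ("Ravi", "Manipal"), ("Asha", "Goa"), ("Asha", "Goa")],
)

removed = deduplicate(conn, "customer", ["name", "city"])
remaining = conn.execute(
    "SELECT name, city FROM customer ORDER BY rowid"
).fetchall()

# Prevention: once the table is clean, a unique index rejects new duplicates.
conn.execute("CREATE UNIQUE INDEX customer_uniq ON customer (name, city)")
conn.execute("INSERT OR IGNORE INTO customer VALUES ('Ravi', 'Manipal')")
count_after_insert = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Exact matching like this misses near-duplicates caused by inconsistent conventions or data entry mistakes, which the abstract also mentions; handling those would require some form of approximate matching that this sketch does not attempt.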

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: ETL, OLTP, Data Cleaning, Deduplication.
Subjects: Engineering > MIT Manipal > Information and Communication Technology
Engineering > MIT Manipal > MCA
Depositing User: MIT Library
Date Deposited: 04 Jul 2011 09:03
Last Modified: 07 Jul 2011 10:44
