An Apache Spark based implementation of Single Pass Counting Algorithm for Frequent Itemset Mining

Bhat, Anup B and Harish, S V and Geetha, M (2018) An Apache Spark based implementation of Single Pass Counting Algorithm for Frequent Itemset Mining. In: NCAMSE-2018, 20/06/2018, Manipal.

Abstract

Frequent Itemset Mining has been a popular and general data mining task since its inception. It is an indispensable step in discovering associations between the items in a transactional database. As databases grow massively in size, sequential mining approaches suffer bottlenecks in both memory and execution time. In this study, we explore the applicability of a customised Apriori algorithm, the Single Pass Counting algorithm, in a parallel and distributed framework using Apache Spark and the Hadoop Distributed File System (HDFS).
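
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind level-wise, Apriori-style counting over partitioned data, not the authors' Spark code. It assumes that each itemset size k is handled by one counting pass over all partitions, followed by a summation of the per-partition counts. Plain Python lists stand in for the Spark partitions and HDFS blocks, and the names count_candidates, generate_candidates, and spc are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates, k):
    """Map phase (assumed): count candidate k-itemsets inside one partition."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for subset in combinations(items, k):
            # candidates is None at level 1, where every single item qualifies
            if candidates is None or subset in candidates:
                counts[subset] += 1
    return counts


def generate_candidates(frequent, k):
    """Apriori join and prune: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        candidate
        for candidate in combinations(items, k)
        if all(sub in frequent for sub in combinations(candidate, k - 1))
    }


def spc(partitions, min_support):
    """Level-wise mining with one counting pass over all partitions per level."""
    result = {}
    candidates = None
    k = 1
    while True:
        total = Counter()
        for partition in partitions:
            # Reduce phase (assumed): sum the per-partition counts
            total.update(count_candidates(partition, candidates, k))
        frequent = {s: n for s, n in total.items() if n >= min_support}
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
        if not candidates:
            break
    return result


# Hypothetical toy data: two partitions holding five transactions in total
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = spc(partitions, min_support=3)
```

In a Spark setting, the per-partition counting and the summation would typically map onto transformations over a distributed dataset read from HDFS; the sequential loop over partitions here is only a stand-in for that parallel execution.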

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Frequent Itemsets, Parallel and Distributed Computing, Apache Spark, Apriori algorithm, Single Pass Counting
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Depositing User: MIT Library
Date Deposited: 16 Jul 2018 06:52
Last Modified: 16 Jul 2018 06:52
URI: http://eprints.manipal.edu/id/eprint/151590
