Development of Real Time Analytics of Movies Review Data using PySpark

Aithal, Prakash K and Acharya, Dinesh U and Geetha, M (2019) Development of Real Time Analytics of Movies Review Data using PySpark. International Journal of Recent Technology and Engineering, 7 (6S). pp. 497-500. ISSN 2277-3878



Data play a vital role in every organization and can be divided into structured, semi-structured, and unstructured data. Unstructured data cannot be processed in real time using an RDBMS or Hadoop. Spark extends the Hadoop architecture and combines the strengths of both Hadoop and Storm. It supports languages such as Scala, Java, Python, and R. The proposed method uses PySpark to analyze, in real time, a movie review dataset of 50,000 reviews written by 36,409 people for 1,539 movies. Because many users write movie reviews continuously, the data must be analyzed in real time. The method identifies the users who are most active in writing movie reviews; this information may be used to offer incentives to active reviewers. The analytics also reveal which movies are most popular, as measured by the number of reviews. These tasks are carried out using basic map, reduce, and filter operations. The analysis shows that the movie with code B002VL2PTU was reviewed by the largest number of people, and that the largest number of reviews written by a single user, 112, came from the user with code A3LZGLA88K0LA0. The frequency of words in the movie reviews is also counted, so that user sentiment can be analyzed using unigrams.
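The map, reduce, and filter steps described in the abstract might look roughly like the following. This is a minimal sketch in plain Python, not the authors' PySpark code: the sample records, the MIN_REVIEWS threshold, the tokenizer, and the helper names count_by_key and tokenize are all assumptions made for illustration. In PySpark, the counting helper would correspond roughly to rdd.map(...) followed by reduceByKey(...), and the tokenization step to flatMap(...).

```python
from functools import reduce
import re

# Hypothetical sample records: (user_id, movie_id, review_text).
# These IDs and texts are invented; they are not taken from the paper's dataset.
REVIEWS = [
    ("U1", "M1", "Great movie, great cast"),
    ("U2", "M1", "Boring plot"),
    ("U1", "M2", "Not bad"),
    ("U3", "M1", "Great fun"),
    ("U1", "M3", "Bad ending"),
]


def count_by_key(pairs):
    """Sum the values for each key, similar in spirit to reduceByKey in Spark."""
    def merge(acc, pair):
        key, value = pair
        acc[key] = acc.get(key, 0) + value
        return acc
    return reduce(merge, pairs, {})


def tokenize(text):
    """Split a review into lowercase unigrams (a simple assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


# map: emit (key, 1) for every review; reduce: add up the ones per key.
reviews_per_user = count_by_key(map(lambda r: (r[0], 1), REVIEWS))
reviews_per_movie = count_by_key(map(lambda r: (r[1], 1), REVIEWS))

# The most active reviewer and the most reviewed movie.
top_user = max(reviews_per_user.items(), key=lambda kv: kv[1])
top_movie = max(reviews_per_movie.items(), key=lambda kv: kv[1])

# filter: keep only users with at least MIN_REVIEWS reviews (assumed threshold).
MIN_REVIEWS = 2
active_users = dict(
    filter(lambda kv: kv[1] >= MIN_REVIEWS, reviews_per_user.items())
)

# Unigram frequency: flatten reviews into words, then count as before.
unigrams = (word for _, _, text in REVIEWS for word in tokenize(text))
word_counts = count_by_key(map(lambda w: (w, 1), unigrams))

print("most active user:", top_user)
print("most reviewed movie:", top_movie)
print("active users:", active_users)
print("count of 'great':", word_counts["great"])
```

The same key-counting pattern serves all three questions (active reviewers, popular movies, word frequencies); only the key emitted in the map step changes.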

Item Type: Article
Uncontrolled Keywords: Real-time Analytics; Big Data; PySpark
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Depositing User: MIT Library
Date Deposited: 10 Jul 2019 09:52
Last Modified: 10 Jul 2019 09:52
