Detection of phishing websites using data mining tools and techniques

Somani, Mansi and Balachandra, Mamatha (2022) Detection of phishing websites using data mining tools and techniques. International Journal of Advanced Intelligence Paradigms, 22 (1-2). pp. 167-183. ISSN 1755-0386


Phishing, a prevalent cybersecurity problem, is one of the most common attacks used to obtain users' sensitive information. To eradicate it, users or software must first detect it. A popular way to carry out phishing is to generate phishing URLs. A URL is either legitimate or phishy, which makes phishing detection a natural classification problem in data mining. Hence, five data mining algorithms (C4.5 (J48), SVM, Random Forest, Treebag and GBM) have been trained and compared on accuracy, recall and precision to determine the most suitable model. Rules have been listed that categorize the features that make a website phishy or legitimate. The work was done in the R language using RStudio. The dataset comprises 11,055 tuples and 31 attributes; it is used for training, testing and detection. Among the five classifiers, the Random Forest model gives the best accuracy, 97.21%.
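
The three comparison measures named in the abstract can be illustrated with a minimal sketch. The paper's work was done in R; the Python below is only an illustration of how accuracy, precision and recall follow from a confusion matrix when "phishing" is treated as the positive class. The names `ConfusionMatrix` and `confusion_from_labels`, and the toy labels in the usage note, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # phishing URLs predicted as phishing
    fp: int  # legitimate URLs predicted as phishing
    fn: int  # phishing URLs predicted as legitimate
    tn: int  # legitimate URLs predicted as legitimate

    @property
    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        # Share of URLs flagged as phishing that really are phishing.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of real phishing URLs that the classifier caught.
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0


def confusion_from_labels(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "phishing",
) -> ConfusionMatrix:
    """Tally a confusion matrix from true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)
```

As a usage example with made-up labels, `confusion_from_labels(["phishing", "legitimate"], ["phishing", "phishing"])` yields one true positive and one false positive. Computing the same three measures for each trained classifier on a held-out test set is one way the kind of comparison described in the abstract could be carried out.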

Item Type: Article
Uncontrolled Keywords: phishing; security; data mining; URL; features; algorithm; classifiers; accuracy; precision; recall; confusion matrix.
Subjects: Engineering > MIT Manipal > Computer Science and Engineering
Depositing User: MIT Library
Date Deposited: 08 Aug 2022 08:57
Last Modified: 08 Aug 2022 08:57
