Optimizing Phishing Detection with Advanced Feature Vectorization and Supervised Machine Learning Techniques

EasyChair Preprint 15168

14 pages • Date: September 29, 2024

Abstract

Phishing attacks are among the most common cybersecurity threats, exploiting users' trust to gain access to sensitive information. Detecting these attacks effectively is essential for protecting both individuals and organizations. This study focuses on improving phishing detection by developing an optimized framework for feature vectorization, combined with supervised machine learning techniques. By carefully selecting and designing features from email and website data, the goal is to improve the accuracy of identifying phishing attempts. The analysis covers text-based, URL-based, and metadata features, emphasizing their role in improving classification performance. Machine learning models such as Support Vector Machines (SVM), Random Forest, and Gradient Boosting are trained and tested on a dataset of legitimate and phishing samples. The study also examines the impact of feature scaling, feature selection, and dimensionality reduction methods such as Principal Component Analysis (PCA) to determine which factors most effectively boost detection accuracy. Experimental findings show that an optimized feature set, combined with strong machine learning algorithms, greatly improves phishing detection rates while reducing false positives. This approach highlights the potential for reliable, automated phishing detection systems, contributing to stronger cybersecurity defenses.
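
To make the idea of URL-based feature vectorization and feature scaling concrete, the following is a minimal sketch in standard-library Python. It is not the authors' feature set or implementation: the abstract does not list specific features, so the feature names, the `url_features` function, and the `min_max_scale` helper are all illustrative assumptions. Min-max scaling is shown as one common scaling choice, not necessarily the one used in the study.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical feature names, chosen for illustration only;
# the preprint's abstract does not enumerate its features.
FEATURE_NAMES = [
    "url_length", "num_dots", "num_hyphens",
    "has_at_symbol", "host_is_ipv4", "uses_https",
]

# Loose dotted-quad pattern; it does not validate octet ranges.
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url: str) -> List[float]:
    """Map a URL to a fixed-length numeric vector (order matches FEATURE_NAMES)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                  # overall URL length
        float(host.count(".")),           # dots in the host name
        float(host.count("-")),           # hyphens in the host name
        float("@" in url),                # '@' can hide the real host
        float(bool(_IPV4.match(host))),   # raw IPv4 address used as host
        float(parsed.scheme == "https"),  # scheme is HTTPS
    ]


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Vectors produced this way could then be passed to a supervised classifier such as an SVM, Random Forest, or Gradient Boosting model. Scaling matters mainly for distance- and margin-based models like SVM, while tree-based models are largely insensitive to it.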

Keyphrases: Feature Vectorization, Phishing Detection, Supervised Machine Learning

Links:

https://easychair.org/publications/preprint/R2gf

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15168,
  author    = {Jessica Comsie and Ayesha Noor},
  title     = {Optimizing Phishing Detection with Advanced Feature Vectorization and Supervised Machine Learning Techniques},
  howpublished = {EasyChair Preprint 15168},
  year      = {EasyChair, 2024}}
