Evaluating the Effectiveness of Machine Learning Methods for Spam Detection

EasyChair Preprint no. 6074

7 pages · Date: July 13, 2021

Abstract

Technological advances are accelerating the dissemination of information. Today, millions of devices and their users are connected to the Internet, allowing businesses to interact with consumers regardless of geography. People all over the world send and receive emails every day. Email is an effective, simple, fast, and cheap way to communicate. Messages fall into two classes: spam and ham (legitimate mail). More than half of the messages a user receives are spam. Using email efficiently, without the threat of losing personal information, requires a spam filtering system. The aim of this work is to reduce the amount of spam by using a classifier to detect it. The most accurate spam classification can be achieved using machine learning methods. A natural language processing approach was chosen to analyze the text of an email in order to detect spam. The following machine learning algorithms were selected for comparison: Naive Bayes (NB), K-Nearest Neighbors, SVM, logistic regression, decision tree, and random forest. The models were trained on a ready-made dataset. Logistic regression and NB give the highest accuracy, up to 99%. The results can be used to create a more intelligent spam detection classifier by combining algorithms or filtering methods.
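
As a rough illustration of the text-based classification the abstract describes, the following is a minimal pure-Python sketch of one of the compared algorithms, multinomial Naive Bayes with add-one (Laplace) smoothing. The function names (`tokenize`, `train_naive_bayes`, `predict`) and the four toy messages are hypothetical, chosen only for illustration; they are not taken from the preprint or its dataset, and the authors' actual preprocessing and implementation may differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents and words per label from (text, label) pairs."""
    doc_counts = Counter()   # label -> number of training messages
    word_counts = {}         # label -> Counter of token frequencies
    vocab = set()            # all tokens seen during training
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: share of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, NOT the dataset used in the preprint.
TOY_MESSAGES = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train_naive_bayes(TOY_MESSAGES)
```

With this toy model, `predict(model, "claim your free prize")` returns `"spam"` and `predict(model, "team meeting on monday")` returns `"ham"`. A realistic comparison like the one in the abstract would use a much larger labeled corpus and a held-out test set to measure accuracy.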

Keyphrases: Decision Tree, k-nearest neighbors, logistic regression, Naive Bayes, Random Forest, Spam, Spam Filtering Method, SVM

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:6074,
  author = {Yuliya Kontsewaya and Evgeniy Antonov and Alexey Artamonov},
  title = {Evaluating the Effectiveness of Machine Learning Methods for Spam Detection},
  howpublished = {EasyChair Preprint no. 6074},
  year = {EasyChair, 2021}}