Topical News Classification Using Machine Learning Techniques
EasyChair Preprint 14512
8 pages • Date: August 21, 2024

Abstract
News is information presented through print, broadcast, the internet, or word of mouth. The problem addressed here is classifying news into appropriate categories so that users can find relevant news quickly. A classifier engine is used to assign each news item to a category automatically. This research uses Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM) to classify topical news. The aim is to develop a framework for categorizing news articles into various categories. The objectives are to preprocess the data using Term Frequency-Inverse Document Frequency (tf-idf) and Bag of Words (BoW) so that it is suitable input for the classifiers, to apply Machine Learning (ML) techniques to the preprocessed data, to evaluate the performance of the classifiers on that data, and to identify which classifier achieves the highest accuracy with each representation. The results show that SVM achieves the highest accuracy with tf-idf, while NB achieves the highest accuracy with BoW. Specifically, SVM outperforms NB, RF, and DT using tf-idf, with an accuracy of 83%, while NB outperforms SVM, RF, and DT using BoW, with an accuracy of 82%. Additionally, SVM is a better classifier for large datasets, while NB performs better with smaller datasets.

Keyphrases: Bag of Words, Machine Learning Classifiers, Naïve Bayes, Support Vector Machine, Term Frequency-Inverse Document Frequency
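
To make the two representations named in the abstract concrete, here is a minimal sketch of how BoW counts and tf-idf weights could be computed for a toy corpus. It is not the paper's implementation: the headlines, the whitespace tokenization, the function names, and the smoothed idf formula ln((1 + n) / (1 + df)) + 1 are all assumptions chosen for illustration, since the abstract does not state which variant was used.

```python
import math
from collections import Counter

# Toy corpus: these headlines are invented for illustration only.
docs = [
    "team wins final match",
    "market shares fall",
    "team signs new striker",
]

def bag_of_words(tokens, vocab):
    """Return raw term counts for one document, ordered by vocab."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def tf_idf(tokenized, vocab):
    """Return tf-idf rows using raw counts and a smoothed idf (an assumed variant)."""
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = {term: sum(1 for doc in tokenized if term in doc) for term in vocab}
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so no term gets zero weight.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[term] * idf[term] for term in vocab])
    return rows

# Simple whitespace tokenization (an assumption; real pipelines usually
# also lowercase, strip punctuation, and remove stop words).
tokenized = [doc.split() for doc in docs]
vocab = sorted({term for doc in tokenized for term in doc})

bow = [bag_of_words(doc, vocab) for doc in tokenized]
tfidf = tf_idf(tokenized, vocab)
```

In this sketch, BoW gives "team" and "wins" the same count in the first headline, whereas tf-idf weights "wins" higher because it appears in fewer documents. Either matrix could then be passed to a classifier such as NB or SVM.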