Sentiment Analysis using Unlabeled Email data

EasyChair Preprint 2080
6 pages • Date: December 1, 2019

Abstract

Sentiment Analysis (SA) in the context of text mining is an automated process for detecting subjective information such as opinions, attitudes, emotions and feelings. Most prior work in SA views it as a text classification problem, which requires labeled data to train the model. However, labeled datasets are hard to obtain, and most of the time the labeling must be done by hand. Another issue is that the lack of portability across domains makes it hard to reuse the same labeled data in different applications, so labeled data has to be created manually for each domain. In this paper, we apply sentiment analysis to the Enron email dataset. This work aims to find the best techniques for labeling the dataset automatically and avoiding manual labeling. The resulting training data is used to build a classifier with a supervised machine learning algorithm. In the labeling phase, we compare lexicon labeling with k-means labeling. Lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset to train the classifier. We used TF-IDF for feature extraction to train Naïve Bayes and Support Vector Machine (SVM) classifiers.

Keyphrases: Chi-square, Email data, K-means, Semantic Orientation, Sentiment Analysis, Support Vector Machine, TFIDF, Target Label, email dataset, enron email dataset, feature extraction, frequency inverse document frequency, k-mean labeling, labeled dataset, lexicon labeling, negative email, sentiment classification, stop word
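
As a rough illustration of the two steps the abstract names, lexicon labeling followed by TF-IDF feature extraction, the following minimal Python sketch may help. It is not the paper's implementation: the word lists, the sample emails, and the helper names `tokenize`, `lexicon_label` and `tfidf` are all hypothetical, the weighting shown is one common TF-IDF variant (relative term frequency times log(N / df)), and the Naïve Bayes and SVM training step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real run would use a published sentiment lexicon.
POSITIVE = {"good", "great", "thanks", "pleased", "happy"}
NEGATIVE = {"bad", "problem", "sorry", "angry", "late"}


def tokenize(text):
    """Lowercase the text and split it on any non-letter character."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def lexicon_label(text):
    """Label a text by the difference between positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up example emails (assumption), not taken from the Enron dataset.
emails = [
    "Thanks, great work on the report.",
    "Sorry, there is a problem with the late shipment.",
    "Meeting moved to Tuesday.",
]
labels = [lexicon_label(e) for e in emails]
vectors = tfidf(emails)
```

In this sketch the automatically assigned `labels` would play the role of the target labels, and `vectors` the role of the features, when training a supervised classifier.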