Sentiment Analysis of English-Punjabi Code Mixed Social Media Content for Agriculture Domain

EasyChair Preprint 2813
6 pages • Date: February 29, 2020

Abstract

In India, more than 70% of the population depends on agriculture. Since India's independence, the people involved in agriculture have mostly lived in rural areas. The government has made numerous efforts to improve the conditions of farmers, yet those conditions have not improved at an acceptable rate. It is now easy to extract farmers' reviews from micro-blogging websites. For decades, multilingual speakers have often switched between more than one language to express themselves on social media networks. The mixed languages follow different rules of grammar, which is itself a challenge for analysis. In this paper, the authors extract agriculture-related comments that exhibit code-mixing in English-Punjabi content. They then perform language identification and normalization, and create an English-Punjabi code-mixed dictionary. After that, they test various models trained on English-Punjabi code-mixed data using Support Vector Machine and Naive Bayes techniques for sentiment analysis, first running the pipeline with a unigram predictive model. The pipeline was later evaluated with n-gram features, and performance was found to be better in the implemented model.

Keyphrases: Agriculture, Language Identification, Naive Bayes, Sentiment Analysis, Support Vector Machines, social media
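
The unigram and n-gram setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace tokenization, a multinomial Naive Bayes classifier with add-one smoothing, and a handful of invented romanized code-mixed sentences that do not come from the paper's dataset. The names `ngrams`, `NaiveBayes`, and `TRAIN` are hypothetical, and the Support Vector Machine variant is not shown.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, n_max=1):
        self.n_max = n_max  # 1 = unigrams only, 2 = unigrams + bigrams, ...

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text, self.n_max)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for feat in ngrams(text, self.n_max):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples (assumption): romanized English-Punjabi mixed text,
# made up for illustration only.
TRAIN = [
    ("fasal bahut vadiya good yield", "positive"),
    ("is saal crop vadiya hai", "positive"),
    ("fasal kharab ho gayi heavy loss", "negative"),
    ("kisan nu loss bahut kharab season", "negative"),
]

unigram_model = NaiveBayes(n_max=1).fit(*zip(*TRAIN))
bigram_model = NaiveBayes(n_max=2).fit(*zip(*TRAIN))
```

Calling `unigram_model.predict(...)` or `bigram_model.predict(...)` on a new comment returns one of the training labels. On data this small the two models behave alike, so any difference between unigram and n-gram features would only show up on a realistic corpus.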