A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples

EasyChair Preprint no. 8409

10 pages. Date: July 8, 2022

Abstract

Classical machine learning algorithms and transfer learning models work well for text classification, but they require a large volume of annotated data. Because labeled data is scarce, we need a model that can cope with few labels, and GAN-BERT is a candidate solution. To classify Bengali text, we developed a GAN-BERT based model, which is an adaptation of BERT. We used two datasets: a hate speech dataset and a fake news dataset. To understand how GAN-BERT and traditional BERT models behave on Bangla datasets, we experimented with both. With a small quantity of labeled data, GAN-BERT gave satisfactory results. We also show how accuracy increases as the number of training samples increases, and we compare the traditional BERT-based Bangla-BERT with our GAN-Bangla-BERT model to show how each responds to a small amount of labeled data.
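
As an illustration of the semi-supervised GAN idea that GAN-BERT builds on, the sketch below computes a discriminator loss over k real classes plus one extra "fake" class. It follows the general SS-GAN formulation and is not taken from the preprint: the function names, the convention that the last logit is the fake class, and the epsilon value are all assumptions made for this example. In a GAN-BERT setup the logits would come from a classifier head on top of BERT sentence representations, which is omitted here.

```python
import math

EPS = 1e-12  # assumed small constant to avoid log(0)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_loss(logits, label=None, is_generated=False):
    """Loss for one example under a (k+1)-class discriminator.

    logits: k+1 scores; by assumption the LAST index is the 'fake' class.
    label: index of the true class (0..k-1) for labeled real examples,
           or None for unlabeled examples.
    is_generated: True if the example came from the generator.
    """
    p_fake = softmax(logits)[-1]
    if is_generated:
        # Generated examples should be assigned to the fake class.
        return -math.log(p_fake + EPS)
    # Real examples (labeled or unlabeled) should NOT look fake.
    loss = -math.log(1.0 - p_fake + EPS)
    if label is not None:
        # Supervised term: cross-entropy over the k real classes only.
        real_probs = softmax(logits[:-1])
        loss += -math.log(real_probs[label] + EPS)
    return loss

if __name__ == "__main__":
    scores = [2.0, 0.0, -2.0]  # k = 2 real classes + 1 fake class
    print("labeled real:  ", discriminator_loss(scores, label=0))
    print("unlabeled real:", discriminator_loss(scores))
    print("generated:     ", discriminator_loss(scores, is_generated=True))
```

Unlabeled examples contribute only the real-versus-fake term, which is how this kind of model can still learn from data without class labels.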

Keyphrases: Bangla NLP, Bengali Text Classification, Fake News Detection, GAN, GAN-BERT, hate speech detection, NLP, SS-GAN

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:8409,
  author = {Raihan Tanvir and Md Tanvir Rouf Shawon and Md Humaion Kabir Mehedi and Md. Motahar Mahtab and Annajiat Alim Rasel},
  title = {A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples},
  howpublished = {EasyChair Preprint no. 8409},
  year = {EasyChair, 2022}}