A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples

EasyChair Preprint 8409

10 pages • Date: July 8, 2022

Raihan Tanvir, Md Tanvir Rouf Shawon, Md Humaion Kabir Mehedi, Md. Motahar Mahtab and Annajiat Alim Rasel

Abstract

Basic machine learning algorithms and transfer learning models work well for text classification, but they require a large volume of annotated data. Because labeled data is scarce, a model that can learn from only a few labeled examples is needed, and GAN-BERT may offer a solution. To classify Bengali text, we developed a GAN-BERT based model, which is an adapted version of BERT. We used two datasets for this purpose: a hate speech dataset and a fake news dataset. To understand how GAN-BERT and traditional BERT models behave on Bangla datasets, we experimented with both. With a small quantity of labeled data, we obtained satisfactory results using GAN-BERT. We also show how accuracy increases as the number of training samples increases. Finally, we compare the traditional BERT-based Bangla-BERT with our GAN-Bangla-BERT model to show how each responds to a small amount of labeled data.

Keyphrases: Bangla NLP, Bengali Text Classification, Fake News Detection, GAN, GAN-BERT, NLP, SS-GAN, hate speech detection

Links:

https://easychair.org/publications/preprint/hp4Q

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:8409,
  author    = {Raihan Tanvir and Md Tanvir Rouf Shawon and Md Humaion Kabir Mehedi and Md. Motahar Mahtab and Annajiat Alim Rasel},
  title     = {A GAN-BERT Based Approach for Bengali Text Classification with a Few Labeled Examples},
  howpublished = {EasyChair Preprint 8409},
  year      = {EasyChair, 2022}}
