Adversarial Attacks on BERT-Based Fake News Detection Models

EasyChair Preprint 14949
15 pages • Date: September 20, 2024

Abstract

The rise of fake news poses significant challenges to the integrity of information dissemination, necessitating robust detection mechanisms. BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art model in natural language processing, has shown promising results in identifying fake news. However, its susceptibility to adversarial attacks — deliberate perturbations designed to mislead models — raises concerns about its reliability. This paper explores the vulnerabilities of BERT-based fake news detection models to various adversarial attacks, including textual perturbations and gradient-based methods. We examine the impact of these attacks on model performance, highlighting a significant reduction in detection accuracy. Furthermore, we discuss potential defenses, such as adversarial training, input transformation techniques, and the development of more robust model variants. By addressing these adversarial challenges, we aim to enhance the resilience of fake news detection systems, ensuring more reliable and trustworthy automated news verification. This study underscores the necessity for ongoing research and innovation to fortify NLP models against adversarial threats in real-world applications.

Keyphrases: Fake News Detection, Gradient-based attacks, Paraphrasing Attacks, Textual Perturbations, adversarial attacks
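To make the kind of textual-perturbation attack mentioned in the abstract concrete, the sketch below greedily swaps individual words in an input article with hand-picked substitutes and keeps any swap that lowers a BERT-based classifier's predicted probability of the "fake" label. This is a minimal illustration, not the authors' method: the model name, the two-label convention, and the substitution table are assumptions introduced here, and a real detector would be a BERT model fine-tuned on a fake-news corpus.

```python
# Minimal sketch (assumptions noted above) of a word-level textual perturbation
# attack against a BERT-based fake-news classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder; a real detector would be fine-tuned on fake-news data
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_fake_prob(text: str) -> float:
    """Return the model's probability of label 1 (assumed here to mean 'fake')."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def greedy_word_perturbation(text: str, candidates: dict[str, list[str]]) -> str:
    """Greedily replace words with candidate substitutes, keeping only the
    replacements that reduce the detector's 'fake' probability."""
    words = text.split()
    best_prob = predict_fake_prob(text)
    for i, word in enumerate(words):
        for sub in candidates.get(word.lower(), []):
            trial = words.copy()
            trial[i] = sub
            prob = predict_fake_prob(" ".join(trial))
            if prob < best_prob:  # keep the perturbation if it lowers detection confidence
                best_prob, words = prob, trial
    return " ".join(words)

# Illustrative usage: this toy substitution table stands in for the synonym or
# paraphrase resources an attacker might draw on.
article = "Scientists confirm the miracle cure eliminates the disease overnight."
subs = {"miracle": ["novel"], "confirm": ["suggest"], "overnight": ["rapidly"]}
print(greedy_word_perturbation(article, subs))
```

A query-based greedy search like this needs only the model's output probabilities, which is one reason word-level perturbation attacks are practical even without gradient access; gradient-based methods, by contrast, perturb the input embeddings directly and require white-box access to the model.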