
Robustness of Large Language Models: Mitigating Adversarial Attacks and Input Perturbations

EasyChair Preprint no. 12274

9 pages
Date: February 24, 2024

Abstract

This paper explores the robustness of large language models (LLMs) and strategies for mitigating the impact of adversarial attacks and input perturbations. Adversarial attacks, in which small, carefully crafted perturbations are added to input data to induce misclassification or other undesired behavior, can exploit vulnerabilities in LLMs and compromise their performance. Input perturbations such as typographical errors or grammatical inconsistencies can likewise degrade the accuracy and reliability of LLMs in practical settings. To address these challenges, various approaches have been proposed, including adversarial training, robust optimization techniques, and input preprocessing methods.
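As an illustration of one of the mitigation strategies named in the abstract, the sketch below shows a minimal adversarial-training step for a toy text classifier, with FGSM-style perturbations applied in embedding space. This is not the paper's method; all names (ToyClassifier, adversarial_training_step, epsilon) and the model architecture are illustrative assumptions, and only standard PyTorch calls are used.

import torch
import torch.nn as nn

# Toy text classifier: embedding -> mean pool -> linear head.
# Purely illustrative; not the architecture studied in the paper.
class ToyClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids=None, embeds=None):
        # Accept either token ids or precomputed embeddings so that
        # perturbations can be injected directly in embedding space.
        if embeds is None:
            embeds = self.embed(token_ids)
        return self.head(embeds.mean(dim=1))

def adversarial_training_step(model, optimizer, token_ids, labels, epsilon=0.01):
    """One FGSM-style adversarial training step in embedding space (illustrative)."""
    loss_fn = nn.CrossEntropyLoss()

    # 1. Clean forward/backward pass to obtain gradients w.r.t. the embeddings.
    embeds = model.embed(token_ids).detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds=embeds), labels)
    clean_loss.backward()

    # 2. Build the adversarial perturbation from the gradient sign.
    adv_embeds = embeds + epsilon * embeds.grad.sign()

    # 3. Train on a mix of clean and adversarial examples.
    optimizer.zero_grad()
    loss = loss_fn(model(token_ids=token_ids), labels) \
         + loss_fn(model(embeds=adv_embeds.detach()), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    token_ids = torch.randint(0, 1000, (8, 16))  # batch of 8 "sentences", 16 tokens each
    labels = torch.randint(0, 2, (8,))
    print(adversarial_training_step(model, optimizer, token_ids, labels))

Training on both the clean and the perturbed embeddings is what makes this an adversarial-training loop rather than plain supervised training; the same structure would apply with a real LLM encoder in place of the toy model.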

Keyphrases: large language models

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:12274,
  author = {Kurez Oroy and Jane Anderson},
  title = {Robustness of Large Language Models: Mitigating Adversarial Attacks and Input Perturbations},
  howpublished = {EasyChair Preprint no. 12274},
  year = {EasyChair, 2024}}