Suppressing Overestimation in Q-Learning Through Adversarial Behaviors

EasyChair Preprint 15228

13 pages • Date: October 9, 2024

Abstract

This paper proposes a new Q-learning algorithm, called dummy adversarial Q-learning (DAQ), that introduces a dummy adversarial player to effectively regulate the overestimation bias of standard Q-learning. With the dummy player, learning can be formulated as a two-player zero-sum game. DAQ unifies several Q-learning variants for controlling overestimation bias, such as maxmin Q-learning and minmax Q-learning (the latter proposed in this paper), in a single framework. DAQ is a simple but effective way to suppress overestimation bias through dummy adversarial behaviors, and it can easily be applied to off-the-shelf value-based reinforcement learning algorithms to improve their performance. The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting adversarial Q-learning. The performance of DAQ is demonstrated empirically on various benchmark environments.
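The abstract describes the dummy adversarial player only at a high level. The following is a minimal tabular sketch of the game-based target, assuming the known maxmin Q-learning form (the agent maximizes over actions while a dummy adversary picks the most pessimistic of N estimates) and assuming that minmax Q-learning simply swaps the order of the two operators. The names `make_tables`, `maxmin_target`, `minmax_target`, and `daq_style_update` are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import Callable, List, Optional

# Hypothetical layout: qs[i][s][a] is estimator i's value for (state s, action a).
QTables = List[List[List[float]]]


def make_tables(n_estimators: int, n_states: int, n_actions: int) -> QTables:
    """Create N independent, zero-initialized tabular Q estimates."""
    return [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_estimators)]


def maxmin_target(qs: QTables, s_next: int, r: float, gamma: float, done: bool) -> float:
    """Maxmin target: r + gamma * max_a min_i Q_i(s', a).

    The agent maximizes over actions; the dummy adversary answers each
    action with the most pessimistic estimator.
    """
    if done:
        return r
    n_actions = len(qs[0][s_next])
    value = max(min(q[s_next][a] for q in qs) for a in range(n_actions))
    return r + gamma * value


def minmax_target(qs: QTables, s_next: int, r: float, gamma: float, done: bool) -> float:
    """Assumed minmax target: r + gamma * min_i max_a Q_i(s', a).

    Same two players, opposite order: the adversary commits to an
    estimator first, then the agent maximizes within it.
    """
    if done:
        return r
    value = min(max(q[s_next]) for q in qs)
    return r + gamma * value


Target = Callable[[QTables, int, float, float, bool], float]


def daq_style_update(
    qs: QTables, s: int, a: int, r: float, s_next: int, done: bool,
    target: Target = maxmin_target, alpha: float = 0.1, gamma: float = 0.9,
    rng: Optional[random.Random] = None,
) -> int:
    """One tabular step: compute the game-based target, then move one
    randomly chosen estimator toward it. Returns the updated index."""
    if rng is None:
        rng = random.Random()
    y = target(qs, s_next, r, gamma, done)
    i = rng.randrange(len(qs))
    qs[i][s][a] += alpha * (y - qs[i][s][a])
    return i


# Two estimators, one state, two actions; each estimator is optimistic
# about a different action.
qs = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(max(qs[0][0]))                          # single-estimator max: 1.0
print(maxmin_target(qs, 0, 0.0, 0.9, False))  # 0.0
print(minmax_target(qs, 0, 0.0, 0.9, False))  # 0.9
```

In this toy case the single-estimator max inherits each estimator's optimism, while the maxmin target discards it. Because max_a min_i Q_i(s', a) ≤ min_i max_a Q_i(s', a) always holds, the (assumed) minmax target is never smaller than the maxmin one, so the two orderings give different degrees of pessimism within the same two-player formulation. Updating one randomly chosen estimator per step is a common choice in maxmin Q-learning; the paper's exact update rule may differ.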

Keyphrases: overestimation bias, Q-learning, reinforcement learning, finite-time convergence analysis, zero-sum game

Links:

https://easychair.org/publications/preprint/dLQB

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15228,
  author    = {Hyeann Lee and Donghwan Lee},
  title     = {Suppressing Overestimation in Q-Learning Through Adversarial Behaviors},
  howpublished = {EasyChair Preprint 15228},
  year      = {EasyChair, 2024}}
