Finite-time convergence analysis, overestimation bias, Q-learning, reinforcement learning, zero-sum game.