H. U. Sheikh and L. Bölöni

Reducing Overestimation Bias by Increasing Representation Dissimilarity in Ensemble Based Deep Q-Learning


Cite as:

H. U. Sheikh and L. Bölöni. Reducing Overestimation Bias by Increasing Representation Dissimilarity in Ensemble Based Deep Q-Learning. arXiv preprint arXiv:2006.13823, 2020.

Download:

(unavailable)

Abstract:

The first deep RL algorithm, DQN, was limited by the overestimation bias of the learned Q-function. Subsequent algorithms proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms used the different estimates provided by ensembles of learners to reduce the bias. Unfortunately, in many scenarios the learners converge to the same point in the parametric or representation space, falling back to the classic single neural network DQN. In this paper, we describe a regularization technique to increase the dissimilarity in the representation space in these algorithms. We propose and compare five regularization functions inspired by economics theory and consensus optimization. We show that the resulting approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
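The paper's five regularization functions are not specified in this abstract; as a minimal illustrative sketch, one plausible dissimilarity penalty is the mean pairwise cosine similarity between the ensemble members' penultimate-layer features, subtracted from (or added with a negative coefficient to) the TD loss. The function name and the NumPy formulation below are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def dissimilarity_penalty(reps):
    """Mean pairwise cosine similarity across ensemble representations.

    reps: list of K arrays, each of shape (batch, d), holding the
    penultimate-layer features of the K ensemble members for the same
    batch of states. Adding this value (scaled by a coefficient) to the
    TD loss penalizes members whose representations collapse together.
    """
    K = len(reps)
    # Normalize each member's features row-wise to unit length.
    normed = [r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-8) for r in reps]
    total, pairs = 0.0, 0
    for i in range(K):
        for j in range(i + 1, K):
            # Per-sample cosine similarity, averaged over the batch.
            total += np.mean(np.sum(normed[i] * normed[j], axis=1))
            pairs += 1
    return total / pairs
```

With this convention the penalty is near 1 when all members learn identical representations and near 0 when their representations are orthogonal, so minimizing it drives the learners apart in representation space.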

BibTeX:

@article{Sheikh-2020-Dissimilarity,
  author = {H. U. Sheikh and L. B{\"o}l{\"o}ni},
  title = {Reducing Overestimation Bias by Increasing Representation Dissimilarity in Ensemble Based Deep Q-Learning},
  journal = {arXiv preprint arXiv:2006.13823},
  year = {2020},
  xxxbooktitle = {submitted to ICLR},
  abstract = {
      The first deep RL algorithm, DQN, was limited by the overestimation bias of the learned Q-function. Subsequent algorithms proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms used the different estimates provided by ensembles of learners to reduce the bias. Unfortunately, in many scenarios the learners converge to the same point in the parametric or representation space, falling back to the classic single neural network DQN. In this paper, we describe a regularization technique to increase the dissimilarity in the representation space in these algorithms. We propose and compare five regularization functions inspired by economics theory and consensus optimization. We show that the resulting approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
  },
}

Generated by bib2html.pl (written by Patrick Riley, Lotzi Boloni ) on Fri Jan 29, 2021 20:15:22