H. Sheikh and L. Bölöni

Universal policies to learn them all


Cite as:

H. Sheikh and L. Bölöni. Universal policies to learn them all. arXiv preprint arXiv:1908.09184, 2019.

Abstract:

We explore a collaborative and cooperative multi-agent reinforcement learning setting in which a team of reinforcement learning agents attempts to solve a single cooperative task across multiple scenarios. We propose a novel multi-agent reinforcement learning algorithm, inspired by universal value function approximators, that generalizes not only over the state space but also over a set of different scenarios. To support this claim, we introduce a challenging 2D multi-agent urban security environment in which a team of mobile robots learns to form an optimal formation around a person to protect them from nearby bystanders in a variety of scenarios. Our study shows that state-of-the-art multi-agent reinforcement learning algorithms fail to generalize a single task over multiple scenarios, while our proposed solution performs as well as scenario-dependent policies.

BibTeX:

@article{Sheikh-2020-ICRA,
title = "Universal policies to learn them all",
author = "H. Sheikh and L. B{\"o}l{\"o}ni",
journal = {arXiv preprint arXiv:1908.09184},
xxxbooktitle = {submitted to International Conference on Robotics and Automation (ICRA-2020)},
xxxlocation = "Paris",
xxxmonth = "May",
xxxyear = "2020",
year = "2019",
abstract = {
  We explore a collaborative and cooperative multi-agent reinforcement learning setting in which a team of reinforcement learning agents attempts to solve a single cooperative task across multiple scenarios. We propose a novel multi-agent reinforcement learning algorithm, inspired by universal value function approximators, that generalizes not only over the state space but also over a set of different scenarios. To support this claim, we introduce a challenging 2D multi-agent urban security environment in which a team of mobile robots learns to form an optimal formation around a person to protect them from nearby bystanders in a variety of scenarios. Our study shows that state-of-the-art multi-agent reinforcement learning algorithms fail to generalize a single task over multiple scenarios, while our proposed solution performs as well as scenario-dependent policies.
 },
}
