H. Sheikh and L. Bölöni. Universal policies to learn them all. arXiv preprint arXiv:1908.09184, 2019.
We explore a collaborative and cooperative multi-agent reinforcement learning setting where a team of reinforcement learning agents attempts to solve a single cooperative task in a multi-scenario setting. We propose a novel multi-agent reinforcement learning algorithm inspired by universal value function approximators that generalizes not only over the state space but also over a set of different scenarios. Additionally, to support our claim, we introduce a challenging 2D multi-agent urban security environment where a team of mobile robots learns to form an optimal formation around a person to protect them from nearby bystanders in a variety of scenarios. Our study shows that state-of-the-art multi-agent reinforcement learning algorithms fail to generalize a single task over multiple scenarios, while our proposed solution performs as well as scenario-dependent policies.
@article{Sheikh-2020-ICRA,
  title = "Universal policies to learn them all",
  author = "H. Sheikh and L. B{\"o}l{\"o}ni",
  journal = {arXiv preprint arXiv:1908.09184},
  xxxbooktitle = {submitted to International Conference on Robotics and Automation (ICRA-2020)},
  xxxlocation = "Paris",
  xxxmonth = "May",
  xxxyear = "2020",
  year = "2019",
  abstract = {
    We explore a collaborative and cooperative multi-agent reinforcement learning setting where a team of reinforcement learning agents attempt to solve a single cooperative task in a multi-scenario setting. We propose a novel multi-agent reinforcement learning algorithm inspired by universal value function approximators that not only generalizes over state space but also over a set of different scenarios. Additionally, to prove our claim, we are introducing a challenging 2D multi-agent urban security environment where a team of mobile robots are learning to form an optimal formation around a person to protect them from nearby bystanders in a variety of scenarios. Our study shows that state-of-the-art multi-agent reinforcement learning algorithms fail to generalize a single task over multiple scenarios while our proposed solution works equally well as scenario-dependent policies.
  },
}