REINFORCEMENT LEARNING WITH UNSUPERVISED AUXILIARY TASKS

Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver & Koray Kavukcuoglu

This paper brings together the state-of-the-art Asynchronous Advantage Actor-Critic (A3C) framework (Mnih et al., 2016), outlined in Section 2, with auxiliary control tasks and auxiliary reward tasks, defined in Sections 3.1 and 3.2 respectively:

  • A3C
  • Auxiliary control tasks
  • Auxiliary reward tasks
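
Putting these together, the overall objective is the base A3C loss plus weighted auxiliary losses. Below is a minimal sketch of that combination, assuming hypothetical names and weight values (the λ coefficients are tuned hyperparameters in the paper, not the defaults shown here):

```python
# Minimal sketch of the combined UNREAL objective, assuming the per-task
# losses have already been computed elsewhere. Names and default weights
# are illustrative, not taken from the paper's implementation.
import torch

def unreal_loss(a3c_loss: torch.Tensor,
                pixel_control_loss: torch.Tensor,
                reward_prediction_loss: torch.Tensor,
                value_replay_loss: torch.Tensor,
                lambda_pc: float = 0.01,   # assumed weight
                lambda_rp: float = 1.0,    # assumed weight
                lambda_vr: float = 1.0) -> torch.Tensor:
    """Weighted sum of the base A3C loss and the three auxiliary losses."""
    return (a3c_loss
            + lambda_pc * pixel_control_loss
            + lambda_rp * reward_prediction_loss
            + lambda_vr * value_replay_loss)
```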

(a) The base agent is a CNN-LSTM agent trained on-policy with the A3C loss (Mnih et al., 2016). Observations, rewards, and actions are stored in a small replay buffer which encapsulates a short history of agent experience. This experience is used by the auxiliary learning tasks.

(b) Pixel Control – auxiliary policies Qaux are trained to maximise the change in pixel intensity of different regions of the input. The agent CNN and LSTM are used for this task along with an auxiliary deconvolution network. This auxiliary control task requires the agent to learn how to control the environment.

(c) Reward Prediction – given three recent frames, the network must predict the reward that will be obtained in the next unobserved timestep. This task network uses instances of the agent CNN, and is trained on reward-biased sequences to remove the perceptual sparsity of rewards.

(d) Value Function Replay – further training of the value function using the agent network is performed to promote faster value iteration.

Further visualisation of the agent can be found at https://youtu.be/Uz-zGYrYEjA
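
To make the data-side mechanics of (b) and (c) concrete, the sketch below computes a pixel-control pseudo-reward as the average absolute change in intensity within each cell of a grid laid over the observation, and draws reward-biased three-frame sequences from a replay buffer so that rewarding and non-rewarding samples are seen equally often. All names, shapes, cell sizes, and probabilities here are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (assumed shapes and names, not the paper's code).
import numpy as np
import random

def pixel_control_rewards(frame_t, frame_tp1, cell_size=4):
    """Per-cell pseudo-reward: mean absolute change in pixel intensity
    between consecutive frames, averaged over cell_size x cell_size cells."""
    diff = np.abs(frame_tp1.astype(np.float32) - frame_t.astype(np.float32))
    if diff.ndim == 3:                     # average over colour channels
        diff = diff.mean(axis=-1)
    h, w = diff.shape
    diff = diff[:h - h % cell_size, :w - w % cell_size]   # crop to full cells
    grid = diff.reshape(h // cell_size, cell_size, w // cell_size, cell_size)
    return grid.mean(axis=(1, 3))          # one pseudo-reward per cell

def sample_reward_biased_sequence(replay, seq_len=3, p_rewarding=0.5):
    """Sample seq_len consecutive frames whose *next* reward is non-zero with
    probability p_rewarding, countering the sparsity of real rewards.
    `replay` is assumed to be a list of (frame, reward) tuples."""
    rewarding = [i for i in range(seq_len, len(replay)) if replay[i][1] != 0]
    zero = [i for i in range(seq_len, len(replay)) if replay[i][1] == 0]
    if rewarding and random.random() < p_rewarding:
        pool = rewarding
    else:
        pool = zero or rewarding
    end = random.choice(pool)
    frames = [replay[j][0] for j in range(end - seq_len, end)]
    return frames, replay[end][1]          # observed frames, reward to predict
```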