Learning from Graphical Replay

Ge Yang†, Amy Zhang†, Ari S. Morcos†, Joelle Pineau†§, Pieter Abbeel‡, Roberto Calandra†

†Facebook AI Research, §McGill University, ‡UC Berkeley

Learning from Graphical Replay (LfGR) is a model-based framework for unsupervised reinforcement learning that uses graph search to learn long-horizon visuomotor control policies directly from a cognitive map.

Abstract

Designing agents that can rapidly adapt to changing situations while retaining knowledge for a wide variety of tasks remains an open challenge. Humans accomplish this feat with the help of mental models that are structured and contextual in nature, which enable model-driven exploration and fine-grained credit assignment for correcting prior beliefs. In this work, we present Learning from Graphical Replay (LfGR), a framework for learning reactive visuomotor policies directly from a structured graphical world model, where the latent embedding of each vertex provides context during adaptation. Under this framework, we introduce two new algorithms: a value-based variant called Universal Value Prediction Network (UVPN), and a policy-search variant called Goal-Relabeled Expert Distillation (GRED). We demonstrate how graph search can be used to generate expert supervision for learning reactive control, and present results from pixel input on difficult-to-model (e.g., contact-rich) and non-stationary environments.
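
As a rough illustration of how graph search might produce expert supervision, the sketch below runs Dijkstra's algorithm on a small hand-written graph and turns the shortest path into (vertex, goal, next vertex) training tuples. It is not the authors' implementation: the toy graph, the edge costs, and the helper names `shortest_path` and `expert_labels` are all hypothetical, and in LfGR the vertices would carry learned latent embeddings of image observations rather than string labels.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph, where graph[u][v] is an edge cost.

    Hypothetical helper for illustration; returns a list of vertices or None.
    """
    dist = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route to u was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(frontier, (nd, v))
    if goal not in dist:
        return None  # goal is unreachable from start
    # Walk parent pointers back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


def expert_labels(graph, start, goal):
    """Turn a planned path into (vertex, goal, next_vertex) supervision tuples."""
    path = shortest_path(graph, start, goal)
    if path is None:
        return []
    return [(u, goal, v) for u, v in zip(path, path[1:])]


# Toy graph (assumed for illustration): vertices stand in for graph nodes
# of a world model, edge weights for estimated transition costs.
toy_graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"c": 1.0, "d": 5.0},
    "c": {"d": 1.0},
    "d": {},
}

labels = expert_labels(toy_graph, "a", "d")
for vertex, goal, next_vertex in labels:
    print(f"at {vertex}, goal {goal}: move toward {next_vertex}")
```

Each tuple could then serve as a supervised target for a goal-conditioned reactive policy, so that the learned policy imitates the planner without running the search at test time.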

BibTeX

@preprint{yang2020LfGR,
    title={Learning from Graphical Replay},
    author={Yang, Ge and Zhang, Amy and Morcos, Ari S. and Pineau, Joelle
            and Abbeel, Pieter and Calandra, Roberto}
}