Learning from Graphical Replay
Ge Yang,† Amy Zhang,† Ari S. Morcos,† Joelle Pineau,†§ Pieter Abbeel,‡ Roberto Calandra†
†Facebook AI Research, §McGill University, ‡UC Berkeley
Learning from Graphical Replay (LfGR) is a model-based framework for unsupervised reinforcement learning that uses graph search to learn long-horizon visuomotor control policies directly from a cognitive map.
Abstract
Designing agents that can rapidly adapt to changing situations while retaining knowledge for a wide variety of tasks remains an open challenge. Humans accomplish this feat with the help of mental models that are structured and contextual in nature, which enable model-driven exploration and fine-grained credit assignment for correcting prior beliefs. In this work, we present Learning from Graphical Replay (LfGR), a framework for learning reactive visuomotor policies directly from a structured graphical world model, where the latent embedding of each vertex provides context during adaptation. Under this framework, we introduce two new algorithms: a value-based variant called Universal Value Prediction Network (UVPN), and a policy-search variant called Goal-Relabeled Expert Distillation (GRED). We demonstrate how graph search can be used to generate expert supervision for learning reactive control, and present results from pixel input on difficult-to-model (e.g., contact-rich) and non-stationary environments.
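The abstract does not say how graph search turns a cognitive map into expert supervision, so the following is only a minimal sketch under our own assumptions, not the authors' UVPN or GRED implementation. It assumes the map is a directed graph whose vertices are replayed observations and whose edges are labeled with the actions that connected them, and it uses plain breadth-first search as a stand-in for whatever planner the paper uses. Treating every vertex as a goal mimics goal relabeling, and the negative shortest-path distance serves as a hypothetical value target. The names `expert_labels`, `relabeled_dataset`, and `example_graph` are illustrative only.

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, List, Tuple

# Hypothetical cognitive map: vertex -> list of (action, next_vertex) edges.
# In LfGR a vertex would carry a latent embedding of an observation; here
# vertices are plain labels so the sketch stays self-contained.
Graph = Dict[Hashable, List[Tuple[Hashable, Hashable]]]


def expert_labels(graph: Graph, goal: Hashable) -> Dict[Hashable, Tuple[Hashable, int]]:
    """Map each vertex that can reach `goal` to (first action on a shortest
    path to the goal, length of that path).

    Breadth-first search runs backwards from the goal over reversed edges,
    so vertices are labeled in order of increasing distance.
    """
    # Reverse every edge: next_vertex -> [(action, vertex)].
    reverse: Dict[Hashable, List[Tuple[Hashable, Hashable]]] = {}
    for v, edges in graph.items():
        for action, nxt in edges:
            reverse.setdefault(nxt, []).append((action, v))

    dist = {goal: 0}
    labels: Dict[Hashable, Tuple[Hashable, int]] = {}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for action, v in reverse.get(u, []):
            if v not in dist:
                # Taking `action` from v reaches u, which is dist[u] steps
                # from the goal, so it starts a shortest path from v.
                dist[v] = dist[u] + 1
                labels[v] = (action, dist[v])
                queue.append(v)
    return labels


def relabeled_dataset(graph: Graph) -> List[Tuple[Hashable, Hashable, Hashable, float]]:
    """Relabel every vertex as a goal and emit
    (state, goal, expert_action, value_target) tuples, where the value
    target is the negative shortest-path distance to the goal."""
    data = []
    for goal in graph:
        for state, (action, d) in expert_labels(graph, goal).items():
            data.append((state, goal, action, -float(d)))
    return data


# Toy map: a corridor A-B-C with a side room D reachable from C.
example_graph: Graph = {
    "A": [("right", "B")],
    "B": [("left", "A"), ("right", "C")],
    "C": [("left", "B"), ("up", "D")],
    "D": [("down", "C")],
}

print(expert_labels(example_graph, "D"))
# {'C': ('up', 1), 'B': ('right', 2), 'A': ('right', 3)}
```

Under these assumptions, a reactive goal-conditioned policy could be fit by supervised learning on the (state, goal, action) tuples, roughly in the spirit of expert distillation, or a goal-conditioned value function could be regressed onto the distance targets. The sketch omits how the graph is built from pixel observations and how latent embeddings are compared, which the abstract leaves to the paper.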
BibTeX
@preprint{yang2020LfGR,
title={Learning from Graphical Replay},
author={Yang, Ge and Zhang, Amy and Morcos, Ari S. and Pineau, Joelle
and Abbeel, Pieter and Calandra, Roberto}
}