Multi-Step Prediction for Curiosity Driven Learning

Ruchir Aggarwal* Kushantha U. Attanayake* Dennis Li*

Julio Soldevilla+ Poorani Ravindhiran^

* Computer Science and Engineering, University of Michigan
+ Mathematics, University of Michigan
^ Robotics, University of Michigan

"Learning is by nature, curiosity" - Plato

Reinforcement learning has made great strides in exploring virtual environments, especially video games, because of their well-structured reward systems. However, it struggles in sparse-reward scenarios. Curiosity-driven exploration, in which the agent is intrinsically motivated to explore the environment, has been proposed as a solution to this issue. Curiosity is a type of intrinsic reward function that uses prediction error as the reward signal. However, current implementations of curiosity-based learning predict only one time step into the future. As a result, if a catastrophic event were to happen two time steps away, the agent would be none the wiser.
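
For reference, the one-step prediction-error reward described above is commonly written in the form below. The notation is an assumption introduced here for illustration and is not taken from this project: \phi is a feature embedding of an observation, f is a learned forward dynamics model, s_t and a_t are the state and action at time t, and r_t is the intrinsic reward. The project's exact loss may differ.

```latex
r_t = \left\lVert f\big(\phi(s_t),\, a_t\big) - \phi(s_{t+1}) \right\rVert_2^2
```

The reward depends only on the transition from t to t+1, which is why an event two steps ahead leaves it unchanged.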

In this project, we:

Investigate the effect of generating multiple-step predictions into the future and using all of these predictions in our definition of curiosity.
Perform experiments with different weight combinations for the multiple-step predictions and present the results.
Explore the performance of our agent in different environments (three games from the Atari suite and Super Mario Bros.) and examine how generalisable the learned features are.
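
One way to combine several prediction steps into a single curiosity signal is a weighted sum of per-step prediction errors. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: the function name `multi_step_curiosity`, the squared-error measure, and the array layout (one row per predicted time step) are all hypothetical choices made for this example.

```python
import numpy as np


def multi_step_curiosity(predicted, observed, weights):
    """Weighted sum of per-step squared prediction errors (illustrative only).

    predicted: array of shape (K, D), predicted features for steps t+1 .. t+K
    observed:  array of shape (K, D), features actually observed at those steps
    weights:   array of shape (K,), one weight per prediction step
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    weights = np.asarray(weights, dtype=float)

    if predicted.ndim != 2 or predicted.shape != observed.shape:
        raise ValueError("predicted and observed must both have shape (K, D)")
    if weights.shape != (predicted.shape[0],):
        raise ValueError("weights must have one entry per prediction step")

    # Squared error of each step's prediction, summed over feature dimensions.
    per_step_error = np.sum((predicted - observed) ** 2, axis=1)

    # Intrinsic reward: weighted combination of the per-step errors.
    return float(np.dot(weights, per_step_error))


# Two-step example: the per-step errors are 1.0 and 4.0.
predicted = [[0.0, 0.0], [1.0, 1.0]]
observed = [[0.0, 1.0], [1.0, 3.0]]
equal_weights_reward = multi_step_curiosity(predicted, observed, [0.5, 0.5])
one_step_only_reward = multi_step_curiosity(predicted, observed, [1.0, 0.0])
```

With weights `[1.0, 0.0]` the sketch reduces to the usual one-step curiosity reward, so different weight combinations trade off near and far prediction errors.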

Future Work

For future work, we intend to:

Extend our work to more Atari games to evaluate performance across different environments. We also intend to extend the work to 3D environments such as MuJoCo.
Extend prediction beyond the 2 time steps used in this project, a limit imposed by time constraints, to t time steps.

Related Work

Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros. Large-Scale Study of Curiosity-Driven Learning. In ICLR 2019.

Acknowledgement

We would like to thank Ruben Villegas for proposing this project and helping us understand the different concepts involved. We would also like to extend our thanks to Prof. Honglak Lee for all his help.

EECS598-012