This repository is an implementation of "On Learning Intrinsic Rewards for Policy Gradient Methods" (LIRPG).
@article{zheng2018learning,
title={On Learning Intrinsic Rewards for Policy Gradient Methods},
author={Zheng, Zeyu and Oh, Junhyuk and Singh, Satinder},
journal={arXiv preprint arXiv:1804.06459},
year={2018}
}
This code is based on OpenAI Baselines. In addition, it requires the following:
- Python 3.*
- TensorFlow 1.7.0+
To run A2C+LIRPG on Atari games:
python -m baselines.a2c.run_atari --env BreakoutNoFrameskip-v4
To run PPO+LIRPG on delayed MuJoCo tasks:
python -m baselines.ppo2.run_mujoco --env Hopper-v2 --reward-freq 20
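The following is a minimal sketch of what a "delayed" task might look like, assuming that `--reward-freq 20` means per-step rewards are withheld and their sum is released once every 20 steps (and at episode end). This is an illustration only, not the repository's code: `DelayedRewardWrapper`, `ConstantRewardEnv`, and `rollout` are hypothetical names, and the sketch assumes the older Gym-style `step` interface that returns `(obs, reward, done, info)`.

```python
class DelayedRewardWrapper:
    """Hypothetical wrapper: withhold rewards and release their sum
    every `reward_freq` steps or when the episode ends."""

    def __init__(self, env, reward_freq=20):
        self.env = env
        self.reward_freq = reward_freq
        self._accumulated = 0.0
        self._steps = 0

    def reset(self):
        self._accumulated = 0.0
        self._steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._accumulated += reward
        self._steps += 1
        if done or self._steps % self.reward_freq == 0:
            # Release everything accumulated since the last payout.
            delayed, self._accumulated = self._accumulated, 0.0
        else:
            delayed = 0.0
        return obs, delayed, done, info


class ConstantRewardEnv:
    """Toy stand-in environment (not part of the repository):
    reward 1.0 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


def rollout(env):
    """Run one episode with a fixed action and collect the rewards seen by the agent."""
    env.reset()
    rewards, done = [], False
    while not done:
        _, r, done, _ = env.step(0)
        rewards.append(r)
    return rewards


rewards = rollout(DelayedRewardWrapper(ConstantRewardEnv(horizon=50), reward_freq=20))
```

In this toy run the agent sees a nonzero reward only at steps 20, 40, and 50, while the total return over the episode is unchanged. Under the stated assumption, this is the kind of sparse extrinsic signal the delayed tasks are meant to produce.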