Alex J. Chan and Mihaela van der Schaar

International Conference on Learning Representations (ICLR) 2021

License: MIT · Code style: black

Last Updated: 2 March 2021

Code Author: Alex J. Chan ([email protected])

This repo contains a JAX-based implementation of the Approximate Variational Reward Imitation Learning (AVRIL) algorithm. The code is ready to run on the control environments in OpenAI Gym, with pre-run expert trajectories stored in the volume folder.

Given demonstrations, AVRIL learns an approximate posterior distribution over the agent's reward function, as well as an optimal policy with respect to that reward.

This repo is pip-installable: clone it, optionally create a virtual environment, and install it (this installs the dependencies automatically):

git clone https://github.com/XanderJC/scalable-birl.git

cd scalable-birl

pip install -e .

Example usage:

from sbirl import avril, load_data

# First set up the data. I have provided a helper function for dealing
# with the OpenAI Gym control environments

inputs,targets,a_dim,s_dim = load_data('CartPole-v1',num_trajs=15)

# However, AVRIL can handle any data appropriately formatted, that is inputs
# that are (state,next_state) pairs and targets that are (action, next_action)
# pairs:
# inputs = [num_pairs x 2 x state_dimension]
# targets = [num_pairs x 2 x 1]

# You can define the reward to be state-only or state-action depending on use

agent = avril(inputs,targets,s_dim,a_dim,state_only=True)

# Train for a set number of iterations with the desired batch size

agent.train(iters=5000,batch_size=64)

# Now test by rolling out in the live Gym environment

agent.gym_test('CartPole-v1')

After training, the agent can balance the pole.

This example can be run simply from the shell using:

python sbirl/models.py
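For data that does not come from `load_data`, the layout described in the example comments (inputs as (state, next_state) pairs, targets as (action, next_action) pairs) can be assembled by hand. The following is a minimal sketch using plain NumPy. The helper `make_pairs` is hypothetical and not part of `sbirl`, and it assumes a single trajectory with discrete actions stored as integer indices; how the repo's own loader treats the final step of a trajectory is not shown here, so this sketch simply drops it.

```python
import numpy as np


def make_pairs(states, actions):
    """Hypothetical helper (not part of sbirl): pair each step with its successor.

    states:  array of shape [T, state_dimension] for one trajectory
    actions: array of shape [T] holding discrete action indices
    """
    states = np.asarray(states, dtype=np.float32)
    actions = np.asarray(actions).reshape(-1, 1)

    # Stack step t and step t+1 along a new axis of size 2.
    inputs = np.stack([states[:-1], states[1:]], axis=1)     # [T-1, 2, state_dimension]
    targets = np.stack([actions[:-1], actions[1:]], axis=1)  # [T-1, 2, 1]
    return inputs, targets


# Toy trajectory: 5 steps, 4-dimensional state, 2 possible actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 4))
actions = rng.integers(0, 2, size=5)

inputs, targets = make_pairs(states, actions)
print(inputs.shape, targets.shape)
```

With several trajectories, one option is to call the helper once per trajectory and join the results with `np.concatenate(..., axis=0)`, so that no pair spans the boundary between two trajectories.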

Citing

If you use this software please cite as follows:

@inproceedings{chan2021scalable,
    title={Scalable {B}ayesian Inverse Reinforcement Learning},
    author={Alex James Chan and Mihaela van der Schaar},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=4qR3coiNaIv}
}