This repo is the official implementation of *A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning* (NeurIPS 2022). The code is developed based on the source code of TorchOpt and OpTree.
If you find this work useful, please cite:

```bibtex
@inproceedings{liu2022a,
  title     = {A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning},
  author    = {Bo Liu and Xidong Feng and Jie Ren and Luo Mai and Rui Zhu and Haifeng Zhang and Jun Wang and Yaodong Yang},
  booktitle = {Thirty-Sixth Conference on Neural Information Processing Systems},
  year      = {2022},
  url       = {https://openreview.net/forum?id=p9zeOtKQXKs}
}
```
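As background for the experiments below: the meta-gradient code here builds on TorchOpt's differentiable optimizers, which keep inner-loop updates on the autograd graph so that outer (meta) gradients can flow through them. A minimal sketch of that mechanism, with a toy model and toy losses that are purely illustrative and not part of this repo:

```python
# Sketch only: differentiable inner-loop update via TorchOpt's MetaSGD.
# The network and losses below are toy stand-ins, not this repo's code.
import torch
import torchopt

net = torch.nn.Linear(4, 1)                # toy "policy"
inner_opt = torchopt.MetaSGD(net, lr=0.1)  # differentiable inner optimizer

x = torch.randn(8, 4)
inner_loss = net(x).pow(2).mean()          # stand-in for the inner RL loss
inner_opt.step(inner_loss)                 # update stays on the autograd graph

outer_loss = net(x).mean()                 # stand-in for the outer objective
outer_loss.backward()                      # meta-gradient w.r.t. the original params
```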
To set up the environment:

```bash
conda create --name GMRL python=3.7.11
conda activate GMRL
pip install -r requirements.txt
```
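A quick sanity check that the pinned dependencies resolved (assuming `requirements.txt` installs `torch` and `torchopt`, per the TorchOpt-based sketch above):

```python
# Sanity check: confirm the core dependencies import cleanly.
import torch
import torchopt

print(torch.__version__, torchopt.__version__)
```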
Experiments on tabular MDPs using MAML-RL and LIRPG.
- Go into the directory.

  For MAML-RL:

  ```bash
  cd tabular/maml
  ```

  For LIRPG:

  ```bash
  cd tabular/lirpg
  ```
- Start training.

  For the MAML-RL meta-gradient decomposition and correlation ablation studies:

  ```bash
  sh run_decompose.sh
  sh run_lr.sh
  sh run_step_ablation.sh
  ```

  For the MAML-RL compositional bias ablation study (see the sketch after this list):

  ```bash
  sh run_comp_lr.sh
  sh run_comp_sample.sh
  sh run_comp_step_num.sh
  ```

  For the MAML-RL multi-step Hessian bias ablation study:

  ```bash
  sh run_hessian_bias.sh
  ```

  For the LIRPG ablation study:

  ```bash
  sh run_step.sh
  ```
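For orientation, the compositional and Hessian bias terms that these ablations isolate both arise when the outer gradient is pushed through the sampled inner update `theta' = theta - alpha * grad L_inner(theta)`, whose Jacobian `I - alpha * H` contains the Hessian of the inner loss. A minimal sketch of that one-step meta-gradient structure, using toy quadratic losses rather than the repo's tabular MDP objectives:

```python
# Sketch only: the one-step meta-gradient structure behind the
# compositional-bias and Hessian-bias ablations. Toy quadratic losses
# stand in for the sampled inner/outer RL objectives.
import torch

theta = torch.randn(5, requires_grad=True)  # toy policy parameters
alpha = 0.1                                 # inner-loop learning rate

def inner_loss(p):                          # stand-in for the inner objective
    return (p ** 2).sum()

def outer_loss(p):                          # stand-in for the post-update objective
    return (p - 1.0).pow(2).sum()

# Differentiable inner update: theta' = theta - alpha * grad(inner_loss).
g_inner = torch.autograd.grad(inner_loss(theta), theta, create_graph=True)[0]
theta_prime = theta - alpha * g_inner

# Meta-gradient: composes d(outer)/d(theta') with d(theta')/d(theta)
# = I - alpha * H, the term carrying the Hessian of the inner loss.
meta_grad = torch.autograd.grad(outer_loss(theta_prime), theta)[0]
```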
Experiments on the Iterated Prisoner's Dilemma (IPD) using LOLA-DiCE.
- Go into the directory:

  ```bash
  cd lola
  ```
- Start training.

  For the original LOLA-DiCE (the underlying DiCE operator is sketched after this list):

  ```bash
  python3 lola_dice_original.py --logdir ./results/inner_128_outer128_baseline
  python3 lola_dice_original.py --inner_exact --logdir ./results/inner_exact_outer128_baseline
  python3 lola_dice_original.py --inner_batch_size 1024 --logdir ./results/inner1024_outer128_baseline
  python3 lola_dice_original.py --inner_exact --outer_exact --logdir ./results/inner_exact_outer_exact
  ```

  For the LOLA-DiCE ablation study:

  ```bash
  python3 lola_dice_ablation.py --hessian_batch_size 1024 --logdir ./result_ablation/comp_128_hessian_1024
  python3 lola_dice_ablation.py --hessian_exact --logdir ./result_ablation/comp_128_hessian_exact
  python3 lola_dice_ablation.py --comp_exact --logdir ./result_ablation/comp_exact_hessian_128
  python3 lola_dice_ablation.py --comp_batch_size 1024 --logdir ./result_ablation/comp1024_hessian_128
  ```

  For the LOLA-DiCE off-policy and ablation studies:

  ```bash
  python3 lola_dice_off_policy.py --logdir ./result_offpolicy/off_policy
  python3 lola_dice_off_policy_ablation.py --comp_on_policy --logdir ./result_offpolicy/off_comp_on_hessian
  python3 lola_dice_off_policy_ablation.py --hessian_on_policy --logdir ./result_offpolicy/on_comp_off_hessian
  ```
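LOLA-DiCE obtains its higher-order gradient estimates through the DiCE "magic box" operator of Foerster et al. (2018). A standalone sketch of that operator with toy tensors, not this repo's implementation:

```python
# Sketch only: the DiCE "magic box" operator. It evaluates to 1 in the
# forward pass, while its gradient reproduces the score function, so
# repeated differentiation yields score-function gradient estimators
# of any order.
import torch

def magic_box(log_prob_sum: torch.Tensor) -> torch.Tensor:
    return torch.exp(log_prob_sum - log_prob_sum.detach())

# Toy usage: weight each reward by the magic box of the cumulative
# log-probability of the actions that causally precede it.
log_probs = torch.randn(10, requires_grad=True)  # toy log pi(a_t | s_t)
rewards = torch.randn(10)                        # toy per-step rewards
dice_objective = (magic_box(torch.cumsum(log_probs, dim=0)) * rewards).sum()
grad = torch.autograd.grad(dice_objective, log_probs, create_graph=True)[0]
```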
Experiments on eight Atari games using MGRL.
- Go into the directory:

  ```bash
  cd mgrl
  ```
- Start training.

  For running the baseline A2C:

  ```bash
  python3 main_baseline.py --env-name "QbertNoFrameskip-v4" --algo a2c --use-gae \
      --log-interval 100 --num-steps 5 --num-processes 64 --lr 7e-4 \
      --entropy-coef 0.01 --value-loss-coef 0.5 --gamma 0.99 --gae-lambda 0.95 \
      --num-env-steps 40000000 --log-dir ./baseline/ --seed 0 --use-linear-lr-decay
  ```

  For running 3-step MGRL:

  ```bash
  python3 main_meta_condition_kl.py --env-name "QbertNoFrameskip-v4" --algo a2c_meta --use-gae \
      --log-interval 100 --num-steps 5 --num-processes 64 --lr 7e-4 \
      --entropy-coef 0.01 --value-loss-coef 0.5 --gamma 0.99 --gae-lambda 0.95 \
      --num-env-steps 40000000 --log-dir ./meta/ --use-linear-lr-decay \
      --comment all4_sigmoid_v --meta-lr 1e-3 --meta-update 3 \
      --outer_kl_coef 0.0 --outer_entropy_coef 0.0 --outer_critic_coef 0.0 --seed 0
  ```

  For running 3-step MGRL-LVC (the LVC estimator is sketched after this list):

  ```bash
  python3 main_meta_condition_kl.py --env-name "QbertNoFrameskip-v4" --algo a2c_meta --use-gae \
      --log-interval 100 --num-steps 5 --num-processes 64 --lr 7e-4 \
      --entropy-coef 0.01 --value-loss-coef 0.5 --gamma 0.99 --gae-lambda 0.95 \
      --num-env-steps 40000000 --log-dir ./meta/ --use-linear-lr-decay \
      --comment all4_sigmoid_v_lvc --meta-lr 1e-3 --meta-update 3 \
      --outer_kl_coef 0.0 --outer_entropy_coef 0.0 --outer_critic_coef 0.0 --seed 0 --lvc
  ```
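The `--lvc` flag refers to the low-variance curvature (LVC) estimator of Rothfuss et al. (ProMP), which leaves the first-order gradient unchanged while giving lower-variance second derivatives than the plain likelihood-ratio objective. A sketch with toy log-probabilities and advantages, not the repo's exact code:

```python
# Sketch only: the LVC surrogate
#   J = sum_t [pi(a_t|s_t) / stop_grad(pi(a_t|s_t))] * A_t.
# Toy tensors stand in for per-step log-probs and advantage estimates.
import torch

log_probs = torch.randn(10, requires_grad=True)  # toy log pi_theta(a_t | s_t)
advantages = torch.randn(10)                     # toy advantage estimates

# The ratio pi / stop_grad(pi) equals 1 in the forward pass, so the
# surrogate's value is sum_t A_t, but differentiating through it twice
# gives a lower-variance curvature estimate than REINFORCE-style objectives.
ratio = torch.exp(log_probs - log_probs.detach())
j_lvc = (ratio * advantages).sum()
```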
This repo is released under The MIT License.