A minimal JAX-based reinforcement learning template for rapidly spinning up RL projects!
All training and evaluation is JIT-compiled end-to-end in JAX. The template targets Python 3.8.12 and is built on top of:
- JAX - Autograd and XLA
- Flax - Neural network library
- Optax - Gradient-based optimisation
- Distrax - Probability distributions
- Weights & Biases - Experiment tracking and visualisation
Variants of this template are released as branches of this repository, each with different features:
Branch | Description | Agents | Environments |
---|---|---|---|
main (here) | Basic training and evaluation functionality (e.g. training loop, logging, checkpointing), plus common online RL agents | PPO, SAC, DQN | Gymnax |
offline (TBC) | Adds offline RL functionality (e.g. replay buffer, offline training) | CQL, EDAC | - |
This template is designed to provide only core functionality, giving a solid foundation for RL projects. Whilst it is not intended to be a full-featured RL library, please raise an issue if you think a feature is missing that would be useful for many projects.
- Install Python packages from `requirements-base.txt` and `requirements-cpu.txt` in `setup` with:
cd setup && pip install $(cat requirements-base.txt requirements-cpu.txt)
- Sign into WandB to enable logging:
wandb login
- Build the Docker container with the provided script:
cd setup/docker && ./build.sh
- Add your WandB key to the `setup/docker` folder:
echo <wandb_key> > setup/docker/wandb_key
After installing the Python packages, install the Black pre-commit hook with:
pre-commit install
This will check and fix formatting errors when you commit code.
To train an agent, run:
python train.py <arguments>
For example, to train a PPO agent on the CartPole-v1 environment and log to WandB, run:
python train.py --agent ppo --env_name CartPole-v1 --log --wandb_entity wandb_username --wandb_project project_name
To see all possible arguments, see `experiments/parse_args.py` or run:
python train.py --help
Launch training runs inside your built container with:
./run_docker.sh <gpu_id> python3 train.py <arguments>
For example, to train a DQN agent on the Asterix-MinAtar environment using GPU 3, run:
./run_docker.sh 3 python3 train.py --agent dqn --env_name Asterix-MinAtar
Large parts of the training loop and PPO implementation are based on PureJaxRL, which contains high-performance, single-file implementations of RL agents in JAX.
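To give a rough sense of what end-to-end JIT-compiled training means in this style, the sketch below expresses a full training run as a `jax.lax.scan` over update steps inside a single `jax.jit`. It is an illustration under assumed names (`update_step`, `NUM_UPDATES`, and the carried state are hypothetical), not this repository's or PureJaxRL's actual implementation.

```python
import jax
import jax.numpy as jnp

NUM_UPDATES = 1_000  # hypothetical number of update steps, baked into the compiled program


def update_step(carry, _):
    """One training iteration; a real agent would roll out the env and apply an Optax update here."""
    rng, step = carry
    rng, _step_rng = jax.random.split(rng)  # fresh randomness for this step's rollouts
    metrics = {"step": step}
    return (rng, step + 1), metrics


@jax.jit  # the entire training run becomes one XLA program
def train(rng):
    init_carry = (rng, jnp.array(0))
    final_carry, metrics = jax.lax.scan(update_step, init_carry, None, length=NUM_UPDATES)
    return final_carry, metrics


final_carry, metrics = train(jax.random.PRNGKey(0))
```

Structuring the loop this way avoids Python-level iteration entirely, which is what makes single-file JAX agents in this style so fast on accelerators.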