nnabla-rl offers various (deep) reinforcement learning and optimal control algorithms. See the list below for the implemented algorithms!
- Online training: Training performed by interacting with the environment. You will need to prepare an environment compatible with the OpenAI Gym environment interface (see the environment sketch after this list).
- Offline(Batch) training: Training performed solely from provided data. You will need to prepare a dataset wrapped in a ReplayBuffer.
- Continuous/Discrete action: If you are familiar with the training of deep neural nets, the difference between action types is similar to the difference between regression and classification. A continuous action consists of real value(s) (e.g. a robot's motor torque). In contrast, a discrete action is one that can be labeled (e.g. UP, DOWN, RIGHT, LEFT). The action type is determined by the environment (problem), and the applicable algorithms change depending on that action type.
- Hybrid action: An action that pairs a discrete component with a continuous component; some environments require both at once.
- RNN layer support: Supports training of network models with recurrent layers.
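For online training, the environment only needs to follow the classic OpenAI Gym interface (`reset`, `step`, `action_space`, `observation_space`). Below is a minimal, hypothetical sketch of such an environment with a continuous action; the class name, dynamics, and reward are illustrative and not part of nnabla-rl.

```python
import gym
import numpy as np
from gym import spaces


class ToyContinuousEnv(gym.Env):
    """Hypothetical 1-D point-mass environment following the classic Gym API."""

    def __init__(self):
        # Continuous action: a real-valued force in [-1, 1] (cf. a robot's motor torque).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # Observation: position and velocity of the point mass.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)
        self._state = np.zeros(2, dtype=np.float32)

    def reset(self):
        self._state = np.zeros(2, dtype=np.float32)
        return self._state.copy()

    def step(self, action):
        position, velocity = self._state
        velocity += 0.05 * float(action[0])
        position += velocity
        self._state = np.array([position, velocity], dtype=np.float32)
        reward = -abs(position - 1.0)  # reward for staying close to position 1.0
        done = abs(position) > 5.0
        # Classic Gym API: (observation, reward, done, info)
        return self._state.copy(), reward, done, {}
```

For a discrete-action environment, `spaces.Discrete(n)` would be used instead (e.g. `spaces.Discrete(4)` for UP, DOWN, RIGHT, LEFT). For offline (batch) training, recorded transitions are packed into a ReplayBuffer rather than gathered through live interaction; see the nnabla-rl documentation for the exact experience format it expects.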
Algorithm | Online training | Offline(Batch) training | Continuous action | Discrete action | Hybrid action | RNN layer support |
---|---|---|---|---|---|---|
A2C | ✔️ | ❌ | (We will support continuous action in the future) | ✔️ | ❌ | ❌ |
AMP | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ |
ATRPO | ✔️ | ❌ | ✔️ | (We will support discrete action in the future) | ❌ | ❌ |
BCQ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ |
BEAR | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ |
Categorical DDQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
Categorical DQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
DDPG | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
DDQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
DecisionTransformer | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
DEMME-SAC | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
DQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
DRQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
GAIL | ✔️ | ❌ | ✔️ | (We will support discrete action in the future) | ❌ | ❌ |
HER | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
HyAR | ✔️ | ❌ | ❌ | ❌ | ✔️ | ❌ |
IQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️* |
MME-SAC | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
M-DQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
M-IQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
PPO | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ |
QRSAC | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
QRDQN | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
QtOpt (ICRA 2018 version) | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
Rainbow | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
REDQ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
REINFORCE | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ |
SAC | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
SAC (ICML 2018 version) | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
SAC-D | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
SRSAC | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
TD3 | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
TRPO | ✔️ | ❌ | ✔️ | (We will support discrete action in the future) | ❌ | ❌ |
TRPO (ICML 2015 version) | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ |
XQL | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
*May require special treatment to train with RNN layers.
- Need training: Most of the optimal control algorithms do NOT require training to run the controller. Instead, you will need the dynamics model of the system and the cost function of the task prior to executing the algorithm (see the LQR sketch after the table below). See the documentation of each algorithm for details.
- Continuous/Discrete action: Same as for reinforcement learning. However, most of the optimal control algorithms do not support discrete actions.
Algorithm | Need training | Continuous action | Discrete action |
---|---|---|---|
DDP | not required | ✔️ | ❌ |
iLQR | not required | ✔️ | ❌ |
LQR | not required | ✔️ | ❌ |
MPPI | may train a dynamics model | ✔️ | ❌ |
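To illustrate why most of these algorithms need no training, the following sketch solves a finite-horizon, discrete-time LQR problem by a backward Riccati recursion, given a linear dynamics model (A, B) and a quadratic cost (Q, R). This is a generic NumPy illustration of the idea under assumed double-integrator dynamics, not nnabla-rl's API; refer to each algorithm's documentation for the actual interfaces.

```python
import numpy as np


def finite_horizon_lqr(A, B, Q, R, horizon):
    """Backward Riccati recursion for x' = A x + B u with cost sum(x^T Q x + u^T R u)."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        # Feedback gain K such that u = -K x is optimal for the remaining steps.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return list(reversed(gains))


# Hypothetical double-integrator dynamics (position, velocity) driven by a force input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])   # state cost
R = np.array([[0.01]])    # control cost

gains = finite_horizon_lqr(A, B, Q, R, horizon=50)
x = np.array([1.0, 0.0])  # start 1 m away from the origin
for K in gains:
    u = -K @ x            # continuous action computed directly from the model, no training
    x = A @ x + B @ u
print(x)  # the state is driven toward the origin
```

Here the control at every step is computed directly from the given model and cost, which is why the table marks these algorithms as not requiring training; MPPI is the noted exception in that it may train a dynamics model.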