Merge pull request #74 from LucasAlegre/feature/hpo
Feature/hpo

Showing 13 changed files with 481 additions and 67 deletions.
# Hyperparameter optimization

MORL-Baselines contains an early solution to the problem of hyperparameter optimization for MORL. The problem and solution are introduced and discussed in the following paper:
[F. Felten, D. Gareev, E.-G. Talbi, and G. Danoy, “Hyperparameter Optimization for Multi-Objective Reinforcement Learning.” arXiv, Oct. 25, 2023. doi: 10.48550/arXiv.2310.16487.](https://arxiv.org/abs/2310.16487)

A script to launch the hyperparameter sweep is available in [`experiments/hyperparameter_search/launch_sweep.py`](https://github.com/LucasAlegre/morl-baselines/experiments/hyperparameter_search/launch_sweep.py).

An example usage of the script:

```bash
python experiments/hyperparameter_search/launch_sweep.py \
    --algo envelope \
    --env-id minecart-v0 \
    --ref-point 0 0 -200 \
    --sweep-count 100 \
    --seed 10 \
    --num-seeds 3 \
    --config-name envelope \
    --train-hyperparams num_eval_weights_for_front:100 reset_num_timesteps:False eval_freq:10000 total_timesteps:10000
```

This launches a hyperparameter search for Envelope Q-Learning on minecart, using `[0, 0, -200]` as the reference point for the hypervolume computation. It tries 100 sets of hyperparameter values. The parameter distributions are specified in a YAML file given by `config-name` (by default the same name as the algorithm), which has to be in the sweep configuration directory. Each set of hyperparameter values is trained on 3 different seeds, starting from 10 (i.e. seeds 10, 11, 12).
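The reference point passed via `--ref-point` anchors the hypervolume indicator that the sweep maximizes (`avg_hypervolume` in the configs below). As a rough illustration of the metric only (not the implementation used by MORL-Baselines, which handles three objectives for minecart), a two-objective hypervolume for a maximization front can be sketched as:

```python
def hypervolume_2d(front, ref_point):
    """Hypervolume of a 2-objective *maximization* front w.r.t. a reference point.

    Assumes every point on the front dominates the reference point. Dominated
    points contribute nothing because only the non-dominated staircase is kept.
    """
    # Sort by the first objective, descending; keep the non-dominated staircase.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    staircase, best_second = [], float("-inf")
    for p in pts:
        if p[1] > best_second:  # strictly improves the second objective
            staircase.append(p)
            best_second = p[1]
    # Sum the rectangular slices between consecutive staircase points.
    hv = 0.0
    for i, (f0, f1) in enumerate(staircase):
        next_f0 = staircase[i + 1][0] if i + 1 < len(staircase) else ref_point[0]
        hv += (f0 - next_f0) * (f1 - ref_point[1])
    return hv

hypervolume_2d([(3.0, 1.0), (1.0, 3.0)], (0.0, 0.0))  # → 5.0
```

A larger hypervolume means the front pushes further past the reference point in all objectives, which is why it serves as a single scalar objective for the sweep.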
```yaml
method: bayes
metric:
  goal: maximize
  name: avg_hypervolume
parameters:
  learning_rate:
    distribution: uniform
    min: 0.0001
    max: 0.001
  initial_epsilon:
    distribution: uniform
    min: 0.01
    max: 1
  final_epsilon:
    distribution: uniform
    min: 0.01
    max: 1
  epsilon_decay_steps:
    distribution: int_uniform
    min: 1
    max: 100000
  tau:
    distribution: uniform
    min: 0.0
    max: 1.0
  target_net_update_freq:
    distribution: int_uniform
    min: 1
    max: 10000
  buffer_size:
    distribution: int_uniform
    min: 1000
    max: 2000000
  net_arch:
    value: [256, 256, 256, 256]
  batch_size:
    value: 32
  learning_starts:
    distribution: int_uniform
    min: 1
    max: 1000
  gradient_updates:
    distribution: int_uniform
    min: 1
    max: 10
  gamma:
    value: 0.98
  max_grad_norm:
    distribution: uniform
    min: 0.1
    max: 10.0
  num_sample_w:
    distribution: int_uniform
    min: 2
    max: 10
  per_alpha:
    distribution: uniform
    min: 0.1
    max: 0.9
  initial_homotopy_lambda:
    distribution: uniform
    min: 0.0
    max: 1
  final_homotopy_lambda:
    distribution: uniform
    min: 0.0
    max: 1
  homotopy_decay_steps:
    distribution: int_uniform
    min: 1
    max: 100000
```
```yaml
method: bayes
metric:
  goal: maximize
  name: avg_hypervolume
parameters:
  num_envs:
    distribution: int_uniform
    min: 2
    max: 8
  pop_size:
    # distribution: int_uniform
    # min: 4
    # max: 10
    # Fix the value for now as delta weight = 1 / (popsize-1)
    value: 6
  warmup_iterations:
    distribution: int_uniform
    min: 50
    max: 100
  steps_per_iteration:
    distribution: int_uniform
    min: 1000
    max: 5000
  evolutionary_iterations:
    distribution: int_uniform
    min: 10
    max: 30
  num_weight_candidates:
    distribution: int_uniform
    min: 5
    max: 10
  num_performance_buffer:
    distribution: int_uniform
    min: 50
    max: 200
  performance_buffer_size:
    distribution: int_uniform
    min: 1
    max: 5
  min_weight:
    value: 0.0
  max_weight:
    value: 1.0
  delta_weight:
    # distribution: uniform
    # min: 0.1
    # max: 0.5
    # Fix the value for now as delta weight = 1 / (popsize-1)
    value: 0.2
  gamma:
    value: 0.995
  num_minibatches:
    distribution: categorical
    values: [16, 32, 64]
  update_epochs:
    distribution: int_uniform
    min: 5
    max: 20
  learning_rate:
    distribution: uniform
    min: 0.0001
    max: 0.01
  anneal_lr:
    distribution: categorical
    values: [true, false]
  clip_coef:
    distribution: uniform
    min: 0.1
    max: 1.0
  ent_coef:
    distribution: uniform
    min: 0.0
    max: 0.01
  vf_coef:
    distribution: uniform
    min: 0.1
    max: 1.0
  clip_vloss:
    distribution: categorical
    values: [true, false]
  max_grad_norm:
    distribution: uniform
    min: 0.1
    max: 1.0
  norm_adv:
    distribution: categorical
    values: [true, false]
  gae:
    distribution: categorical
    values: [true, false]
  gae_lambda:
    distribution: uniform
    min: 0.9
    max: 0.99
```
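Both configs appear to follow the Weights & Biases sweep schema (`method: bayes`, a `metric` to maximize, and per-parameter `distribution` entries). To illustrate what the distribution kinds mean, here is a hypothetical random-search sampler over such a spec (a sketch only; the actual sweep uses W&B's Bayesian optimizer, not independent random draws):

```python
import random

def sample_config(parameters, rng=random):
    """Draw one hyperparameter assignment from a sweep 'parameters' spec.

    Supports the distribution kinds used in the configs above:
    fixed `value`, `uniform`, `int_uniform`, and `categorical`.
    """
    sample = {}
    for name, spec in parameters.items():
        if "value" in spec:
            sample[name] = spec["value"]
        elif spec["distribution"] == "uniform":
            sample[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["distribution"] == "int_uniform":
            sample[name] = rng.randint(spec["min"], spec["max"])
        elif spec["distribution"] == "categorical":
            sample[name] = rng.choice(spec["values"])
        else:
            raise ValueError(f"unsupported distribution: {spec['distribution']}")
    return sample

# A small excerpt of the PGMORL-style spec above, as Python dicts.
spec = {
    "gamma": {"value": 0.995},
    "learning_rate": {"distribution": "uniform", "min": 0.0001, "max": 0.01},
    "num_minibatches": {"distribution": "categorical", "values": [16, 32, 64]},
    "update_epochs": {"distribution": "int_uniform", "min": 5, "max": 20},
}
config = sample_config(spec)
```

Fixed `value` entries (like `gamma` or `pop_size`) are excluded from the search space, which is how the configs pin parameters that are coupled to others (e.g. `delta_weight` to `pop_size`).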