All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
### Fixed
- PyTorch 1.7 support (#1934)
- `LocalRunner` ignores `worker_cls` attribute of algorithms (#1984)
- `mujoco_py` versions greater than v2.0.2.8 are incompatible with some GCC versions in conda (#2000)
- MTSAC not learning because it corrupts the termination signal by wrapping with `GarageEnv` twice (#2029)
- MTSAC does not respect the `max_episode_length_eval` hyperparameter (#2029)
- MTSAC MetaWorld examples do not use the correct number of tasks (#2029)
- MTSAC now supports a separate `max_episode_length` for evaluation via the `max_episode_length_eval` hyperparameter (#2029)
- MTSAC MetaWorld MT50 example used an incorrect `max_episode_length` (#2029)
### Fixed
- Better parameters for example `her_ddpg_fetchreach` (#1763)
- Ensure determinism in TensorFlow by using `tfp.SeedStream` (#1821)
- Broken rendering of MuJoCo environments to pixels in the NVIDIA Docker container (#1838)
- Enable cudnn in the NVIDIA Docker container (#1840)
- Bug in `DiscreteQfDerivedPolicy` in which parameters were not returned (#1847)
- Populate `TimeLimit.truncated` at every step when using `gym.Env` (#1852)
- Bug in which parameters were not copied when TensorFlow primitives are `clone()`ed (#1855)
- Typo in the `Makefile` target `run-nvidia` (#1914)
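Several fixes above and below (e.g. the `TimeLimit.truncated` entry, #1852) concern conflating episode time-outs with true terminal states. The following is a minimal sketch of the distinction using a hypothetical wrapper and target rule — it is not garage's actual implementation:

```python
class TimeLimitSketch:
    """Hypothetical time-limit wrapper illustrating why `TimeLimit.truncated`
    matters: a time-out must not be treated as a true terminal state, or
    bootstrapped value targets (e.g. in SAC/MTSAC) become biased."""

    def __init__(self, max_episode_length):
        self.max_episode_length = max_episode_length
        self._steps = 0

    def reset(self):
        self._steps = 0

    def step(self, done, info):
        """Take the inner env's (done, info) and apply the time limit."""
        self._steps += 1
        if self._steps >= self.max_episode_length and not done:
            # Episode ends because time ran out, not because the
            # state is terminal.
            info['TimeLimit.truncated'] = True
            done = True
        return done, info


def td_target(reward, discount, next_value, done, info):
    """Value target that respects the distinction: only a true terminal
    state zeroes out the bootstrap term."""
    truncated = info.get('TimeLimit.truncated', False)
    terminal = done and not truncated
    return reward + (0.0 if terminal else discount * next_value)
```

An algorithm that ignores the `truncated` flag would use `done` directly in `td_target`, which is exactly the corruption described in #2029.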
### Added
- Algorithms
  - PPO in PyTorch (#905, #1188)
  - TRPO in PyTorch (#1018, #1053, #1186)
  - MAML in PyTorch (#1128, #1136, #1214, #1234, #1283)
  - RL2 in TensorFlow (#1127, #1175, #1189, #1190, #1195, #1231)
  - PEARL in PyTorch (#1059, #1124, #1218, #1316, #1374)
  - SAC in PyTorch (#1235)
  - MTSAC in PyTorch (#1332)
  - Task Embeddings in TensorFlow (#1168, #1169, #1167)
- Samplers
  - New Sampler API, with efficient multi-env and multi-policy support (#881, #1153, #1319)
  - `garage.sampler.LocalSampler`, which uses the main process to sample (#1133, #1156)
  - Reworked `garage.sampler.RaySampler` to use the new API (#1133, #1134)
  - `garage.sampler.MultiprocessingSampler` (#1298)
  - `garage.sampler.VecWorker`, a replacement for `VecEnvExecutor` (#1311)
- APIs
  - `garage.TrajectoryBatch` data type (#1058, #1065, #1132, #1154)
  - `garage.TimeStep` data type (#1114, #1221)
  - `garage.TimeStepBatch` data type (#1529)
  - `garage.log_performance` (#1116, #1142, #1159)
  - `garage.np.algos.MetaRLAlgorithm` (#1142)
  - `garage.experiment.MetaEvaluator` (#1142, #1152, #1227)
  - `garage.log_multitask_performance` (#1192)
  - `garage.torch.distributions.TanhNormal` (#1140)
  - `garage.torch.policies.TanhGaussianMLPPolicy` (#1176)
  - `garage.experiment.wrap_experiment` to replace `run_experiment`, with several new features (#1100, #1155, #1160, #1164, #1249, #1258, #1281, #1396, #1482)
  - `garage.torch.q_functions.ContinuousCNNQFunction` (#1326)
  - PyTorch support for non-linearities with parameters (#928)
  - `garage.torch.value_function.GaussianMLPValueFunction` (#1317)
  - Simpler PyTorch policy API (#1528)
  - `garage.envs.TaskOnehotWrapper` (#1157)
- HalfCheetah meta environments (#1108, #1131, #1216, #1385)
- PyTorch GPU support (#1182)
- PyTorch deterministic support (#1063)
- Support for running Meta-RL algorithms on MetaWorld benchmarks (#1306)
- Examples for running MetaWorld benchmarks (#1010, #1263, #1265, #1241, #1232, #1327, #1351, #1393)
- Improved off-policy evaluation (#1139, #1279, #1331, #1384)
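The `garage.experiment.wrap_experiment` entry above replaces the older `run_experiment` call style with a decorator. The sketch below illustrates the pattern with a stand-in decorator (`wrap_experiment_sketch` is hypothetical, not garage's real implementation):

```python
import functools
import tempfile


def wrap_experiment_sketch(func):
    """Stand-in for a decorator-style experiment launcher: it prepares an
    experiment context (here, just a snapshot directory) and passes it as
    the first argument, so calling the function launches the experiment."""
    @functools.wraps(func)
    def launcher(*args, **kwargs):
        # A real launcher would also configure logging, seeds, archiving, etc.
        ctxt = {'snapshot_dir': tempfile.mkdtemp(prefix=func.__name__)}
        return func(ctxt, *args, **kwargs)
    return launcher


@wrap_experiment_sketch
def my_experiment(ctxt, seed=1):
    # A real experiment would set seeds, build a trainer, and train here.
    return ctxt['snapshot_dir'], seed


snapshot_dir, seed = my_experiment(seed=42)
```

The design point is that the decorated function receives its context explicitly rather than reading global launcher state, which is what makes the decorator style easier to compose and test.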
### Changed
- Allow TensorFlow 2 (or TF >=1.14) (#1309, #1563)
- Require Torch 1.4.0 (#1335, #1361)
- Ensure TF and torch are optional (#1510)
- Update gym to 0.15.4 (#1098, #1158)
- Rename `baseline` to `value_function` (#1275)
- Make `runner._sampler` optional (#1394)
- Make ExplorationStrategies a type of Policy (#1397)
- Use `garage.replay_buffer.PathBuffer` in off-policy algos (#1173, #1433)
- Deprecated `run_experiment` (#1370, #1412)
- Deprecated old-style samplers (#1369)
- Refactor TensorFlow to use `tfp.distributions` (#1073, #1356, #1357, #1410, #1456, #1444, #1554, #1569)
- Set TotalEnvSteps as the default TensorBoard x-axis (#1017, #1069)
- Update dependencies for docs (#1383)
- New `optimizer_args` TensorFlow interface (#1496)
- Move `LocalTFRunner` to `garage.experiment` (#1513)
- Implement HER using `PathBuffer` (#1282, #1505)
- Change CNN API to use tuples for defining kernels (#1515)
- Many documentation improvements (#1056, #1065, #1120, #1266, #1327, #1413, #1429, #1451, #1481, #1484)
- Eliminate use of "base" module name (#1403)
- Significant improvements to benchmarking (#1271, #1291, #1306, #1307, #1310, #1320, #1368, #1380, #1409)
- Refactor benchmarks into a separate module (#1395, #1402, #1400, #1411, #1408, #1416, #1414, #1432)
### Removed
- Dependencies:
- `RLAlgorithm.get_itr_snapshot` (#1054)
- `garage.misc.nb_utils` (#1288)
- `garage.np.regressors` (#1493)
- `garage.np.BatchPolopt` (#1486, #1492)
- `garage.misc.prog_bar_counter` (#1495)
- `garage.tf.envs.TfEnv` (#1443)
- `garage.tf.BatchPolopt` (#1504)
- `garage.np.OffPolicyRLAlgorithm` (#1552)
### Fixed
- Bug where `GymEnv` did not pickle (#1029)
- Bug where `VecEnvExecutor` conflated terminal state and time limit signal (#1178, #1570)
- Bug where plotter window was opened multiple times (#1253)
- Bug where TF plotter used main policy on separate thread (#1270)
- Workaround gym timelimit and terminal state conflation (#1118)
- Bug where pixels weren't normalized correctly when using CNNs (#1236, #1419)
- Bug where `garage.envs.PointEnv` did not step correctly (#1165)
- Bug where sampler workers crashed in non-Deterministic mode (#1567)
- Use cloudpickle in old-style samplers to handle lambdas (#1371)
- Bug where workers were not shut down after running a resumed algorithm (#1293)
- Non-PyPI dependencies, which blocked using pipenv and poetry (#1247)
- Bug where TensorFlow parameter setting didn't work across differently named policies (#1355)
- Bug where advantages were computed incorrectly in PyTorch (#1197)
- Bug where TF plotter was used in LocalRunner (#1267)
- Worker processes are no longer started unnecessarily (#1006)
- All examples were fixed and are now tested (#1009)
### Fixed
- Better parameters for example `her_ddpg_fetchreach` (#1764)
- Bug in `DiscreteQfDerivedPolicy` in which parameters were not returned (#1847)
- Bug which made it impossible to evaluate stochastic policies deterministically (#1715)
- Use a GitHub Token in the CI to retrieve packages to avoid hitting GitHub API rate limit (#1250)
- Avoid installing dev extra dependencies during the conda check (#1296)
- Install `dm_control` from PyPI (#1406)
- Pin tfp to 0.8.x to avoid breaking pipenv (#1480)
- Force python 3.5 in CI (#1522)
- Separate terminal and completion signal in vectorized sampler (#1581)
- Disable certificate check for roboti.us (#1595)
- Fix `advantages` shape in `compute_advantage()` in torch tree (#1209)
- Fix plotting using tf.plotter (#1292)
- Fix duplicate window rendering when using garage.Plotter (#1299)
- Fix setting garage.model parameters (#1363)
- Fix two example jupyter notebooks (#1584)
- Fix collecting samples in `RaySampler` (#1583)
### Added
- Integration tests which cover all example scripts (#1078, #1090)
- Deterministic mode support for PyTorch (#1068)
- Install script support for macOS 10.15.1 (#1051)
- PyTorch modules now support either functions or modules for specifying their non-linearities (#1038)
### Fixed
- Errors in the documentation on implementing new algorithms (#1074)
- Broken example for DDPG+HER in TensorFlow (#1070)
- Error in the documentation for using garage with conda (#1066)
- Broken pickling of environment wrappers (#1061)
- `garage.torch` was not included in the PyPI distribution (#1037)
- A few broken examples for `garage.tf` (#1032)
- Environment wrappers for pixel-based algorithms, especially DQN (#556)
- Example for how to use garage with Google Colab (#476)
- Advantage normalization for recurrent policies in TF (#626)
- PyTorch support (#725, #764)
- Autogenerated API docs on garage.readthedocs.io (#802)
- GPU version of the pip package (#834)
- PathBuffer, a trajectory-oriented replay buffer (#838)
- RaySampler, a remote and/or multiprocess sampler based on ray (#793)
- Garage is now distributed on PyPI (#870)
- `rollout` option to only sample policies deterministically (#896)
- MultiEnvWrapper, which wraps multiple `gym.Env` environments into a discrete multi-task environment (#946)
### Changed
- Optimized Dockerfiles for fast rebuilds (#557)
- Random seed APIs moved to `garage.experiment.deterministic` (#578)
- Experiment wrapper script is now an ordinary module (#586)
- numpy-based modules and algorithms moved to `garage.np` (#604)
- Algorithm constructors now use `EnvSpec` rather than `gym.Env` (#575)
- Snapshotter API moved from `garage.logger` to `garage.experiment` (#658)
- Moved `process_samples` API from the Sampler to algorithms (#652)
- Updated Snapshotter API (#699)
- Updated Resume API (#777)
- All algorithms now have a default sampler (#832)
- Experiment launchers now require an explicit `snapshot_config` to their `run_task` function (#860)
- Various samplers moved from `garage.tf.sampler` to `garage.sampler` (#836, #840)
- Dockerfiles are now based on Ubuntu 18.04 LTS by default (#763)
- `dm_control` is now an optional dependency, installed using the extra `garage[dm_control]` (#828)
- MuJoCo is now an optional dependency, installed using the extra `garage[mujoco]` (#848)
- Samplers no longer flatten observations and actions (#930, #938, #967)
- Implementations, tests, and benchmarks for all TensorFlow primitives, which are now based on `garage.tf.Model` (#574, #606, #615, #616, #618, #641, #642, #656, #662, #668, #672, #677, #730, #722, #765, #855, #878, #888, #898, #892, #897, #893, #890, #903, #916, #891, #922, #931, #933, #906, #945, #944, #943, #972)
- Dependency upgrades:
### Removed
- `garage.misc.autoargs`, a tool for decorating classes with autogenerated command-line arguments (#573)
- `garage.misc.ext`, a module with several unrelated utilities (#578)
- `config_personal.py` module, replaced by environment variables where relevant (#578, #747)
- `contrib.rllab_hyperopt`, an experimental module for using `hyperopt` to tune hyperparameters (#684)
- `contrib.bichenchao`, a module of example launchers (#683)
- `contrib.alexbeloi`, a module with an importance-sampling sampler and examples (these were merged into garage) (#717)
- EC2 cluster documentation and examples (#835)
- `DeterministicMLPPolicy`, because it duplicated `ContinuousMLPPolicy` (#929)
- `garage.tf.layers`, a custom high-level neural network definition API, replaced by `garage.tf.models` (#939)
- `Parameterized`, which was replaced by `garage.tf.Model` (#942)
- `garage.misc.overrides`, whose features are no longer needed due to proper ABC support in Python 3 and sphinx-autodoc (#974)
- `Serializable`, which became a maintainability burden and has now been replaced by regular pickle protocol (`__getstate__`/`__setstate__`) implementations, where necessary (#982)
- `garage.misc.special`, a library of mostly-unused math subroutines (#986)
- `garage.envs.util`, superseded by features in akro (#986)
- `garage.misc.console`, a library of mostly-unused helper functions for writing shell scripts (#988)
### Fixed
- Bug in `ReplayBuffer` (#554)
- Bug in `setup_linux.sh` (#560)
- Bug in `examples/sim_policy.py` (#691)
- Bug in `FiniteDifferenceHvp` (#745)
- Determinism bug for some samplers (#880)
- `use_gpu` in the experiment runner (#918)
### Fixed
- Bug in entropy regularization in TensorFlow PPO/TRPO (#579)
- Bug in which advantage normalization was broken for recurrent policies (#626)
- Bug in `examples/sim_policy.py` (#691)
- Bug in `FiniteDifferenceHvp` (#745)
- Fix overhead in GaussianMLPRegressor by optionally creating assign operations (#622)
### Added
- Epsilon-greedy exploration strategy, DiscreteMLPModel, and QFunctionDerivedPolicy (all needed by DQN)
- Base Model class for TensorFlow-based primitives
- Dump plots generated with matplotlib to TensorBoard
- Relative Entropy Policy Search (REPS) algorithm
- GaussianConvBaseline and GaussianConvRegressor primitives
- New Dockerfiles, docker-compose files, and Makefiles for running garage using Docker
- Vanilla policy gradient loss to NPO
- Truncated Natural Policy Gradient (TNPG) algorithm for TensorFlow
- Episodic Reward Weighted Regression (ERWR) algorithm for TensorFlow
- gym.Env wrappers used for pixel environments
- Convolutional Neural Network primitive
### Changed
- Move dependencies from environment.yml to setup.py
- Update dependencies:
- tensorflow-probability to 0.5.x
- dm_control to commit 92f9913
- TensorFlow to 1.12
- MuJoCo to 2.0
- gym to 0.10.11
- Move dm_control tests into the unit test tree
- Use GitHub standard .gitignore
- Improve the implementation of RandomizedEnv (Dynamics Randomization)
- Decouple TensorBoard from the logger
- Move files from garage/misc/instrument to garage/experiment
- setup.py to be canonical in format and use automatic versioning
- Move some garage subpackages into their own repositories:
- garage.viskit to rlworkgroup/viskit
- garage.spaces to rlworkgroup/akro
### Removed
- Remove Theano backend, algorithms, and dependencies
- Custom environments which duplicated openai/gym
- Some dead files from garage/misc (meta.py and viewer2d.py)
- Remove all code coverage tracking providers except CodeCov
- Clean up warnings in the test suite
### Fixed
- Pickling bug in GaussianMLPPolicyWithModel
- Namescope in LbfgsOptimizer
- Correctly sample paths in OffPolicyVectorizedSampler
- Implementation bugs in tf/VPG
- Bug when importing Box
- Bug in test_benchmark_her
- Avoid importing Theano when using the TensorFlow branch
- Avoid importing MuJoCo when not required
- Implementation bugs in tf/VPG
- Bug when importing Box
- Bug in test_benchmark_her
- Bug in the CI scripts which produced false positives
### Added
- PPO and DDPG for the TensorFlow branch
- HER for DDPG
- Recurrent Neural Network policy support for NPO, PPO and TRPO
- Base class for ReplayBuffer, and two implementations: SimpleReplayBuffer and HerReplayBuffer
- Sampler classes OffPolicyVectorizedSampler and OnPolicyVectorizedSampler
- Base class for offline policies OffPolicyRLAlgorithm
- Benchmark tests for TRPO, PPO and DDPG to compare their performance with those produced by OpenAI Baselines
- Dynamics randomization for MuJoCo environments
- Support for dm_control environments
- DictSpace support for garage environments
- PEP8 checks enforced in the codebase
- Support for Python imports: maintain correct ordering and remove unused imports or import errors
- Test on TravisCI using Docker images for managing dependencies
- Testing code reorganized
- Code Coverage measurement with codecov
- Pre-commit hooks to enforce PEP8 and to verify imports and commit messages, which are also applied in the Travis CI verification
- Docstring verification for added files that are not in the test branch or moved files
- TensorBoard support for all key-value/log_tabular calls, plus support for logging distributions
- Variable and name scope for symbolic operations in TensorFlow
- Top-level base Space class for garage
- Asynchronous plotting for Theano and TensorFlow
- GPU support for Theano
### Changed
- Rename rllab to garage, including all the rllab references in the packages and modules inside the project
- Rename run_experiment_lite to run_experiment
- The file cma_es_lib.py was replaced by the pycma library available on PyPI
- Move the contrib package to garage.contrib
- Move Theano-dependent code to garage.theano
- Move all code from sandbox.rocky.tf to garage.tf
- Update several dependencies, mainly:
- Python to 3.6.6
- TensorFlow to 1.9
- Theano to 1.0.2
- mujoco-py to 1.50.1
- gym to 0.10.8
- Transfer various dependencies from conda to pip
- Separate example script files in the Theano and TensorFlow branch
- Update LICENSE, CONTRIBUTING.md and .gitignore
- Use convenience imports, that is, import classes and functions that share the same or similar name as their module in the corresponding `__init__.py` file of their package
- Replace ProxyEnv with gym.Wrapper
- Update installation scripts for Linux and macOS
### Removed
- All unused imports in the Python files
- Unused packages from environment.yml
- The files under rllab.mujoco_py were removed to use the pip release instead
- Empty `__init__.py` files
- The environment class defined by rllab.envs.Env was not imported to garage and the environment defined by gym.Env is used now
- Sleeping processes produced by the parallel sampler. NOTE: although the frequency of this issue has been reduced, our tests in TravisCI occasionally detect the issue and currently it seems to be an issue with re-entrant locks and multiprocessing in Python.