Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2020.06.3

  • Fixed
    • PyTorch 1.7 support (#1934)
    • LocalRunner ignores worker_cls attribute of algorithms (#1984)
    • mujoco_py versions greater than v2.0.2.8 are incompatible with some GCC versions in conda (#2000)
    • MTSAC not learning because it corrupts the termination signal by wrapping with GarageEnv twice (#2029)
    • MTSAC does not respect max_episode_length_eval hyperparameter (#2029)
    • MTSAC MetaWorld examples do not use the correct number of tasks (#2029)
    • MTSAC now supports a separate max_episode_length for evaluation via the max_episode_length_eval hyperparameter (#2029)
    • MTSAC MetaWorld MT50 example used an incorrect max_episode_length (#2029)

2020.06.2

  • Fixed
    • Better parameters for example her_ddpg_fetchreach (#1763)
    • Ensure determinism in TensorFlow by using tfp.SeedStream (#1821)
    • Broken rendering of MuJoCo environments to pixels in the NVIDIA Docker container (#1838)
    • Enable cudnn in the NVIDIA Docker container (#1840)
    • Bug in DiscreteQfDerivedPolicy in which parameters were not returned (#1847)
    • Populate TimeLimit.truncated at every step when using gym.Env (#1852; see the sketch after this list)
    • Bug in which parameters were not copied when TensorFlow primitives are clone()ed (#1855)
    • Typo in the Makefile target run-nvidia (#1914)
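
The TimeLimit.truncated fix matters because time-limit truncation and true termination call for different value bootstrapping. A minimal sketch of reading the flag, assuming a pre-0.26 gym whose TimeLimit wrapper reports truncation via info['TimeLimit.truncated']:

```python
import gym

# Pre-0.26 Gym: the TimeLimit wrapper sets info['TimeLimit.truncated']
# when an episode ends because the step cap was hit, rather than
# because the environment reached a true terminal state.
env = gym.make('CartPole-v1')
env.reset()
done, info = False, {}
while not done:
    _, _, done, info = env.step(env.action_space.sample())
if info.get('TimeLimit.truncated', False):
    print('episode was truncated by the time limit')
else:
    print('episode reached a terminal state')
```

Algorithms typically bootstrap the value of the final observation when the flag is set and treat the state as absorbing otherwise.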

2020.06.1

  • Fixed
    • Pipenv fails to resolve a stable dependency set because of excessively narrow dependencies in tensorflow-probability (#1721)
    • Bug which prevented rollout from running policies deterministically (#1714)

2020.06.0

Added

Changed

Removed

  • Dependencies:
    • matplotlib (moved to dev) (#1083)
    • atari-py (#1194)
    • gtimer, pandas, rlkit, seaborn (moved to benchmarks) (#1325)
    • pyprind (#1495)
  • RLAlgorithm.get_itr_snapshot (#1054)
  • garage.misc.nb_utils (#1288)
  • garage.np.regressors (#1493)
  • garage.np.BatchPolopt (#1486, #1492)
  • garage.misc.prog_bar_counter (#1495)
  • garage.tf.envs.TfEnv (#1443)
  • garage.tf.BatchPolopt (#1504)
  • garage.np.OffPolicyRLAlgorithm (#1552)

Fixed

  • Bug where GymEnv did not pickle (#1029)
  • Bug where VecEnvExecutor conflated terminal state and time limit signal (#1178, #1570)
  • Bug where plotter window was opened multiple times (#1253)
  • Bug where TF plotter used main policy on separate thread (#1270)
  • Workaround for gym's conflation of time limits and terminal states (#1118)
  • Bug where pixels weren't normalized correctly when using CNNs (#1236, #1419)
  • Bug where garage.envs.PointEnv did not step correctly (#1165)
  • Bug where sampler workers crashed in non-Deterministic mode (#1567)
  • Use cloudpickle in old-style samplers to handle lambdas (#1371; see the sketch after this list)
  • Bug where workers were not shut down after running a resumed algorithm (#1293)
  • Non-PyPI dependencies, which blocked using pipenv and poetry (#1247)
  • Bug where TensorFlow parameter setting didn't work across differently named policies (#1355)
  • Bug where advantages were computed incorrectly in PyTorch (#1197)
  • Bug where TF plotter was used in LocalRunner (#1267)
  • Worker processes are no longer started unnecessarily (#1006)
  • All examples were fixed and are now tested (#1009)
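
Context for the cloudpickle change (#1371): the standard pickle module serializes functions by reference, so lambdas (a common way to pass environment constructors to samplers) cannot be pickled, while cloudpickle serializes the function body itself. A minimal sketch; the lambda is a stand-in for an environment constructor:

```python
import pickle

import cloudpickle

make_env = lambda: 'a freshly constructed environment'

# pickle stores functions by qualified name, so a lambda cannot be
# round-tripped...
try:
    pickle.dumps(make_env)
except pickle.PicklingError as exc:
    print('pickle failed:', exc)

# ...while cloudpickle serializes the function body itself.
restored = cloudpickle.loads(cloudpickle.dumps(make_env))
print(restored())
```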

2019.10.3

Fixed

  • Better parameters for example her_ddpg_fetchreach (#1764)
  • Bug in DiscreteQfDerivedPolicy in which parameters were not returned (#1847)
  • Bug which made it impossible to evaluate stochastic policies deterministically (#1715)

2019.10.2

Fixed

  • Use a GitHub Token in the CI to retrieve packages to avoid hitting GitHub API rate limit (#1250)
  • Avoid installing dev extra dependencies during the conda check (#1296)
  • Install dm_control from PyPI (#1406)
  • Pin tfp to 0.8.x to avoid breaking pipenv (#1480)
  • Force python 3.5 in CI (#1522)
  • Separate terminal and completion signal in vectorized sampler (#1581)
  • Disable certificate check for roboti.us (#1595)
  • Fix advantages shape in compute_advantage() in the torch tree (#1209)
  • Fix plotting using tf.plotter (#1292)
  • Fix duplicate window rendering when using garage.Plotter (#1299)
  • Fix setting garage.model parameters (#1363)
  • Fix two example jupyter notebooks (#1584)
  • Fix collecting samples in RaySampler (#1583)

2019.10.1

Added

  • Integration tests which cover all example scripts (#1078, #1090)
  • Deterministic mode support for PyTorch (#1068)
  • Install script support for macOS 10.15.1 (#1051)
  • PyTorch modules now support either functions or modules for specifying their non-linearities (#1038)
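
A sketch of the function-or-module pattern from #1038 in plain PyTorch; the as_module helper is illustrative, not garage's API:

```python
import torch
from torch import nn


def as_module(nonlinearity):
    """Normalize a callable or an nn.Module (class or instance) to a module."""
    if isinstance(nonlinearity, nn.Module):
        return nonlinearity
    if isinstance(nonlinearity, type) and issubclass(nonlinearity, nn.Module):
        return nonlinearity()

    class _Wrapped(nn.Module):
        def forward(self, x):
            return nonlinearity(x)

    return _Wrapped()


x = torch.tensor([-1.0, 2.0])
print(as_module(torch.relu)(x))  # function form -> tensor([0., 2.])
print(as_module(nn.ReLU)(x))     # module-class form -> tensor([0., 2.])
```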

Fixed

  • Errors in the documentation on implementing new algorithms (#1074)
  • Broken example for DDPG+HER in TensorFlow (#1070)
  • Error in the documentation for using garage with conda (#1066)
  • Broken pickling of environment wrappers (#1061)
  • garage.torch was not included in the PyPI distribution (#1037)
  • A few broken examples for garage.tf (#1032)

2019.10.0

Added

  • Algorithms
    • (D)DQN in TensorFlow (#582)
    • Maximum-entropy and entropy regularization for policy gradient algorithms in TensorFlow (#632)
    • DDPG in PyTorch (#815)
    • VPG (i.e. policy gradients) in PyTorch (#883)
    • TD3 in TensorFlow (#458)
  • APIs
    • Runner API for executing experiments and LocalRunner implementation for executing them on the local machine (#541, #593, #602, #816)
    • New Logger API, provided by a sister project dowel (#464, #660)
  • Environment wrappers for pixel-based algorithms, especially DQN (#556)
  • Example for how to use garage with Google Colab (#476)
  • Advantage normalization for recurrent policies in TF (#626)
  • PyTorch support (#725, #764)
  • Autogenerated API docs on garage.readthedocs.io (#802)
  • GPU version of the pip package (#834)
  • PathBuffer, a trajectory-oriented replay buffer (#838; see the sketch after this list)
  • RaySampler, a remote and/or multiprocess sampler based on ray (#793)
  • Garage is now distributed on PyPI (#870)
  • rollout option to only sample policies deterministically (#896)
  • MultiEnvWrapper, which wraps multiple gym.Env environments into a discrete multi-task environment (#946)
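
A hedged sketch of PathBuffer usage; the constructor argument and method names below follow garage's documentation as I recall it and may differ between releases:

```python
import numpy as np

from garage.replay_buffer import PathBuffer  # import path may vary by release

buf = PathBuffer(capacity_in_transitions=10000)
# Add one 5-step trajectory; each array's leading axis is time.
buf.add_path({
    'observation': np.zeros((5, 3)),
    'action': np.zeros((5, 1)),
    'reward': np.ones((5, 1)),
})
batch = buf.sample_transitions(4)  # dict of arrays, one row per transition
print(batch['reward'].shape)       # -> (4, 1)
```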

Changed

  • Optimized Dockerfiles for fast rebuilds (#557)
  • Random seed APIs moved to garage.experiment.deterministic (#578; see the sketch after this list)
  • Experiment wrapper script is now an ordinary module (#586)
  • numpy-based modules and algorithms moved to garage.np (#604)
  • Algorithm constructors now use EnvSpec rather than gym.Env (#575)
  • Snapshotter API moved from garage.logger to garage.experiment (#658)
  • Moved process_samples API from the Sampler to algorithms (#652)
  • Updated Snapshotter API (#699)
  • Updated Resume API (#777)
  • All algorithms now have a default sampler (#832)
  • Experiment launchers now require an explicit snapshot_config to their run_task function (#860)
  • Various samplers moved from garage.tf.sampler to garage.sampler (#836, #840)
  • Dockerfiles are now based on Ubuntu 18.04 LTS by default (#763)
  • dm_control is now an optional dependency, installed using the extra garage[dm_control] (#828)
  • MuJoCo is now an optional dependency, installed using the extra garage[mujoco] (#848)
  • Samplers no longer flatten observations and actions (#930, #938, #967)
  • Implementations, tests, and benchmarks for all TensorFlow primitives, which are now based on garage.tf.Model (#574, #606, #615, #616, #618, #641, #642, #656, #662, #668, #672, #677, #730, #722, #765, #855, #878, #888, #898, #892, #897, #893, #890, #903, #916, #891, #922, #931, #933, #906, #945, #944, #943, #972)
  • Dependency upgrades:
    • mujoco-py to 2.0 (#661)
    • gym to 0.12.4 (#661)
    • dm_control to 7a36377879c57777e5d5b4da5aae2cd2a29b607a (#661)
    • akro to 0.0.6 (#796)
    • pycma to 2.7.0 (#861)
    • tensorflow to 1.15 (#953)
    • pytorch to 1.3.0 (#952)
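
On the seed API move (#578): a hedged sketch, assuming the module exposes set_seed as its documentation describes; one call seeds Python's random, numpy, and the active deep-learning framework:

```python
# Hedged sketch: set_seed is the documented post-#578 entry point; if
# your garage version differs, consult garage.experiment.deterministic.
from garage.experiment import deterministic

deterministic.set_seed(42)  # seed random, numpy, and TF/PyTorch in one place
```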

Removed

  • garage.misc.autoargs, a tool for decorating classes with autogenerated command-line arguments (#573)
  • garage.misc.ext, a module with several unrelated utilities (#578)
  • config_personal.py module, replaced by environment variables where relevant (#578, #747)
  • contrib.rllab_hyperopt, an experimental module for using hyperopt to tune hyperparameters (#684)
  • contrib.bichenchao, a module of example launchers (#683)
  • contrib.alexbeloi, a module with an importance-sampling sampler and examples (these were merged into garage) (#717)
  • EC2 cluster documentation and examples (#835)
  • DeterministicMLPPolicy, because it duplicated ContinuousMLPPolicy (#929)
  • garage.tf.layers, a custom high-level neural network definition API, was replaced by garage.tf.models (#939)
  • Parameterized, which was replaced by garage.tf.Model (#942)
  • garage.misc.overrides, whose features are no longer needed due to proper ABC support in Python 3 and sphinx-autodoc (#974)
  • Serializable, which became a maintainability burden and has now been replaced by regular pickle protocol (__getstate__/__setstate__) implementations where necessary (#982; see the sketch after this list)
  • garage.misc.special, a library of mostly-unused math subroutines (#986)
  • garage.envs.util, superseded by features in akro (#986)
  • garage.misc.console, a library of mostly-unused helper functions for writing shell scripts (#988)
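
The pickle protocol that replaced Serializable, in a minimal self-contained sketch (the class and its unpicklable handle are illustrative): exclude problem members in __getstate__ and rebuild them in __setstate__.

```python
import pickle


class Model:
    def __init__(self, weights):
        self.weights = weights
        self._session = object()  # stand-in for an unpicklable handle

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_session']      # drop the unpicklable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._session = object()   # recreate the handle after unpickling


clone = pickle.loads(pickle.dumps(Model([1.0, 2.0])))
print(clone.weights)  # -> [1.0, 2.0]
```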

Fixed

  • Bug in ReplayBuffer (#554)
  • Bug in setup_linux.sh (#560)
  • Bug in examples/sim_policy.py (#691)
  • Bug in FiniteDifferenceHvp (#745)
  • Determinism bug for some samplers (#880)
  • use_gpu in the experiment runner (#918)

2019.02.2

Fixed

  • Bug in entropy regularization in TensorFlow PPO/TRPO (#579)
  • Bug in which advantage normalization was broken for recurrent policies (#626)
  • Bug in examples/sim_policy.py (#691)
  • Bug in FiniteDifferenceHvp (#745)

2019.02.1

Fixed

  • Fix overhead in GaussianMLPRegressor by optionally creating assign operations (#622)

2019.02.0

Added

  • Epsilon-greedy exploration strategy, DiscreteMLPModel, and QFunctionDerivedPolicy (all needed by DQN)
  • Base Model class for TensorFlow-based primitives
  • Dump plots generated with matplotlib to TensorBoard
  • Relative Entropy Policy Search (REPS) algorithm
  • GaussianConvBaseline and GaussianConvRegressor primitives
  • New Dockerfiles, docker-compose files, and Makefiles for running garage using Docker
  • Vanilla policy gradient loss to NPO
  • Truncated Natural Policy Gradient (TNPG) algorithm for TensorFlow
  • Episodic Reward Weighted Regression (ERWR) algorithm for TensorFlow
  • gym.Env wrappers used for pixel environments
  • Convolutional Neural Network primitive

Changed

  • Move dependencies from environment.yml to setup.py
  • Update dependencies:
    • tensorflow-probability to 0.5.x
    • dm_control to commit 92f9913
    • TensorFlow to 1.12
    • MuJoCo to 2.0
    • gym to 0.10.11
  • Move dm_control tests into the unit test tree
  • Use GitHub standard .gitignore
  • Improve the implementation of RandomizedEnv (Dynamics Randomization)
  • Decouple TensorBoard from the logger
  • Move files from garage/misc/instrument to garage/experiment
  • Rewrite setup.py to be canonical in format and use automatic versioning

Removed

  • Move some garage subpackages into their own repositories
  • Remove Theano backend, algorithms, and dependencies
  • Custom environments which duplicated openai/gym
  • Some dead files from garage/misc (meta.py and viewer2d.py)
  • Remove all code coverage tracking providers except CodeCov

Fixed

  • Clean up warnings in the test suite
  • Pickling bug in GaussianMLPPolicyWithModel
  • Namescope in LbfgsOptimizer
  • Correctly sample paths in OffPolicyVectorizedSampler
  • Implementation bugs in tf/VPG
  • Bug when importing Box
  • Bug in test_benchmark_her

2018.10.1

Fixed

  • Avoid importing Theano when using the TensorFlow branch
  • Avoid importing MuJoCo when not required
  • Implementation bugs in tf/VPG
  • Bug when importing Box
  • Bug in test_benchmark_her
  • Bug in the CI scripts which produced false positives

2018.10.0

Added

  • PPO and DDPG for the TensorFlow branch
  • HER for DDPG
  • Recurrent Neural Network policy support for NPO, PPO and TRPO
  • Base class for ReplayBuffer, and two implementations: SimpleReplayBuffer and HerReplayBuffer
  • Sampler classes OffPolicyVectorizedSampler and OnPolicyVectorizedSampler
  • Base class for offline policies OffPolicyRLAlgorithm
  • Benchmark tests for TRPO, PPO and DDPG to compare their performance with those produced by OpenAI Baselines
  • Dynamics randomization for MuJoCo environments
  • Support for dm_control environments
  • DictSpace support for garage environments
  • PEP8 checks enforced in the codebase
  • Checks for Python imports: maintain correct ordering and remove unused imports and import errors
  • Test on TravisCI using Docker images for managing dependencies
  • Testing code reorganized
  • Code Coverage measurement with codecov
  • Pre-commit hooks to enforce PEP8 and to verify imports and commit messages, which are also applied in the Travis CI verification
  • Docstring verification for newly added files, excluding moved files and files in the test tree
  • TensorBoard support for all key-value/log_tabular calls, plus support for logging distributions
  • Variable and name scope for symbolic operations in TensorFlow
  • Top-level base Space class for garage
  • Asynchronous plotting for Theano and TensorFlow
  • GPU support for Theano

Changed

  • Rename rllab to garage, including all the rllab references in the packages and modules inside the project
  • Rename run_experiment_lite to run_experiment
  • The file cma_es_lib.py was replaced by the pycma library available on PyPI
  • Move the contrib package to garage.contrib
  • Move Theano-dependent code to garage.theano
  • Move all code from sandbox.rocky.tf to garage.tf
  • Update several dependencies, mainly:
    • Python to 3.6.6
    • TensorFlow to 1.9
    • Theano to 1.0.2
    • mujoco-py to 1.50.1
    • gym to 0.10.8
  • Transfer various dependencies from conda to pip
  • Separate example script files in the Theano and TensorFlow branch
  • Update LICENSE, CONTRIBUTING.md and .gitignore
  • Use convenience imports: in each package's __init__.py, import the classes and functions whose names match or resemble the module's name (see the sketch after this list)
  • Replace ProxyEnv with gym.Wrapper
  • Update installation scripts for Linux and macOS
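
The convenience-import convention, sketched against a hypothetical layout (mypkg and Thing are illustrative names):

```python
# mypkg/__init__.py -- assuming a sibling module mypkg/thing.py that
# defines class Thing. Re-exporting here lets callers write
# `from mypkg import Thing` instead of `from mypkg.thing import Thing`.
from .thing import Thing

__all__ = ['Thing']
```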

Removed

  • All unused imports in the Python files
  • Unused packages from environment.yml
  • The files under rllab.mujoco_py; the mujoco-py pip release is used instead
  • Empty __init__.py files
  • The environment class rllab.envs.Env, which was not ported to garage; environments now derive from gym.Env

Fixed

  • Sleeping processes produced by the parallel sampler. NOTE: although this issue now occurs less frequently, our Travis CI tests still occasionally detect it, and it appears to stem from re-entrant locks and multiprocessing in Python.