Release v0.3.0: Data hub, universal env converter and more! · pytorch/rl

In this release, we focused on building a Data Hub for offline RL, providing a universal 2gym conversion tool (#1795) and improving the doc.

## TorchRL Data Hub

TorchRL now offers many offline datasets in robotics, control and gaming, all under a single data format (TED, for TorchRL Episode Data format). All datasets are one step away from being downloaded: dataset = <Name>ExperienceReplay(dataset_id, root="/path/to/storage", download=True) is all you need to get started.
This means that you can now download OpenX (#1751) or Roboset (#1743) datasets and combine them in a single replay buffer (#1768), or swap one for another in no time and with no extra code.
We also support many new sampling techniques, such as sampling slices of trajectories with or without repetition.
As always, you can append your favourite transform to these datasets.
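
To make the idea of slice sampling concrete, here is a minimal pure-Python sketch of sampling fixed-length slices of trajectories with repetition. It is not TorchRL's SliceSampler implementation: the function name `sample_slices`, its arguments and the flat `episode_ids` layout are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_slices(episode_ids, num_slices, slice_len, rng=None):
    """Return `num_slices` lists of consecutive step indices, each within one episode.

    `episode_ids[i]` is the episode that step `i` belongs to; the steps of an
    episode are assumed to be stored contiguously and in order.
    """
    rng = rng or random.Random()
    # Group step indices by episode, preserving storage order.
    steps_by_episode = defaultdict(list)
    for index, episode in enumerate(episode_ids):
        steps_by_episode[episode].append(index)
    # Only episodes long enough to hold a full slice are eligible.
    eligible = [steps for steps in steps_by_episode.values() if len(steps) >= slice_len]
    if not eligible:
        raise ValueError("no episode is long enough for the requested slice length")
    slices = []
    for _ in range(num_slices):
        steps = rng.choice(eligible)  # the same episode may be picked again
        start = rng.randrange(len(steps) - slice_len + 1)
        slices.append(steps[start:start + slice_len])
    return slices


# Three episodes of lengths 4, 2 and 5 stored back to back.
episode_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
batch = sample_slices(episode_ids, num_slices=4, slice_len=3, rng=random.Random(0))
```

Each element of `batch` is a run of three consecutive indices that never crosses an episode boundary, and the two-step episode is never selected because it cannot hold a full slice.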

## TorchRL2Gym universal converter

#1795 introduces a new universal converter from simulation libraries to gym.
As an RL practitioner, it is sometimes difficult to accommodate the many different environment APIs that exist. TorchRL now provides a way of registering any env in gym(nasium). This allows users to build their datasets in TorchRL and integrate them into their code base with no effort if they are already using gym as a backend. It also makes it possible to convert DMControl or Brax envs (among others) to gym without the need for an extra library.
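
The general pattern behind such a converter can be sketched as an adapter plus a registry. The toy example below is plain Python and does not use TorchRL or gym: `CounterSim`, `GymStyleAdapter`, `register` and `make` are hypothetical names, and the only thing borrowed from gym(nasium) is the shape of the `reset()` and `step()` return values.

```python
class CounterSim:
    """Hypothetical simulator with its own, non-gym API."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def restart(self):
        self.position = 0
        return {"position": self.position}

    def advance(self, move):
        self.position += move
        return {"position": self.position, "score": float(self.position == self.goal)}


class GymStyleAdapter:
    """Expose a CounterSim through gym-style reset() and step() methods."""

    def __init__(self, sim, max_steps=10):
        self.sim = sim
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        # `seed` is accepted for signature compatibility; the toy simulator is deterministic.
        self.steps = 0
        state = self.sim.restart()
        return state["position"], {}

    def step(self, action):
        self.steps += 1
        state = self.sim.advance(action)
        terminated = state["position"] == self.sim.goal
        truncated = not terminated and self.steps >= self.max_steps
        return state["position"], state["score"], terminated, truncated, {}


# Minimal stand-in for an environment registry keyed by id.
_REGISTRY = {}


def register(env_id, factory):
    _REGISTRY[env_id] = factory


def make(env_id, **kwargs):
    return _REGISTRY[env_id](**kwargs)


register("CounterSim-v0", lambda **kwargs: GymStyleAdapter(CounterSim(), **kwargs))
env = make("CounterSim-v0", max_steps=5)
obs, info = env.reset()
trajectory = [env.step(1) for _ in range(3)]
```

Once an environment is registered this way, downstream code only needs the id and the common `reset()`/`step()` contract, which is the convenience the converter aims for.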

## PPO and A2C compatibility with distributed models

Functional calls can now be turned off for PPO and A2C loss modules, allowing users to run RLHF training loops at scale! #1804
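
For readers unfamiliar with the distinction, a functional call supplies the parameters explicitly at call time, whereas a non-functional call lets the module use the parameters it owns. The toy sketch below is plain Python, not PyTorch or TorchRL code: `Scale`, `swapped_params` and `functional_call` are hypothetical names chosen for illustration, and the sketch makes no claim about how the loss modules are implemented.

```python
from contextlib import contextmanager


class Scale:
    """Toy module that owns a single parameter."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return self.weight * x


@contextmanager
def swapped_params(module, params):
    """Temporarily replace the module's attributes with the given values."""
    saved = {name: getattr(module, name) for name in params}
    try:
        for name, value in params.items():
            setattr(module, name, value)
        yield module
    finally:
        # Always restore the module's own parameters.
        for name, value in saved.items():
            setattr(module, name, value)


def functional_call(module, params, x):
    """Run the module with externally supplied parameters."""
    with swapped_params(module, params):
        return module.forward(x)


critic = Scale(weight=2.0)
# Non-functional call: the module uses the weight it owns.
stateful_out = critic.forward(3.0)
# Functional call: the weight is passed in from outside for this call only.
functional_out = functional_call(critic, {"weight": 10.0}, 3.0)
```

Wrappers that manage a module's parameters themselves generally expect the first style, which is presumably why being able to turn functional calls off helps with distributed models.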

## TensorDict-free replay buffers

You can now use TorchRL's replay buffer with ANY tensor-based structure, whether it involves dicts, tuples or lists. In principle, storing data contiguously on disk given any gym environment is as simple as:

from torchrl.data import LazyMemmapStorage, ReplayBuffer

rb = ReplayBuffer(storage=LazyMemmapStorage(capacity))
obs_, reward, terminal, truncated, info = env.step(action)
rb.add((obs, obs_, reward, terminal, truncated, info, action))

# sampling returns a tuple with the same structure as the one that was added
obs, obs_, reward, terminal, truncated, info, action = rb.sample()

This is independent of TensorDict and supports many components of our replay buffers as well as transforms. See the replay buffer documentation for details.
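
One way to picture how a buffer can hold arbitrary nested structures is to store one column per leaf and rebuild the structure on sampling. The sketch below is a plain-Python illustration of that idea under this assumption; it is not how TorchRL's storages are implemented, and `flatten`, `rebuild` and `StructuredBuffer` are hypothetical names.

```python
import random


def flatten(tree, path=()):
    """Yield (path, leaf) pairs for nested dicts, tuples and lists."""
    if isinstance(tree, dict):
        for key in tree:
            yield from flatten(tree[key], path + (key,))
    elif isinstance(tree, (tuple, list)):
        for position, item in enumerate(tree):
            yield from flatten(item, path + (position,))
    else:
        yield path, tree


def rebuild(template, columns, index, path=()):
    """Rebuild the item stored at `index` with the same structure as `template`."""
    if isinstance(template, dict):
        return {key: rebuild(template[key], columns, index, path + (key,)) for key in template}
    if isinstance(template, (tuple, list)):
        items = [rebuild(item, columns, index, path + (pos,)) for pos, item in enumerate(template)]
        return type(template)(items)
    return columns[path][index]


class StructuredBuffer:
    """Toy buffer storing one column per leaf of the added structure."""

    def __init__(self):
        self.columns = {}
        self.template = None
        self.size = 0

    def add(self, item):
        # The first item added defines the structure of all later items.
        if self.template is None:
            self.template = item
        for path, leaf in flatten(item):
            self.columns.setdefault(path, []).append(leaf)
        self.size += 1

    def sample(self, rng=random):
        return rebuild(self.template, self.columns, rng.randrange(self.size))


buffer = StructuredBuffer()
buffer.add(({"pixels": [0, 1]}, 0.5, False))
buffer.add(({"pixels": [2, 3]}, 1.0, True))
first = rebuild(buffer.template, buffer.columns, 0)
second = rebuild(buffer.template, buffer.columns, 1)
```

Because every leaf lives in its own column, each column could in principle be backed by a contiguous array, while callers still get back tuples, dicts or lists shaped like what they put in.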

## Multiprocessed replay buffers

TorchRL's replay buffers can now be shared across processes. Multiprocessed RBs can not only be read from but also extended on different workers. #1724

## SOTA checks

We introduce a set of scripts that check that our training scripts run correctly before each release: #1822

## Throughput of Gym and DMControl

We now skip many of the checks in GymLikeEnv when some basic conditions are met, which improves the throughput significantly for simple envs. #1803

## Algorithms

We introduce discrete CQL (#1666), discrete IQL (#1793) and IMPALA (#1506).

## What's Changed: PR description

[BugFix] Fix incorrect deprecation warning by @mikemykhaylov in #1655
[Bug] TensorDictMaxValueWriter raises error when no sample in a batch is accepted by @albertbou92 in #1664
[BugFix] Fix "done" instead of "terminated" mistakes by @MarCnu in #1661
[Feature] CatFrames constant padding by @albertbou92 in #1663
doc(README): remove typo by @Deep145757 in #1665
[Docs] Update README.md by @vaibhav-009 in #1667
[Minor] Update dreamer example tests by @vmoens in #1668
[Feature] Introduce grouping in VMAS by @matteobettini in #1658
[BugFix] assertion error message, envs/util.py by @laszloKopits in #1669
[Doc] Set action_spec instead of input_spec by @FrankTianTT in #1657
[BugFix] Fix submitit IP address/node name retrieval by @vmoens in #1672
[Doc] Document (and test) compound actor by @vmoens in #1673
[Doc] Update rollout_recurrent.png to account for terminal by @vmoens in #1677
[Doc] Add EGreedyWrapper back in the doc by @vmoens in #1680
[Doc] Fix TanhDelta docstring by @matteobettini in #1683
[Doc] Add discord badge on README by @vmoens in #1686
[CI] Downgrade RAY to fix CI by @vmoens in #1687
[BugFix] MaxValueWriter cuda compatibility by @albertbou92 in #1689
Upload docs for preview on HUD by @DanilBaibak in #1682
[Doc] Update pendulum and rnn tutos by @vmoens in #1691
[Algorithm] Discrete CQL by @BY571 in #1666
[BugFix] Minor fix in the logging of PPO and A2C examples by @albertbou92 in #1693
[CI] Enable retry mechanism by @DanilBaibak in #1681
[Refactor] Minor changes in prep of pytorch/tensordict#541 by @vmoens in #1696
[BugFix] fix dreamer actor by @FrankTianTT in #1697
[Refactor] Deprecate direct usage of memmap tensors by @vmoens in #1684
Revert "[Refactor] Deprecate direct usage of memmap tensors" by @vmoens in #1698
[Refactor] Deprecate direct usage of memmap tensors by @vmoens in #1699
[Doc] Fix discord link by @vmoens in #1701
[BugFix] make sure the params of exploration-wrapper is float by @FrankTianTT in #1700
[Fix] EndOfLifeTransform fix in end of life detection by @albertbou92 in #1705
[CI] Fix benchmark on gpu by @vmoens in #1706
[Algorithm] IMPALA and VTrace module by @albertbou92 in #1506
[Doc] Fix discord link by @vmoens in #1712
[Refactor] Refactor functional calls in losses by @vmoens in #1707
[CI] Fix CI by @vmoens in #1711
[BugFix] Make casting to 'meta' device uniform across cost modules by @vmoens in #1715
[BugFix] Change ppo mujoco example to match paper results by @albertbou92 in #1714
[Minor] Hide params in ddpg actor-critic by @vmoens in #1716
[BugFix] Fix hold_out_net by @vmoens in #1719
[BugFix] RewardSum key check by @matteobettini in #1718
[Feature] Allow usage of a different device on main and sub-envs in ParallelEnv and SerialEnv by @vmoens in #1626
[Refactor] Better weight update in collectors by @vmoens in #1723
[Feature] Shared replay buffers by @vmoens in #1724
[CI] FIx nightly builds on osx by @vmoens in #1726
[BugFix] _call_actor_net does not handle multiple inputs by @albertbou92 in #1728
[Feature] Python-based RNN Modules by @albertbou92 in #1720
[BugFix, Test] Fix flaky gym vecenvs tests by @vmoens in #1727
[BugFix] Fix non-full TensorStorage indexing by @vmoens in #1730
[Feature] Minari datasets by @vmoens in #1721
[Feature] All VMAS scenarios available by @matteobettini in #1731
[Feature] pickle-free RB checkpointing by @vmoens in #1733
[CI] Fix doc upload by @vmoens in #1738
[BugFix] Fix RNNs trajectory split in VMAP calls by @vmoens in #1736
[CI] Fix doc upload by @vmoens in #1739
[BugFix, Feature] Fix DDQN implementation by @vmoens in #1737
[Algorithm] Update DQN example by @albertbou92 in #1512
[BugFix] Use rsync in doc workflow by @vmoens in #1741
[BugFix] Fix compat with new memmap API by @vmoens in #1744
[Feature] Roboset datasets by @vmoens in #1743
[Algorithm] Simpler IQL example by @BY571 in #998
[Performance] Faster RNNs by @vmoens in #1732
[BugFix, Test] Fix torch.vmap call in RNN tests by @vmoens in #1749
[BugFix] Fix discrete SAC log-prob by @vmoens in #1750
[Minor] Remove dead code in RolloutFromModel by @ianbarber in #1752
[Minor] Fix runnability of RLHF example in examples/rlhf by @ianbarber in #1753
[Feature] SliceSampler by @vmoens in #1748
[CI] Fix windows CI by @vmoens in #1746
[CI] Fix CI for optional dependencies by @vmoens in #1754
[Feature] V-D4RL by @vmoens in #1756
[Benchmark] Fix RB benchmarks by @vmoens in #1760
[BugFix] Fix RLHF by @vmoens in #1757
[BugFix] Fix slice sampler by @vmoens in #1762
[Feature] BurnInTransform by @albertbou92 in #1765
[Bug] Minor change burnin transform by @albertbou92 in #1770
[BugFix] Fix sampling of last item in SliceSampler by @vmoens in #1774
[Feature] Open-X Embodiement datasets by @vmoens in #1751
[BugFix] Fix documentation of threads for batched envs. by @skandermoalla in #1776
[BugFix, CI] Fix OpenML datasets runs by @vmoens in #1779
[Versioning] Bump v0.3.0 and fix m1-wheels by @vmoens in #1780
[Feature] Composite replay buffers by @vmoens in #1768
[BugFix, Feature] Vmap randomness in losses by @BY571 in #1740
[Algorithm] Update discrete SAC example by @BY571 in #1745
[Docs] Pointers to BenchMARL by @matteobettini in #1710
[Feature] Immutable writer for datasets by @vmoens in #1781
[Feature] Remove and check for prints in codebase using flake8-print by @vmoens in #1758
[BUG] Missing import for some Samplers in Data module by @albertbou92 in #1784
[BugFix] Ensure that infos and samples have the same batch-size in SamplerEnsemble by @vmoens in #1786
[BugFix] Writers extend() method should always return indices in data.device by @albertbou92 in #1785
[Doc] Revamp envs doc by @vmoens in #1787
[BugFix] Less flaky gym vecenv test by @vmoens in #1790
[CI] Regroup tests by @vmoens in #1791
[CI] Remove stable GPU tests from CI by @vmoens in #1792
Update README.md to fix CI banner by @vmoens in #1794
[Feature] SamplerWithoutReplacement state dictionary by @matteobettini in #1788
[BugFix] Higher time threshold for PEnv by @vmoens in #1799
[Feature] SignTransform by @albertbou92 in #1798
[Feature] Extend MaxValueWriter with reduce parameter for the rank_key by @albertbou92 in #1796
[BugFix] Fixes bug in MaxValueWriter tests by @albertbou92 in #1801
[Performance] faster gym-like class by @vmoens in #1803
[Feature] GenDGRL by @vmoens in #1773
[Performance] Minor improvements to step_and_maybe_reset in batched envs by @vmoens in #1807
[Algorithm] Discrete IQL by @BY571 in #1793
[Doc] More depth in VMAS docs by @matteobettini in #1802
[BugFix] Remove select() in favor of empty() by @vmoens in #1811
Bump jinja2 from 3.1.2 to 3.1.3 in /docs by @dependabot in #1812
[BugFix] Make TransformedEnv mirror allow_done_after_reset property of base env by @matteobettini in #1810
[Doc] Update StepCounter doc by @skandermoalla in #1813
[Feature] Improve info_dict reader by @vmoens in #1809
[CI, Minor] Regroup Gen-DGRL CI with other libs by @vmoens in #1814
[Versioning] Housekeeping in setup.py by @vmoens in #1816
[Feature] TorchRL2Gym conversion by @vmoens in #1795
[BugFix, CI] Fix snapshop imports in stable CI by @vmoens in #1821
[Feature] More flexibility in loading PettingZoo by @matteobettini in #1817
[Docs] Fix doc of ToTensorImage transforms.py by @skandermoalla in #1824
[BugFix] Fix device of container generated values in transforms by @vmoens in #1827
[Feature] Atari DQN dataset by @vmoens in #1815
[Feature] Non-functional objectives (PPO, A2C, Reinforce) by @vmoens in #1804
[Refactor] change default CKPT_BACKEND to torch by @vmoens in #1830
pyproject.toml: remove unknown properties by @GaetanLepage in #1828
[Doc, Feature] Doc improvements for video recording and CSV video formats by @vmoens in #1829
[Feature] PyTrees in replay buffers by @vmoens in #1831
[BugFix] Fix sequential step counts by @vmoens in #1838
[Doc] TED format by @vmoens in #1836
[Doc] References to TED by @vmoens in #1839
[BugFix] Temporarily set lazy legacy to True by @vmoens in #1840
[BugFix] Fix gym info scalar infos by @vmoens in #1842
[Refactor] LAZY_LEGACY_OP=False by @vmoens in #1832
[Feature] serial_for_single arg in batched envs by @vmoens in #1846
[BugFix] Fix VD4RL by @vmoens in #1834
[Doc] Make tutos runnable without colab by @vmoens in #1826
[Feature] Fine control over devices in collectors by @vmoens in #1835
[Feature, BugFix] Better thread control in penv and collectors by @vmoens in #1848
[CI] Update macos image by @vmoens in #1849
[BugFix] thread setting bug by @vmoens in #1852
Remove unused completed_keys property from StepCounter. by @skandermoalla in #1854
[Feature] Submitit run script by @albertbou92 in #1822
[BugFix] Fix flaky gym penv test by @vmoens in #1853
[CI] Fix macos build by @vmoens in #1856

## New Contributors

@mikemykhaylov made their first contribution in #1655
@MarCnu made their first contribution in #1661
@Deep145757 made their first contribution in #1665
@vaibhav-009 made their first contribution in #1667
@laszloKopits made their first contribution in #1669
@ianbarber made their first contribution in #1752
@dependabot made their first contribution in #1812
@GaetanLepage made their first contribution in #1828

Full Changelog: v0.2.1...v0.3.0
