v0.4.0
🚀 Streaming v0.4.0
Streaming v0.4.0 is released! Install via pip:
pip install --upgrade mosaicml-streaming==0.4.0
New Feature
🔀 Dataset Mixing
- Weighted mixing of sub-datasets on the fly during model training (#184). StreamingDataset now supports an optional streams parameter, which takes one or more sub-datasets and intelligently fetches samples across them. You can mix (upsample or downsample) datasets by weighting each one either relatively (proportion) or absolutely (repeat or samples), or by setting none of them to sample 1:1.
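The weighting options above can be made concrete with a little arithmetic. The following is a minimal sketch, not the library's implementation: `SubDataset` and `resolve_counts` are hypothetical names invented for illustration, and the rules that relative and absolute weighting cannot be combined and that the default epoch size is the sum of the sub-dataset sizes are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubDataset:
    """Hypothetical stand-in for one sub-dataset (not the library's class)."""
    name: str
    size: int                           # samples physically stored
    proportion: Optional[float] = None  # relative weight
    repeat: Optional[float] = None      # absolute: multiple of `size`
    samples: Optional[int] = None       # absolute: exact sample count


def resolve_counts(subs: List[SubDataset],
                   epoch_size: Optional[int] = None) -> Dict[str, int]:
    """Return how many samples each sub-dataset contributes per epoch."""
    relative = [s.proportion is not None for s in subs]
    if any(relative):
        # Assumption: relative and absolute weighting are not combined.
        if not all(relative):
            raise ValueError('set proportion on every sub-dataset or on none')
        if epoch_size is None:
            # Assumption: default epoch size is the total stored size.
            epoch_size = sum(s.size for s in subs)
        total = sum(s.proportion for s in subs)
        return {s.name: round(epoch_size * s.proportion / total) for s in subs}

    counts = {}
    for s in subs:
        if s.repeat is not None and s.samples is not None:
            raise ValueError('set at most one of repeat and samples')
        if s.repeat is not None:
            counts[s.name] = round(s.size * s.repeat)  # upsample or downsample
        elif s.samples is not None:
            counts[s.name] = s.samples
        else:
            counts[s.name] = s.size  # nothing set: sample 1:1
    return counts


relative_mix = resolve_counts([
    SubDataset('code', size=1000, proportion=0.75),
    SubDataset('prose', size=500, proportion=0.25),
])
absolute_mix = resolve_counts([
    SubDataset('code', size=1000, repeat=2.0),
    SubDataset('prose', size=500, samples=100),
    SubDataset('math', size=300),
])
print(relative_mix)
print(absolute_mix)
```

In the relative case, the 1500 stored samples are redistributed 3:1, so the smaller sub-dataset is downsampled. In the absolute case, each sub-dataset is sized independently of the others.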
Documentation
- Added a README that shows how to convert raw text and vision datasets into the MDS format. (#183)
Bug Fixes
- Raise an exception if the cloud storage bucket does not exist during shard file upload. (#212)
- Remove a ThreadPoolExecutor shutdown parameter that is unsupported on Python 3.8. (#199)
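For context on the ThreadPoolExecutor fix: the release notes do not name the parameter, but the standard library's `Executor.shutdown()` gained `cancel_futures` in Python 3.9, so that is presumably the one involved. The release simply removed the argument. The sketch below shows a different approach, a version guard, for code that wants to keep the behavior on newer interpreters; `shutdown_executor` is a hypothetical helper, not part of Streaming.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def shutdown_executor(executor: ThreadPoolExecutor) -> None:
    """Shut down without blocking, cancelling queued work where supported."""
    if sys.version_info >= (3, 9):
        # cancel_futures was added in Python 3.9.
        executor.shutdown(wait=False, cancel_futures=True)
    else:
        # Python 3.8 raises TypeError on the unknown keyword, so omit it.
        executor.shutdown(wait=False)


executor = ThreadPoolExecutor(max_workers=2)
result = executor.submit(sum, [1, 2, 3]).result()
shutdown_executor(executor)
```

After the call, the executor rejects new work on every supported Python version.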
What's Changed
- Update GCS cloud storage credential document by @karan6181 in #181
- Update API reference doc to be compatible with sphinx by @karan6181 in #182
- Add a readme for text and vision convert script modal type by @karan6181 in #183
- Fix docstrings by @knighton in #185
- Synchronize before destroying process group by @coryMosaicML in #186
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #187
- Bump pypandoc from 1.10 to 1.11 by @dependabot in #188
- White-box weighted mixing of streaming datasets by @knighton in #184
- Organize partitioning code by @knighton in #190
- Bump pydantic from 1.10.5 to 1.10.6 by @dependabot in #194
- Bump uvicorn from 0.20.0 to 0.21.0 by @dependabot in #196
- Bump fastapi from 0.92.0 to 0.94.0 by @dependabot in #198
- Remove unsupported ThreadPoolExecutor shutdown param in python38 by @karan6181 in #199
- Fix doctstrings (maybe?) by @Landanjs in #200
- Demo: crawling, converting, and iterating weighted dataset subsets by @knighton in #191
- Update WebVid README.md by @knighton in #202
- Fix leftover test dirs and improve dataset method and variable names by @knighton in #201
- Bump fastapi from 0.94.0 to 0.95.0 by @dependabot in #205
- Bump uvicorn from 0.21.0 to 0.21.1 by @dependabot in #206
- Raise an exception if bucket does not exist during upload by @karan6181 in #212
- Bump yamllint from 1.29.0 to 1.30.0 by @dependabot in #209
- Bump pydantic from 1.10.6 to 1.10.7 by @dependabot in #211
- Register atexit handler for resource cleanup by @karan6181 in #215
- Bump version to 0.4.0 by @karan6181 in #216
New Contributors
- @coryMosaicML made their first contribution in #186
- @Landanjs made their first contribution in #200
Full Changelog: v0.3.0...v0.4.0