
v0.4.0

@karan6181 released this 31 Mar 02:22
d428ed8

🚀 Streaming v0.4.0

Streaming v0.4.0 is released! Install via pip:

pip install --upgrade mosaicml-streaming==0.4.0

New Feature

🔀 Dataset Mixing

  • Weighted mixing of sub-datasets on the fly during model training (#184). StreamingDataset now supports an optional streams parameter that takes one or more sub-datasets and fetches samples across them according to the weights you assign. You can upsample or downsample each sub-dataset by weighting it either relatively (proportion) or absolutely (repeat or samples); if none of these is set, the sub-dataset is sampled 1:1.
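These notes do not spell out how the weights turn into per-epoch sample counts. The following is a hypothetical, pure-Python model of that bookkeeping, not StreamingDataset's actual implementation. The SubDataset class, the samples_per_epoch function, its epoch_size argument, and the rule that relative and absolute weights cannot be combined are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubDataset:
    """Hypothetical stand-in for one sub-dataset passed via `streams`."""
    name: str
    size: int                           # samples actually stored for this sub-dataset
    proportion: Optional[float] = None  # relative weight
    repeat: Optional[float] = None      # absolute: multiple of `size`
    samples: Optional[int] = None       # absolute: exact sample count


def samples_per_epoch(streams: List[SubDataset],
                      epoch_size: Optional[int] = None) -> Dict[str, int]:
    """Return how many samples each sub-dataset contributes to one epoch.

    Assumption (not confirmed by the release notes): weights are either all
    relative (proportion) or all absolute (repeat, samples, or neither).
    """
    relative = [s.proportion is not None for s in streams]
    if any(relative) and not all(relative):
        raise ValueError("cannot combine relative and absolute weights in this sketch")

    if streams and all(relative):
        # Relative: split the epoch according to normalized proportions.
        # Rounding means the counts may not sum exactly to `total`.
        total = epoch_size if epoch_size is not None else sum(s.size for s in streams)
        weight_sum = sum(s.proportion for s in streams)
        return {s.name: round(total * s.proportion / weight_sum) for s in streams}

    counts: Dict[str, int] = {}
    for s in streams:
        if s.repeat is not None and s.samples is not None:
            raise ValueError(f"{s.name}: set at most one of repeat or samples")
        if s.repeat is not None:
            counts[s.name] = round(s.size * s.repeat)  # >1 upsamples, <1 downsamples
        elif s.samples is not None:
            counts[s.name] = s.samples                 # exact number of samples
        else:
            counts[s.name] = s.size                    # neither set: sample 1:1
    return counts


if __name__ == "__main__":
    print(samples_per_epoch([
        SubDataset("web", 1000, repeat=2),
        SubDataset("code", 500, samples=100),
        SubDataset("books", 300),
    ]))  # {'web': 2000, 'code': 100, 'books': 300}
```

In this model, relative weights only say how to divide an epoch, so they need a total to divide (here the combined size of the sub-datasets unless epoch_size is given). Absolute weights are resolved per sub-dataset without reference to the others.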

Documentation

  • Added a README showing how to convert raw text and vision datasets into MDS format. (#183)

Bug Fixes

  • Raise an exception if the cloud storage bucket does not exist during shard file upload. (#212)
  • Removed a ThreadPoolExecutor shutdown parameter that is not supported on Python 3.8. (#199)
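The note does not name the offending parameter. Assuming it is cancel_futures, which Executor.shutdown only gained in Python 3.9 (passing it on 3.8 raises TypeError), a version-guarded shutdown might look like the sketch below. The helper name shutdown_executor is hypothetical and is not part of the Streaming codebase.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def shutdown_executor(executor: ThreadPoolExecutor, cancel_pending: bool = False) -> None:
    """Hypothetical helper: shut down `executor` on Python 3.8 and newer."""
    if sys.version_info >= (3, 9):
        # `cancel_futures` exists only from Python 3.9 onward.
        executor.shutdown(wait=True, cancel_futures=cancel_pending)
    else:
        # Python 3.8 accepts only `wait`; pending work always runs to completion.
        executor.shutdown(wait=True)


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(pow, 2, n) for n in range(4)]
    shutdown_executor(pool)
    print([f.result() for f in futures])  # [1, 2, 4, 8]
```

Simply dropping the parameter, as the fix title suggests, is the simpler route when cancelling pending work is not needed.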

Full Changelog: v0.3.0...v0.4.0