All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Local type handler registries.
- The PyPI `orbax` package is deprecated in favor of domain-specific namespace packages, namely `orbax-checkpoint` and `orbax-export`. Imports are unchanged, and are still of the form `import orbax.checkpoint` or `import orbax.export`.
- Finer scoped jax.monitoring calls on the save path.
- Support for OCDBT driver in Tensorstore.
- Small bug fixes.
- Use a more precise timestamp when generating temporary directory names to permit more than one concurrent checkpointing attempt per second.
- Support for generic transformation function in PyTreeCheckpointHandler.
- Support n-digit checkpoint step format.
- Eliminate Flax dependency to fix circular dependency problem.
- `sharding` option on `ArrayRestoreArgs`.
- Add "standard user recipe" to documentation.
- Add unit tests using mock to simulate preemption.
- Logging to increase transparency around why checkpoints are kept vs. deleted.
- Expand on uses of restore_args in colab.
- Expose utils_test.
- Add msgpack_utils to move toward eliminating Flax dependency.
- CheckpointManager starts a background thread to finalize checkpoints so that checkpoints are finalized as soon as possible in async case.
- Remove CheckpointManager update API.
- Remove support for deprecated GDA.
- Add tmp suffix on step directory creation in CheckpointManager.save.
- Preemption when using keep_time_interval caused the most recent steps before preemption to be kept, despite not falling on the keep time interval.
- A util function that constructs restore_args from a target PyTree.
- CheckpointManager `delete` API, which allows deleting an existing step.
- Made dev dependencies optional to minimize import overhead.
- Refactored higher-level utils in checkpoint_utils, which provides user-convenience functions.
- Guard option to create top-level directory behind `create` option.
- Remove support for Python 3.7.
- Check for metric file in addition to item directory in CheckpointManager.
- Additional logs to indicate save/restore completion.
- Support for None leaves in PyTree save/restore.
- ArrayCheckpointHandler for individual arrays/scalars.
- `read: bool` option on all_steps to force read from storage location instead of using cached steps.
- Simplified "Getting Started" section in the docs.
- CheckpointManager creates the top level directory if it does not yet exist.
- Write msgpack bytes asynchronously.
- Removed some unused test_utils methods for filtering empty nodes.
- Update docs on `PyTreeCheckpointHandler`.
- Removed unneeded AbstractCheckpointManager.
- Usage of bytes_limiter to prevent too many bytes from being read during a single restore call.
- Temp checkpoint cleanup when using a step prefix (e.g. 'checkpoint_0').
- Option to customize metadata file name for Tensorstore.
- Restore failure on GCS due to misidentification of checkpoint as "not finalized".
- Added CHANGELOG.md for version updates (additions and changes), ingested by auto-publish functionality.
- Fix mistaken usages of placeholder "AGGREGATED" where "NOT-AGGREGATED" would be more appropriate. Ensure backwards compatibility is maintained.
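
As a rough illustration of the step-naming entries above (n-digit step format, step prefixes such as 'checkpoint_0', and more precise timestamps in temporary directory names), the sketch below shows one way such names could be built. It uses only the Python standard library. The function names `step_dir_name` and `tmp_dir_name` and the `.tmp-` suffix layout are hypothetical and are not taken from Orbax itself.

```python
import time
from typing import Optional


def step_dir_name(step: int,
                  prefix: Optional[str] = None,
                  digits: Optional[int] = None) -> str:
    """Build a step directory name (hypothetical helper, not an Orbax API)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Zero-padding to a fixed width keeps lexicographic and numeric order aligned.
    step_str = f"{step:0{digits}d}" if digits else str(step)
    return f"{prefix}_{step_str}" if prefix else step_str


def tmp_dir_name(final_name: str, now_ns: Optional[int] = None) -> str:
    """Mark a directory as temporary with a microsecond timestamp.

    The '.tmp-<timestamp>' layout is an assumption made for illustration.
    A one-second timestamp would give identical names to two attempts
    started within the same second; microseconds make that far less likely.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    return f"{final_name}.tmp-{now_ns // 1000}"


if __name__ == "__main__":
    print(step_dir_name(42, digits=8))              # 00000042
    print(step_dir_name(0, prefix="checkpoint"))    # checkpoint_0
    print(tmp_dir_name(step_dir_name(7)))           # e.g. 7.tmp-<microseconds>
```

Passing `now_ns` explicitly makes the temporary name deterministic, which is convenient in unit tests.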