diff --git a/README.md b/README.md
index aebf68900..15b7bc3f2 100644
--- a/README.md
+++ b/README.md
@@ -823,12 +823,13 @@ For details about how you can use set a custom stopping criteria and perform cus
 
 ## Experiment Tracking
 
-Experiment tracking in fms-hf-tuning allows users to track their experiments with known trackers like [Aimstack](https://aimstack.io/) or custom trackers built into the code like
+Experiment tracking in fms-hf-tuning allows users to track their experiments with known trackers like [Aimstack](https://aimstack.io/), [MLflow](https://mlflow.org/) or custom trackers built into the code like
 [FileLoggingTracker](./tuning/trackers/filelogging_tracker.py)
 
 The code supports currently two trackers out of the box,
 * `FileLoggingTracker` : A built in tracker which supports logging training loss to a file.
 * `Aimstack` : A popular opensource tracker which can be used to track any metrics or metadata from the experiments.
+* `MLflow` : Another popular opensource tracker which stores metrics, metadata or even artifacts from experiments.
 
 Further details on enabling and using the trackers mentioned above can be found [here](docs/experiment-tracking.md).
 
diff --git a/docs/experiment-tracking.md b/docs/experiment-tracking.md
index edc4e5978..deefdd35e 100644
--- a/docs/experiment-tracking.md
+++ b/docs/experiment-tracking.md
@@ -115,6 +115,34 @@ sft_trainer.train(train_args=training_args, tracker_configs=tracker_configs,....
 
 The code expects either the `local` or `remote` repo to be specified and will result in a `ValueError` otherwise. See [AimConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/a9b8ec8d1d50211873e63fa4641054f704be8712/tuning/config/tracker_configs.py#L25) for more details.
 
+## MLflow Tracker
+
+To enable [MLflow](https://mlflow.org/) users need to pass `"mlflow"` as the requested tracker as part of the [training argument](https://github.com/foundation-model-stack/fms-hf-tuning/blob/a9b8ec8d1d50211873e63fa4641054f704be8712/tuning/config/configs.py#L131).
+
+
+When using MLflow, users need to specify additional arguments which specify [mlflow tracking uri](https://mlflow.org/docs/latest/tracking.html#common-setups) location where either a [mlflow supported database](https://mlflow.org/docs/latest/tracking/backend-stores.html#supported-store-types) or [mlflow remote tracking server](https://mlflow.org/docs/latest/tracking/server.html) is running.
+
+Example
+```
+from tuning import sft_trainer
+from tuning.config.tracker_configs import MLflowConfig, TrackerConfigFactory
+
+training_args = TrainingArguments(
+    ...,
+    trackers = ["mlflow"],
+)
+
+tracker_configs = TrackerConfigFactory(
+    mlflow_config=MLflowConfig(
+        mlflow_experiment="experiment-name",
+        mlflow_tracking_uri=
+        )
+    )
+
+sft_trainer.train(train_args=training_args, tracker_configs=tracker_configs,....)
+```
+
+The code expects a valid uri to be specified and will result in a `ValueError` otherwise.
 
 ## Running the code via command line `tuning/sft_trainer::main` function
 
@@ -123,10 +151,10 @@ If running the code via main function of [sft_trainer.py](../tuning/sft_trainer.
 
 To enable tracking please pass
 ```
---tracker
+--tracker
 ```
 
-To further customise tracking you can specify additional arguments needed by the tracker like
+To further customise tracking you can specify additional arguments needed by the tracker like (example shows aim follow similarly for mlflow)
 ```
 --tracker aim --aim_repo --experiment