docs: Add findings from exploration into model tuning performance degradation (#315)

* docs: Add findings from exploration into model tuning performance degradation
  Signed-off-by: Will Johnson <[email protected]>
* fix: More specifically refer to COS instead of just PVC
  Signed-off-by: Will Johnson <[email protected]>
* docs: Change section name and remove numbers from README.md
  Signed-off-by: Will Johnson <[email protected]>

---------
Signed-off-by: Will Johnson <[email protected]>
Signed-off-by: Anh Uong <[email protected]>
foundation-model-stack · Aug 27, 2024 · 474e539
1 parent 2c56c30
commit 474e539
Showing 1 changed file with 7 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -270,6 +270,13 @@ generation_config.json	model-00005-of-00006.safetensors  tokenizer.model
 
 </details>
 
+#### Optimizing checkpoint writes
+Writing models to Cloud Object Storage (COS) is an expensive operation. Saving model checkpoints to a local directory makes training much faster than writing them to COS. You can use `output_dir` and `save_model_dir` to control which type of storage your checkpoints and final model, respectively, are written to.
+
+For example, set `output_dir` to a local directory and `save_model_dir` to COS: intermediate checkpoint writes stay fast, checkpoints are still saved, and the final model lands in COS.
+
+To achieve the fastest training time, set `save_strategy="no"`: saving no checkpoints except for the final model removes intermediate write operations altogether.
+
 ## Tuning Techniques:
 
 ### LoRA Tuning Example
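
The storage settings described in the section added by this change can be sketched as a small configuration builder. This is only an illustration, not part of the commit: `output_dir`, `save_model_dir`, and `save_strategy` come from the added README text, while the helper name `build_checkpoint_config`, the `keep_checkpoints` flag, and both paths are hypothetical. How the resulting arguments reach the tuning job depends on your setup and is not shown here.

```python
import json


def build_checkpoint_config(local_dir, final_model_dir, keep_checkpoints=True):
    """Return checkpoint-related tuning arguments as a plain dict.

    Hypothetical helper for illustration; only the three keys below
    are taken from the README text.
    """
    config = {
        # Intermediate checkpoints go to fast local storage.
        "output_dir": local_dir,
        # The final model goes to the (slower) COS-backed location.
        "save_model_dir": final_model_dir,
    }
    if not keep_checkpoints:
        # Skip intermediate checkpoints; only the final model is written.
        config["save_strategy"] = "no"
    return config


# Hypothetical paths: a local scratch directory and a COS-backed mount.
balanced = build_checkpoint_config("/tmp/checkpoints", "/mnt/cos/final-model")
fastest = build_checkpoint_config(
    "/tmp/checkpoints", "/mnt/cos/final-model", keep_checkpoints=False
)

print(json.dumps(balanced, indent=2))
print(json.dumps(fastest, indent=2))
```

The first configuration keeps intermediate checkpoints on local storage; the second trades them away entirely for the shortest training time.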