[Fix] Fully functional FSDP one-shot process #2305
Merged
Note: This PR should be landed in unison with: neuralmagic/compressed-tensors#58
Feature Description
A set of subtle fixes that enable FSDP one-shot. Most of them focus on correctly undoing the name changes that the FSDP wrapper imposes on the wrapped module.
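To illustrate the kind of renaming involved: PyTorch FSDP inserts a `_fsdp_wrapped_module` component into module and parameter names when it wraps a module. Below is a minimal sketch of undoing that, not the code from this PR. The helper names `strip_fsdp_prefix` and `clean_state_dict_keys` are hypothetical, and the wrapper name is hard-coded as a string so the snippet runs without torch.

```python
# Name component that PyTorch FSDP adds when it wraps a module.
# Assumption: hard-coded here so the sketch has no torch dependency.
FSDP_WRAPPER_NAME = "_fsdp_wrapped_module"


def strip_fsdp_prefix(name: str) -> str:
    """Return `name` with every FSDP wrapper component removed.

    Hypothetical helper for illustration only.
    """
    parts = [part for part in name.split(".") if part != FSDP_WRAPPER_NAME]
    return ".".join(parts)


def clean_state_dict_keys(state_dict: dict) -> dict:
    """Return a copy of `state_dict` whose keys no longer mention the wrapper."""
    return {strip_fsdp_prefix(key): value for key, value in state_dict.items()}
```

With names cleaned this way, keys such as `_fsdp_wrapped_module.model.layers.0.self_attn.q_proj.weight` map back to the names the unwrapped model uses, so they can be matched against a recipe or a non-FSDP checkpoint.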
Testing
Note: The FSDP process was run with num_processes: 1 as well as num_processes: 2. Both runs yielded similar perplexities.
Model generation script
To run FSDP training:
Model testing script
Result
The resulting post-FSDP one-shot model has the same perplexity and weight sparsity as its counterpart: