Quantization Compressor Support (#2260)
* initial commit
* update setup.py
* Update setup.py
* fix setup.py
* move all config to sparsetensors
* cleanup class name and comments
* initial implementation untested
* fixing issues
* add test script
* update perplexity test
* refactor to compressed-tensors
* rename sparsetensors
* update setup
* Sa/model reload (#2250)
* working reload
* sparsegpt
* cleanup
* refactor tests
* only run oneshot once
* all tests passing
* remove unused config
* reset models on each parameterize
* style
* bring back SparsityConfigMetadata
* Update setup.py
  Co-authored-by: Rahul Tuli <[email protected]>
* add more comparisons, tighten threshold
* use wikitext for perplexity
* update setup
* fix import problem
* fix clearml test
* compressed-tensors are transformers dep
* address PR comments
* can't repeat freeze
* UX pr comments
* initial commit
* style
* skipping unit tests
* tests for quantization
* reloading unit tests
* backwards compat
* test updates
* update format
* fix inferring
* quality
* shape consistency
* address PR comments
* PR comments
* fixing some things
* style
* pull from cp main
* postmerge too
* export needs it too
* Update src/sparseml/modifiers/obcq/utils/sgpt_wrapper.py
  Co-authored-by: Rahul Tuli <[email protected]>

---------

Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: George Ohashi <[email protected]>
1 parent 214873b · commit 8a7fc99
Showing 10 changed files with 327 additions and 98 deletions.
src/sparseml/transformers/compression/quantization_format.py — 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import Optional

from compressed_tensors import CompressionFormat
from compressed_tensors.quantization.utils import is_model_quantized


__all__ = ["infer_quantization_format"]


def infer_quantization_format(
    model, quantization_format: Optional[str] = None, save_compressed: bool = False
) -> Optional[str]:
    """
    Infers a quantization format based on model state and compression args.

    :param model: model to check for quantization; if the model is not quantized,
        no quantization format is returned
    :param quantization_format: user-provided quantization format, supersedes any
        inferred quantization format
    :param save_compressed: used to infer a quantization format if None is provided
    :return: compression format appropriate for the model
    """
    if not is_model_quantized(model):
        return None

    if quantization_format is not None:
        return quantization_format

    if save_compressed:
        return CompressionFormat.int_quantized
    else:
        # format will be inferred from the config
        return None
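For context, a minimal sketch of how the three branches above play out. The `model` here is a placeholder for a quantized model produced earlier in a sparseml pipeline (not constructed in this snippet), and the exact string behind `CompressionFormat.int_quantized` is defined by the compressed-tensors enum:

from sparseml.transformers.compression.quantization_format import (
    infer_quantization_format,
)

# `model` is assumed to be a quantized model from a prior oneshot/quantization
# run; is_model_quantized(model) must return True for the branches below.

# 1. An explicit user-provided format always wins:
fmt = infer_quantization_format(model, quantization_format="int-quantized")
# fmt == "int-quantized"

# 2. No explicit format, but compressed saving was requested:
fmt = infer_quantization_format(model, save_compressed=True)
# fmt is CompressionFormat.int_quantized

# 3. Neither given: returns None, deferring to the saved config on reload:
fmt = infer_quantization_format(model)
# fmt is None

# For an unquantized model, the function short-circuits and returns None
# regardless of the other arguments.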