
[NPU] be compatible with auto-round & initial auto-round support #12581

Closed · wants to merge 4 commits

Conversation

rnwang04 (Contributor) commented Dec 19, 2024

Description

1. Why the change?

https://github.com/analytics-zoo/nano/issues/1786

2. User API changes

If we have an auto-round quantized model (Llama / Qwen / MiniCPM / Baichuan) quantized with 4 bits and group size -1, we can load it and run it on the NPU as follows:

from ipex_llm.transformers.npu_models.convert_auto_round_model import convert_auto_round_model_to_npu_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig
import torch

if __name__ == "__main__":
    backend = "cpu"
    quantization_config = AutoRoundConfig(
        backend=backend,
    )

    # quantized_model_path / save_dir / original_model_path are user-provided paths
    model = AutoModelForCausalLM.from_pretrained(quantized_model_path,
                                                 device_map=backend.split(':')[0],
                                                 quantization_config=quantization_config,
                                                 attn_implementation="eager",
                                                 trust_remote_code=True,
                                                 torch_dtype=torch.float16)
    print("finished loading quantized model.")

    model = convert_auto_round_model_to_npu_model(model, save_directory=save_dir)

    tokenizer = AutoTokenizer.from_pretrained(original_model_path, trust_remote_code=True)

Note: to run auto-round models, the environment requirements are:

conda create -n npu-round python=3.11
conda activate npu-round
# for NPU
pip install --pre --upgrade ipex-llm[npu]
# for auto-round
pip install auto-round==0.4.2
pip uninstall -y auto-gptq

3. Summary of the change

  • Add auto-round-patch support to load auto-round models on CPU for Windows
  • Support unpack_auto_round_layer for sym_int4 & asym_int4
  • Make the necessary code changes to be compatible with auto-round
  • Support converting an auto-round model to an NPU-optimized model via the convert_auto_round_model_to_npu_model function

This is just an initial PR, we can enhance it or integrate it into from_pretrained later if necessary.
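For context on the unpack_auto_round_layer item above: auto-round (GPTQ-style) checkpoints typically pack eight 4-bit values into each int32. Below is a minimal, hedged sketch of sym_int4 packing/unpacking with NumPy; the function names and the fixed zero point of 8 are illustrative assumptions, and the actual implementation in this PR may differ.

```python
import numpy as np

def pack_sym_int4(vals):
    """Pack signed int4 values (-8..7) eight-per-uint32, low nibble first.
    Illustrative only; mirrors the common auto-round/GPTQ packed layout."""
    u = (np.asarray(vals, dtype=np.int64) + 8).astype(np.uint32).reshape(-1, 8)
    packed = np.zeros(u.shape[0], dtype=np.uint32)
    for i in range(8):
        packed |= u[:, i] << np.uint32(4 * i)  # place nibble i into bits 4i..4i+3
    return packed

def unpack_sym_int4(packed):
    """Inverse: extract eight nibbles per uint32 and re-center around zero."""
    p = np.asarray(packed, dtype=np.uint32)
    out = np.empty((p.size, 8), dtype=np.int32)
    for i in range(8):
        out[:, i] = ((p >> np.uint32(4 * i)) & np.uint32(0xF)).astype(np.int32)
    return (out - 8).reshape(-1)  # sym_int4: fixed zero point of 8
```

For asym_int4 the fixed offset of 8 would instead be a per-group zero point stored alongside the scales; the unpacked integers are then dequantized as `scale * (q - zero_point)`.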

4. How to test?

  • Unit test: Please manually trigger the PR Validation by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.

@rnwang04 marked this pull request as draft December 19, 2024 08:49
@rnwang04 changed the title [test] be compatible with auto-round [NPU] be compatible with auto-round & initial auto-round support Dec 24, 2024
@rnwang04 marked this pull request as ready for review December 24, 2024 13:48
@rnwang04 force-pushed the auto-round branch 3 times, most recently from daddf17 to 7abcd3c on December 24, 2024 14:15