[NPU] be compatible with auto-round & initial auto-round support #12581

rnwang04 · 2024-12-19T08:49:37Z

Description

1. Why the change?

https://github.com/analytics-zoo/nano/issues/1786

2. User API changes

If we have an auto-round quantized model (Llama / Qwen / MiniCPM / Baichuan) with 4 bits and a group size of -1, we can load it and run it on the NPU as follows:

from ipex_llm.transformers.npu_models.convert_auto_round_model import convert_auto_round_model_to_npu_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig
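import torch

# Placeholder paths (not part of the original snippet); set these to your own locations.
quantized_model_path = "path/to/auto-round-quantized-model"
original_model_path = "path/to/original-model"
save_dir = "path/to/save-npu-model"

if __name__ == "__main__":
    # Load the auto-round model on CPU first; it is converted for the NPU afterwards.
    backend = "cpu"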
    quantization_config = AutoRoundConfig(
        backend=backend,
    )

    model = AutoModelForCausalLM.from_pretrained(quantized_model_path,
                                                 device_map=backend.split(':')[0],
                                                 quantization_config=quantization_config,
                                                 attn_implementation="eager",
                                                 trust_remote_code=True,
                                                 torch_dtype=torch.float16)
    print("finish load quantized model.")

    model = convert_auto_round_model_to_npu_model(model, save_directory=save_dir)

    tokenizer = AutoTokenizer.from_pretrained(original_model_path, trust_remote_code=True)
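
Before loading, it can help to confirm that a checkpoint matches the configuration this PR supports (4 bits, group size -1). The following is a minimal sketch, assuming the checkpoint's `config.json` carries a `quantization_config` section with `bits` and `group_size` keys; the helper names `is_npu_compatible` and `check_checkpoint` are hypothetical and not part of ipex-llm or auto-round.

```python
import json
from pathlib import Path


def is_npu_compatible(config: dict) -> bool:
    """Return True if the quantization settings match what this PR supports.

    Assumption: the config has a `quantization_config` dict with
    `bits` and `group_size` keys.
    """
    qcfg = config.get("quantization_config") or {}
    return qcfg.get("bits") == 4 and qcfg.get("group_size") == -1


def check_checkpoint(model_dir: str) -> bool:
    """Read <model_dir>/config.json and check it (hypothetical helper)."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        return is_npu_compatible(json.load(f))
```

Calling `check_checkpoint(quantized_model_path)` before `from_pretrained` would flag an unsupported checkpoint early, rather than partway through conversion.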

Note: running auto-round models requires the following environment:

conda create -n npu-round python=3.11
conda activate npu-round
# for NPU
pip install --pre --upgrade ipex-llm[npu]
# for auto-round
pip install auto-round==0.4.2
pip uninstall auto_gptq
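
The environment requirements above can be checked from Python before running the conversion. This is a rough sketch using only the standard library; the distribution names `auto-round`, `auto-gptq`, and `ipex-llm` are assumed to match what pip reports, and `installed_version` and `check_env` are hypothetical helpers rather than anything shipped with ipex-llm.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_env() -> list:
    """Collect human-readable problems with the current environment."""
    problems = []
    # The PR description pins auto-round to 0.4.2.
    if installed_version("auto-round") != "0.4.2":
        problems.append("auto-round 0.4.2 is not installed")
    # The PR description asks for this package to be uninstalled.
    if installed_version("auto-gptq") is not None:
        problems.append("auto-gptq is still installed")
    if installed_version("ipex-llm") is None:
        problems.append("ipex-llm is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("WARNING:", problem)
```

An empty list from `check_env()` means the three conditions above hold; it does not verify NPU drivers or anything else outside pip.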

3. Summary of the change

Support auto-round-patch to load auto-round models on CPU on Windows
Support unpack_auto_round_layer for sym_int4 and asym_int4
Make the code changes needed to be compatible with auto-round
Support converting an auto-round model to an NPU-optimized model via the convert_auto_round_model_to_npu_model function

This is only an initial PR; we can enhance it or integrate it into from_pretrained later if necessary.
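
To illustrate what unpacking int4 weights involves, here is a small pure-Python sketch. It is not the PR's `unpack_auto_round_layer`; it assumes a GPTQ-style layout (eight 4-bit codes per 32-bit word, lowest nibble first), a fixed zero point of 8 for symmetric int4, and a stored zero point for asymmetric int4. With group size -1, it is further assumed that a single scale (and zero point) covers a whole output channel. All function names here are hypothetical.

```python
def pack_int4(values):
    """Pack 4-bit integers (0..15) into 32-bit words, lowest nibble first."""
    if len(values) % 8 != 0:
        raise ValueError("number of values must be a multiple of 8")
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            if not 0 <= v <= 15:
                raise ValueError("value out of 4-bit range")
            word |= v << (4 * j)
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: split each 32-bit word into eight 4-bit codes."""
    return [(word >> (4 * j)) & 0xF for word in words for j in range(8)]


def dequantize(q_values, scale, zero_point=8):
    """Map unsigned 4-bit codes back to floats.

    Symmetric int4 (assumed): fixed zero point of 8, the default here.
    Asymmetric int4 (assumed): pass the stored zero point instead.
    """
    return [(q - zero_point) * scale for q in q_values]
```

For example, `unpack_int4(pack_int4(codes))` round-trips any list of eight codes, and `dequantize(codes, scale)` recovers approximate float weights for one channel under the symmetric assumption.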

4. How to test?

Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.

rnwang04 marked this pull request as draft December 19, 2024 08:49

rnwang04 force-pushed the auto-round branch from 0eb4545 to 76bebc0 Compare December 24, 2024 01:58

be compatible with auto-round

76bebc0

rnwang04 changed the title ~~[test] be compatible with auto-round~~ [NPU] be compatible with auto-round & initial auto-round support Dec 24, 2024

rnwang04 marked this pull request as ready for review December 24, 2024 13:48

rnwang04 added 2 commits December 24, 2024 21:49

initial auto-round support integration

c37a180

fix style

50ace72

rnwang04 force-pushed the auto-round branch 3 times, most recently from daddf17 to 7abcd3c Compare December 24, 2024 14:15

fix style

7abcd3c

rnwang04 requested a review from jason-dai December 24, 2024 14:20

rnwang04 mentioned this pull request Dec 26, 2024

[NPU] Compatible with other third-party models like auto-round #12620

Merged

2 tasks

rnwang04 closed this Dec 26, 2024
