This folder contains examples of running IPEX-LLM on Intel NPU:
- LLM: examples of running large language models using IPEX-LLM optimizations
- CPP: examples of running large language models using IPEX-LLM optimizations through C++ API
- Multimodal: examples of running large multimodal models using IPEX-LLM optimizations
- Embedding: examples of running embedding models using IPEX-LLM optimizations
- Save-Load: examples of saving and loading low-bit models with IPEX-LLM optimizations
> **Tip**: Please refer to the IPEX-LLM NPU Quickstart for more information about running IPEX-LLM on Intel NPU.
| Model | Example Link |
|---|---|
| Llama2 | Python link, C++ link |
| Llama3 | Python link, C++ link |
| Llama3.2 | Python link, C++ link |
| GLM-Edge | Python link |
| Qwen2 | Python link, C++ link |
| Qwen2.5 | Python link, C++ link |
| MiniCPM | Python link, C++ link |
| Baichuan2 | Python link |
| MiniCPM-Llama3-V-2_5 | Python link |
| MiniCPM-V-2_6 | Python link |
| Speech_Paraformer-Large | Python link |
| Bce-Embedding-Base-V1 | Python link |