unify xpu and cpu backends and use paged attention (#1009)

* add paged attention implementation, remove jit logic
  Signed-off-by: Wang, Yi A <[email protected]>
* add support in transformers 4.45
  Signed-off-by: Wang, Yi A <[email protected]>
* fix config (#935)
* move patch model to init
  Signed-off-by: Wang, Yi A <[email protected]>
* refine class IPEXPagedCache's update method (#945; a sketch of the prefill/decode split follows the commit list)
  - replace tensor on xpu with List to avoid memory copy
  - split IPEXPagedCache's update function into `update_for_prefill` and `update_for_decode`
  Signed-off-by: Liu, Kaixuan <[email protected]>
* fix bug when doing beam search (#954)
  Signed-off-by: Liu, Kaixuan <[email protected]>
* enable qkv concat layer (#958; see the fused-projection sketch after this list)
  - enable qkv
  - split key value into 2 lists
* add xpu cache optimization
  Signed-off-by: Wang, Yi A <[email protected]>
* xpu mlp optimization
  Signed-off-by: Wang, Yi A <[email protected]>
* optimize cache ops in xpu, improve for beam search
  Signed-off-by: Wang, Yi A <[email protected]>
* enable gpt2; falcon has core dump error in PagedAttention.single_query_cached_kv_attention (#979)
  - enable new_decoder_arch falcon
  - only keep 1 config
  - rm autocast
* fix unit test case, CPU part is OK; enable Falcon7b for XPU (#992)
  - fix bug when running IPEXCausalModel forward directly; fix bug when using `save_pretrained`
  - add LinearGelu op support for XPU
  - fix unit test error
  - adjust unit test case
  - fix bug
  Signed-off-by: Liu, Kaixuan <[email protected]>
* skip assisted decoding unit test for models using paged attention (#998)
  - XPU CI tests now almost all pass
  Signed-off-by: Liu, Kaixuan <[email protected]>
* fix ci config (#1010)
  Signed-off-by: jiqing-feng <[email protected]>
* fix tests versions (#1011)
  - fix ci config
  - fix test versions
  - fix ipex version
  Signed-off-by: jiqing-feng <[email protected]>
* fix torch test version (#1012)
  Signed-off-by: jiqing-feng <[email protected]>
* use python3.9 test (#1013)
  Signed-off-by: jiqing-feng <[email protected]>
* change ipex transformers version limit in setup (#1015)
  - fix inc tests
  Signed-off-by: jiqing-feng <[email protected]>
* add XPU LinearAddAdd op (#1017)
  Signed-off-by: Liu, Kaixuan <[email protected]>
* fix bert and vit patch (#1022)
  - fix vit and bert save
  Signed-off-by: jiqing-feng <[email protected]>
* Paged attn (#1024)
  - fix reorder cache for non-patched models
  - disable torch < 2.3 tests, since we won't use torch < 2.4
  - fix beam search test
  - fix cache selection
  - upgrade to transformers 4.46
  - change ipex test yaml transformers version to 4.46
  Signed-off-by: jiqing-feng <[email protected]>
* set device to be the same as the original model (#1031)
  - fix device
  Signed-off-by: jiqing-feng <[email protected]>
* Simplify IPEXModel (#1032)
  - simplify forward and save_pretrained since there is no jit support
  - fix format
  - rm warmup because there is no jit mode anymore
  - simplify forward for causal lm model
  - fix paged pkv forward
  - disable use_cache when just running forward
  Signed-off-by: jiqing-feng <[email protected]>
* nice code (#1035)
  Signed-off-by: Liu, Kaixuan <[email protected]>
* Paged attn (#1036)
  - nice code
  - device type adjustment
  Signed-off-by: Liu, Kaixuan <[email protected]>
* Enable torch.compile for non-generation tasks on CPU (#1037; see the torch.compile warmup sketch after this list)
  - enable compile for non-generation tasks
  - add no_grad in forward
  - warmup compiled model
  - disable compile for not-yet-ready models
  - set system-level optimize for torch.compile
  - fix typo
  - add comments
  - set torch minimum version for compiling
  Signed-off-by: jiqing-feng <[email protected]>
* Fix ipex upload and update readme (#1045)
  - fix readme and push-to-hub support
  - rm export in tests
  - test with torch 2.5.*
  Signed-off-by: jiqing-feng <[email protected]>
* Fix tests (#1047)
  - fix typo
  - add patched tests
  - change forward to generate
  - fix test model name
  Signed-off-by: jiqing-feng <[email protected]>
* Patch gpt2 block forward for passing input_lens (#1050)
  - fix forward without pkv
  - patch gpt2 block forward
  - fix typo
  - revert causal lm tests
  Signed-off-by: jiqing-feng <[email protected]>

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: kaixuanliu <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>
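
For readers unfamiliar with the `update_for_prefill`/`update_for_decode` split referenced in #945, the sketch below shows the general shape of a paged KV cache update: prefill bulk-copies the whole prompt into freshly allocated blocks, while decode appends a single token into the next free slot. Only the two method names come from the commit; the block-table layout and every other identifier are illustrative, not the repository's actual code.

```python
# Minimal sketch of a paged KV cache whose update() dispatches to a prefill
# path (bulk copy of the prompt) and a decode path (single-token append).
# No eviction or capacity handling; purely illustrative.
import torch


class PagedCacheSketch:
    def __init__(self, num_blocks, block_size, num_heads, head_dim):
        self.block_size = block_size
        # Physical storage: [num_blocks, block_size, num_heads, head_dim]
        self.key_cache = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.value_cache = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> tokens stored so far

    def update(self, seq_id, key, value):
        # key/value: [seq_len, num_heads, head_dim] at prefill,
        #            [1, num_heads, head_dim] at decode.
        if seq_id not in self.block_tables:
            return self.update_for_prefill(seq_id, key, value)
        return self.update_for_decode(seq_id, key, value)

    def update_for_prefill(self, seq_id, key, value):
        seq_len = key.shape[0]
        n_blocks = -(-seq_len // self.block_size)  # ceil division
        blocks = [self.free_blocks.pop() for _ in range(n_blocks)]
        self.block_tables[seq_id] = blocks
        self.seq_lens[seq_id] = seq_len
        for i, block in enumerate(blocks):
            k_chunk = key[i * self.block_size : (i + 1) * self.block_size]
            v_chunk = value[i * self.block_size : (i + 1) * self.block_size]
            self.key_cache[block, : k_chunk.shape[0]] = k_chunk
            self.value_cache[block, : v_chunk.shape[0]] = v_chunk

    def update_for_decode(self, seq_id, key, value):
        pos = self.seq_lens[seq_id]
        if pos % self.block_size == 0:  # last block is full, grab a fresh one
            self.block_tables[seq_id].append(self.free_blocks.pop())
        block = self.block_tables[seq_id][pos // self.block_size]
        slot = pos % self.block_size
        self.key_cache[block, slot] = key[0]
        self.value_cache[block, slot] = value[0]
        self.seq_lens[seq_id] = pos + 1
```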
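Likewise, the qkv concat layer from #958 follows a standard fusion pattern: the three q/k/v projection weights are stacked into one linear so a single GEMM replaces three. A minimal sketch of that pattern, with all names hypothetical (the real change presumably routes through IPEX's fused kernels):

```python
# Fuse separate q/k/v projections into one nn.Linear and split the output.
import torch
import torch.nn as nn


def fuse_qkv(q_proj, k_proj, v_proj):
    # Stack the three weight matrices row-wise into one [3*out, in] Linear.
    fused = nn.Linear(
        q_proj.in_features,
        q_proj.out_features + k_proj.out_features + v_proj.out_features,
        bias=q_proj.bias is not None,
    )
    with torch.no_grad():
        fused.weight.copy_(torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0))
        if fused.bias is not None:
            fused.bias.copy_(torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0))
    return fused


hidden = 64
q, k, v = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)
fused = fuse_qkv(q, k, v)
x = torch.randn(2, hidden)
q_out, k_out, v_out = fused(x).split([hidden, hidden, hidden], dim=-1)
assert torch.allclose(q_out, q(x), atol=1e-5)  # fused path matches separate projections
```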
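Finally, the torch.compile enablement in #1037 amounts to compiling the model once, running inference under no_grad, and issuing a warmup call so compilation cost is paid before the first real request. A hedged sketch with placeholder model and inputs; the actual change lives inside the model classes and, per the commit, gates on a minimum torch version:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint: any non-generation (e.g. classification) model works.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

compiled = torch.compile(model)  # compile once, up front

inputs = tokenizer("a dummy warmup sentence", return_tensors="pt")
with torch.no_grad():  # "add no_grad in forward"
    compiled(**inputs)                   # warmup: first call triggers compilation
    logits = compiled(**inputs).logits   # later calls reuse the compiled graph
print(logits.shape)
```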