Refactoring: add helper class to bind qnn tensor -> ggml tensor #2
base: qualcomm_qnn_backend_for_ggml
Conversation
ggml-qnn.cpp (Outdated)
QNN_LOG_WARN("alloc rpcmem failure, %s\n", strerror(errno)); | ||
QNN_LOG_DEBUG("tensor%p name %s", _qnn_tensor, QNN_TENSOR_GET_NAME(*_qnn_tensor)); | ||
_context = nullptr; | ||
// TODO: should we free the tensor here? |
Should we free the _qnn_tensor created here by tensorCreateGraphTensor (line 1979)?
No need. The Qualcomm QNN SDK doesn't seem to provide a function for that. Judging from Qualcomm's documentation, the SDK appears to manage these internal resources itself.
I had already noticed the problem you mention in this PR, but I don't yet understand why it happens. There is very little technical material on the Qualcomm QNN SDK; at the moment there is only that SDK reference manual.
My personal guess is that some synchronization operations are missing, but there really isn't much information to go on.
I'm not sure. I've run all kinds of experiments and haven't found any valuable public reference material.
From the public information available, several companies in China have already implemented Qualcomm NPU acceleration; 面壁智能 (ModelBest), which released Open MiniCPM-V, is one of them. If you are an employee of a commercial company, you could contact them. For an independent developer like me, without an NDA with QTI and without Qualcomm's technical support, getting this fully working could be quite difficult: for example, there is no way to know what some of the error codes actually mean.
I've used Qualcomm's GPU profiler before, and it was also full of bugs. For problems like this I feel we have to wait for them to fix things themselves; trying to work around them would waste a lot of pointless time.
I agree with you. It would be best to get technical support from Qualcomm.
Thanks for your PR. We can discuss this problem in my personal learning & study project: https://github.com/zhouwg/kantv/tree/ggml-qnn-refine/core/ggml/llamacpp/tests/ggml-qnn. I don't know your background: an independent developer with free time like me, or an employee of an AI-related company? If you are a company employee, you could contact 面壁智能 (ModelBest); they seem to have already achieved Qualcomm NPU acceleration. I didn't expect you to still be interested in this problem. I'm no longer interested in the upstream project. If you like, you can work on ggml-qnn related problems in my learning & research project, in Chinese if that is more convenient, and contribute the results to the community afterwards. There's no need to go to the trouble of submitting to upstream (honestly, ever since those several ggml-rpc.cpp-related PRs were merged into the master branch, I've been very disappointed).
Personally I'm still fairly optimistic about this problem. I've read that series of RPC PRs and I use them daily, and I don't think they conflict with this QNN work. I'm doing this out of personal interest; it's mainly a way to learn and exchange new ideas, driven by curiosity. So I don't plan to contact the vendor; I'll just keep working from public information.
Thanks for the reply. Getting this far in your spare time is already impressive. Keep it up!
Force-pushed from 4d70039 to 65a14d9.
@zhouwg When you have time, could you take a look at this PR, and if there are no issues, help merge it into your branch?
Force-pushed from 7a77028 to dfe159f.
Thank you for the development.

Device: Snapdragon 8 Gen 3, 16 GB

llama-server -m models/Kitsunebi-v1-Gemma2-8k-9B.Q4_K_M.gguf -ngl 40
...
[ggml_qnn_graph, 27]: graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1
[ggml_qnn_graph, 75]: can't create qnn graph handle with graph name MUL_MAT_3584x2048x1x1_3584x2x1x1_2048x2x1x1, error = 6003

diff --git a/ggml/src/ggml-backend.c b/ggml/src/ggml-backend.c
index a8eafac4..e2b421e2 100644
--- a/ggml/src/ggml-backend.c
+++ b/ggml/src/ggml-backend.c
@@ -287,6 +287,7 @@ bool ggml_backend_supports_op(ggml_backend_t backend, const struct ggml_tensor *
 }

 bool ggml_backend_supports_buft(ggml_backend_t backend, ggml_backend_buffer_type_t buft) {
+    if (NULL == backend->iface.supports_buft) return true;
     return backend->iface.supports_buft(backend, buft);
 }
Hi @myan-o, thanks for the feedback. As I said before, in ggml the input tensor of the matmul operator needs to be transposed, and to achieve that I have a lot more refactoring work to do, so the mulmat operator is still under construction. For more information, have a look here: chraac@63dc587
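To illustrate the shape convention at play, here is a self-contained sketch (the helper names are hypothetical, not code from this PR or fork): ggml_mul_mat(ctx, a, b) uses ne[0] of both operands as the shared inner dimension, so lowering it to a conventional MatMul, which computes [M,K] x [K,N] -> [M,N], requires treating one input as transposed.

#include <cassert>
#include <cstdint>

// Hypothetical shape helper, for illustration only.
// ggml_mul_mat(ctx, a, b) requires a->ne[0] == b->ne[0] (the shared K
// dimension) and produces dst with dst->ne[0] == a->ne[1] and
// dst->ne[1] == b->ne[1]; a conventional MatMul computes
// [M,K] x [K,N] -> [M,N], hence one operand must be transposed.
struct shape2d { int64_t ne0, ne1; };   // ne[0], ne[1] of a 2-D ggml tensor

static shape2d mul_mat_result_shape(shape2d a, shape2d b) {
    assert(a.ne0 == b.ne0);             // shared inner dimension K
    return shape2d{ a.ne1, b.ne1 };     // dst shape in ggml's convention
}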
Thank you for your answer. So does that mean that matmul operations are not implemented yet? I have also made a pull request for some minor fixes, so please take a look.
The Termux development environment lacks part of the C++ standard library, and the build fails:

[ 5%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-qnn/utils.cpp.o
/data/data/com.termux/files/home/git/llama.cpp/ggml/src/ggml-qnn/utils.cpp:124:23: error: reference to unresolved using declaration
  124 |     void *data = std::aligned_alloc(alignment, size_aligned);
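A possible workaround, sketched under the assumption that the failure comes from the Android/Termux toolchain not exposing std::aligned_alloc (the wrapper name below is hypothetical, not code from the fork): fall back to posix_memalign, which Android does provide, and release the result with free() as usual.

#include <cstdlib>

// Hypothetical fallback for toolchains missing std::aligned_alloc.
// alignment must be a power of two and a multiple of sizeof(void *);
// the returned pointer is released with free().
static void * aligned_alloc_compat(size_t alignment, size_t size) {
#if defined(__ANDROID__)
    void * data = nullptr;
    if (posix_memalign(&data, alignment, size) != 0) {
        return nullptr;
    }
    return data;
#else
    // std::aligned_alloc additionally requires size % alignment == 0
    return std::aligned_alloc(alignment, size);
#endif
}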
@chraac Sorry to bother you. I'm also deploying an LLM with llama.cpp on a Qualcomm Snapdragon Gen 2 device, using the Qwen2 0.5B model. I've noticed that your repository has multiple branches; which branch should I use for testing?
Hi @FranzKafkaYu, you can use the dev-refactoring branch.
I found time today to test this branch, and it fails to compile. Build command:
Relevant error log:
I'm not sure where exactly the problem is; maybe the QNN SDK version is wrong? PS: could you open up the issues section of your repository, so we can discuss related problems there and avoid disturbing other developers?
Hi @FranzKafkaYu, sorry for the late reply. The issues section is now open, and your problem has already been fixed. That was a static assert, deliberately designed to prevent the internal op array indices from going wrong when ops are added or removed. We can discuss the design details on my fork.
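For illustration, here is the general shape of that kind of guard as a self-contained sketch (all names here are hypothetical; the fork's actual table and identifiers differ): the per-op table is declared without an explicit size, and a static_assert pins its length to the op count, so adding or removing an op without updating the table breaks the build instead of silently shifting indices at runtime.

// Self-contained sketch with hypothetical names.
enum my_op { MY_OP_NONE, MY_OP_ADD, MY_OP_MUL_MAT, MY_OP_COUNT };

struct op_caps { const char * qnn_op_name; };

static const op_caps kOpCaps[] = {
    { nullptr },          // MY_OP_NONE
    { "ElementWiseAdd" }, // MY_OP_ADD
    { "MatMul" },         // MY_OP_MUL_MAT
};

// If an op is added to the enum without a matching table entry, the
// array length no longer matches and compilation fails here.
static_assert(sizeof(kOpCaps) / sizeof(kOpCaps[0]) == MY_OP_COUNT,
              "op table out of sync with op enum");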
Hello, @chraac and @zhouwg. I wanted to thank you both for your work on this feature; just know that there are others like me who are following this closely. @AndreasKunar has mentioned this effort in his Performance of llama.cpp on Snapdragon X Elite/Plus discussion and in the Support for Snapdragon X Elite NPU & GPU issue open in the ollama repo. There is quite a bit of interest among those of us who want to use llama.cpp and ollama with Snapdragon X Elite; we are rooting for you! As I was trying to see if there was anything I could do to give you a hand, I noticed that you seemed to be struggling a bit with a few things that might not be documented, such as whether the tensor should be freed or whether the SDK manages those resources internally, along with questions related to synchronization operations that might have made you consider waiting for technical support from Qualcomm. How about engaging @yeonseok-zeticai? As he mentioned in the previously closed PR, he worked at Qualcomm until early this year, he has quite a bit of experience with the Qualcomm AI SDK, and he is interested in getting these things done. (Thank you @yeonseok-zeticai!) It also appears that Andreas might have more time in October to take a look at this as well. Would you like to coordinate any efforts? I know how to program in C++, but I am not as familiar with llama.cpp or ollama as I would like to be; however, I can do my best to learn and help in any way possible.
Hi @jorge-abarca,
I'd like to start by thanking everyone for their attention to this project!
We've made significant progress since my last comment. Here's our current status:
Any assistance would be greatly appreciated! Please direct your comments and contributions to my fork: chraac:dev-refactoring
I reviewed the issue, and I'm delighted to hear that someone is interested in contributing to my fork. I'd be happy to discuss this further. Please feel free to raise issues and submit pull requests (PRs) on my fork. Your input is welcome and appreciated. Thank you!
@chraac Hello, I tried the dev-refactoring branch and built it with the following command:
Hi @Pateo-sunnyhuang, hello and thanks for your interest. At the moment …
Thanks for the reply. I added some logging around the inference process and found two problems:

ggml-qnn:[load_system, 751]: find a valid qnn system interface
ggml-qnn:[qnn_system_interface, 10]: initialize qnn system successfully
ggml-qnn:[load_backend, 766]: lib_path:/data/local/tmp/libQnnCpu.so
ggml-qnn:[load_backend, 788]: num_providers=1
ggml-qnn:[load_backend, 801]: QNN_API_VERSION_MAJOR=2, major=2, QNN_API_VERSION_MINOR=14, minor=14,
ggml-qnn:[load_backend, 814]: find a valid qnn interface
ggml-qnn:[qnn_init, 248]: device property is not supported
ggml-qnn:[qnn_init, 299]: create QNN device successfully
ggml-qnn:[ggml_backend_qnn_init, 449]: qnn device name QNN-CPU

Could you give some direction or guidance on how to investigate these two problems?
llama-cli has a parameter for selecting the device id: -mg, which can be set to 2.
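For example (sketch only; the model path is a placeholder, and -mg/--main-gpu is llama.cpp's standard flag for choosing the main device):

llama-cli -m /path/to/model.gguf -mg 2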
Hi! Sorry, I've been busy lately, so my replies are slow. On the first point: the upstream backend registry has been under constant refactoring recently, so this area keeps changing and I'm still adapting to the new interfaces; for details see this upstream project: https://github.com/users/ggerganov/projects/12 On the second point: my main effort right now is still focused on getting … Also, if you're following …
As I said in your upstream PR, it's better to have a function for wrapping ggml_tensor into Qnn_Tensor_t, so here I've created a PR for it.
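As a rough illustration of what such a wrapper does, here is a minimal sketch assuming the QNN SDK's QnnTypes.h definitions (the function name and the dimension-order handling are assumptions, not the PR's actual helper class): mirror the ggml tensor's shape into the QNN tensor and point the QNN client buffer at the ggml data, so no copy is needed for host memory.

#include <cstdint>

#include "QnnTypes.h"   // Qnn_Tensor_t, from the QNN SDK
#include "ggml.h"

// Hypothetical binder sketching the idea behind this PR's helper class.
// Assumes dst was already initialized as a v1 tensor and dims points to
// caller-owned storage of at least GGML_MAX_DIMS entries.
static bool bind_ggml_to_qnn(ggml_tensor * src, Qnn_Tensor_t & dst,
                             uint32_t * dims) {
    if (dst.version != QNN_TENSOR_VERSION_1) {
        return false;
    }
    const uint32_t rank = (uint32_t) ggml_n_dims(src);
    for (uint32_t i = 0; i < rank; ++i) {
        // Assumption: QNN expects dimensions outermost-first, while
        // ggml's ne[] is innermost-first, hence the reversal.
        dims[i] = (uint32_t) src->ne[rank - 1 - i];
    }
    dst.v1.rank               = rank;
    dst.v1.dimensions         = dims;
    dst.v1.memType            = QNN_TENSORMEMTYPE_RAW;
    dst.v1.clientBuf.data     = src->data;
    dst.v1.clientBuf.dataSize = (uint32_t) ggml_nbytes(src);
    return true;
}

The real helper also has to translate ggml_type into the matching Qnn_DataType_t and deal with rpcmem/shared buffers, which this sketch leaves out.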
Ran the tests on the CPU backend: works well.
Ran them on the NPU backend: also works well.