Compile bug: [QNN] Not able to run tiny llama model with QNN NPU #14
@chraac, can you please reply to this?
Hi @akshatshah17,
Thanks @chraac, it's working. From the logs below I can see that it first offloads the layers to the GPU, and after that the line `qnn device name qnn-gpu` appears, which is fine. But later in the logs I also see some NPU-related lines, so I am not sure whether the model is running on the QNN GPU or the NPU. I have highlighted the relevant parts:

llm_load_print_meta: max token length = 48
[qnn_init, 248]: device property is not supported
[qnn_init, 258]: device counts 1
system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : NEON = 1 | ARM_FMA = 1 | MATMUL_INT8 = 1 | AARCH64_REPACK = 1 |
sampler seed: 3467048278
llama_perf_sampler_print: sampling time = 9.81 ms / 441 runs ( 0.02 ms per token, 44958.71 tokens per second)
From your log, it looks like it's running on
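One way to settle the GPU-vs-NPU question is to ask the binary itself which backend devices it registered. This is a hedged sketch: it assumes the dev-refactoring fork inherits upstream llama.cpp's device-selection flags (`--list-devices` / `-dev`) and that the QNN devices keep the `qnn-gpu` / `qnn-npu` names seen in the log; verify with `./llama-cli --help` on your build.

```shell
# List the backend devices this build registered (flag assumed from upstream llama.cpp).
./install-android/bin/llama-cli --list-devices

# If an NPU device is listed, pin the run to it explicitly
# (device name "qnn-npu" is an assumption based on the log output above).
./install-android/bin/llama-cli -m output_file_tiny_llama_Q4_K_M.gguf \
    -c 512 -p "prompt" -dev qnn-npu
```

If the flag is unsupported in this fork, the per-layer offload lines in the startup log (`qnn device name ...`) remain the most reliable indicator of which device the weights actually landed on.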
Git commit
e36ad89
Operating systems
Linux
GGML backends
CPU
Problem description & steps to reproduce
I followed this procedure to build and to convert the model into quantized GGUF format, but when running on the device it is unable to load the model.
git clone https://github.com/chraac/llama.cpp.git --recursive
cd llama.cpp
git checkout dev-refactoring
export ANDROID_NDK=/home/code/Android/Ndk/android-ndk-r26d/
export QNN_SDK_PATH=/home/code/Android/qnn-sdk/qairt/2.27.5.241009/
Build for CPU
cmake -B build
cmake --build build --config Release -j16
Build for Android
cmake \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DCMAKE_C_FLAGS="-march=armv8.7a" \
  -DCMAKE_CXX_FLAGS="-march=armv8.7a" \
  -DGGML_OPENMP=OFF \
  -DGGML_LLAMAFILE=OFF \
  -DGGML_QNN=ON \
  -DGGML_QNN_DEFAULT_LIB_SEARCH_PATH=/data/local/tmp \
  -B build-android
cmake --build build-android --config Release -j4
cmake --install build-android --prefix install-android --config Release
Model conversion
python3 convert_hf_to_gguf.py ~/tiny_llama/ --outfile output_file_tiny_llama_fp32.gguf --outtype f32
./build/bin/llama-quantize output_file_tiny_llama_fp32.gguf output_file_tiny_llama_Q4_K_M.gguf Q4_K_M
On S24 QC
adb push install-android/ /data/local/tmp/
adb push output_file_tiny_llama_Q4_K_M.gguf /data/local/tmp/
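Since the build sets `GGML_QNN_DEFAULT_LIB_SEARCH_PATH=/data/local/tmp`, the QNN runtime libraries from the SDK also need to be on the device or the backend will fall back or fail to load. A hedged sketch follows; the library names and the `hexagon-v75` directory (Snapdragon 8 Gen 3 in the S24) are assumptions based on the usual QAIRT SDK layout, so check the layout of your `$QNN_SDK_PATH` before pushing.

```shell
# Assumption: standard QAIRT/QNN SDK layout; adjust hexagon-v75 to match your SoC.
adb push $QNN_SDK_PATH/lib/aarch64-android/libQnnSystem.so /data/local/tmp/
adb push $QNN_SDK_PATH/lib/aarch64-android/libQnnCpu.so /data/local/tmp/
adb push $QNN_SDK_PATH/lib/aarch64-android/libQnnGpu.so /data/local/tmp/
adb push $QNN_SDK_PATH/lib/aarch64-android/libQnnHtp.so /data/local/tmp/
adb push $QNN_SDK_PATH/lib/aarch64-android/libQnnHtpV75Stub.so /data/local/tmp/
adb push $QNN_SDK_PATH/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so /data/local/tmp/
```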
export LD_LIBRARY_PATH=/data/local/tmp/install-android/lib/
./install-android/bin/llama-cli -m output_file_tiny_llama_Q4_K_M.gguf -c 512 -p "prompt"
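The two commands above have to run on the device itself, not on the host; a minimal sketch wrapping them in a single `adb shell` invocation (paths assume the pushes above landed in `/data/local/tmp`):

```shell
# Run llama-cli on the device with the pushed libraries on the loader path.
adb shell 'cd /data/local/tmp && \
  LD_LIBRARY_PATH=/data/local/tmp/install-android/lib/ \
  ./install-android/bin/llama-cli -m output_file_tiny_llama_Q4_K_M.gguf -c 512 -p "prompt"'
```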
First Bad Commit
No response
Relevant log output