[BUG] Error running LLaMA2_7B_Chat_Quantized on 8gen3 device. #109

Open
LLIKKE opened this issue Oct 28, 2024 · 0 comments
Labels
question — "Please ask any questions on Slack. This issue will be closed once responded to."

Comments


LLIKKE commented Oct 28, 2024

Hi, for LLaMA2_7B_Chat_Quantized I noticed that the compile job on AI Hub uses QNN v2.27.0.240926142112_100894, but whether I use QNN 2.27.7 or 2.27.0 on the device I get the same error.
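For reference, a rough way I double-checked which QNN libraries actually ended up on the device (just a sketch; I'm assuming the SDK version string is embedded in the pushed libQnnHtp.so, and the directory name matches my setup):

adb shell ls /data/local/tmp/llama2_7b_qnn          # confirm which libQnn*.so / libGenie.so were pushed
adb pull /data/local/tmp/llama2_7b_qnn/libQnnHtp.so .
strings libQnnHtp.so | grep -i "2\.27"              # look for the embedded SDK version string

Here is the failing run on the device: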

HNBVL-AN00:/data/local/tmp $ cd llama2_7b_qnn/
user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"            <
Using libGenie.so version 1.1.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 1 : err 4000"
[ERROR] "Create From Binary FAILED!"
[ERROR] "Failed to free device: 14003"
[ERROR] "Device Free failure"
Failure to initialize model
Failed to create the dialog.

I followed https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie on a Snapdragon 8 Gen 3 device. The Llama 3.2 3B model runs fine with the same setup.
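For completeness, the run command (cut off in the paste above) roughly follows the tutorial's pattern; this is a sketch from memory, so the paths, environment setup, and prompt below are illustrative rather than verbatim:

cd /data/local/tmp/llama2_7b_qnn
export LD_LIBRARY_PATH=$PWD                          # make the bundled Genie/QNN libraries visible to the runner
./genie-t2t-run -c genie_config.json -p "<formatted prompt>"   # genie_config.json generated per the tutorial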

mestrona-3 added the question label ("Please ask any questions on Slack. This issue will be closed once responded to.") on Oct 28, 2024