[BUG] Error running LLaMA2_7B_Chat_Quantized on 8gen3 device. #109

Open
LLIKKE opened this issue Oct 28, 2024 · 0 comments
Labels
question — "Please ask any questions on Slack. This issue will be closed once responded to."

Comments


LLIKKE commented Oct 28, 2024

Hi, for LLaMA2_7B_Chat_Quantized I noticed that the compile job on AI Hub uses QNN v2.27.0.240926142112_100894, but whether I use QNN 2.27.7 or 2.27.0 on the device I get the same error.
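For reference, a rough way I double-checked which QNN libraries actually ended up on the device (just a sketch; I'm assuming the SDK version string is embedded in the pushed libQnnHtp.so, and the directory name matches my setup):

adb shell ls /data/local/tmp/llama2_7b_qnn          # confirm which libQnn*.so / libGenie.so were pushed
adb pull /data/local/tmp/llama2_7b_qnn/libQnnHtp.so .
strings libQnnHtp.so | grep -i "2\.27"              # look for the embedded SDK version string

Here is the failing run on the device: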

HNBVL-AN00:/data/local/tmp $ cd llama2_7b_qnn/
user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"            <
Using libGenie.so version 1.1.0

[WARN]  "Unable to initialize logging in backend extensions."
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 1 : err 4000"
[ERROR] "Create From Binary FAILED!"
[ERROR] "Failed to free device: 14003"
[ERROR] "Device Free failure"
Failure to initialize model
Failed to create the dialog.

I followed https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie on a Snapdragon 8 Gen 3 device. The Llama 3.2 3B model runs fine with the same setup.
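For completeness, the run command (cut off in the paste above) roughly follows the tutorial's pattern; this is a sketch from memory, so the paths, environment setup, and prompt below are illustrative rather than verbatim:

cd /data/local/tmp/llama2_7b_qnn
export LD_LIBRARY_PATH=$PWD                          # make the bundled Genie/QNN libraries visible to the runner
./genie-t2t-run -c genie_config.json -p "<formatted prompt>"   # genie_config.json generated per the tutorial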

mestrona-3 added the question label ("Please ask any questions on Slack. This issue will be closed once responded to.") on Oct 28, 2024