Inference failed with "axis 2 has dimension xxxx but expected yyyy" error #1769
Tried another model ("gemma-2-2b-it") and hit the same error. The analysis below comes from Claude 3.5 Sonnet's response: this suggests the problem is not limited to large models, but is a broader compatibility issue between CTranslate2 and the Gemma model series. The error message "axis 2 has dimension 4096 but expected 4352" indicates that the model structure CTranslate2 expects is inconsistent with the actual Gemma model structure, possibly because Gemma models use architectural features that CTranslate2 has not yet fully adapted to. Solutions and suggestions:
Ensure you're using the latest version of CTranslate2 and carefully review its documentation for any special instructions or known issues regarding Gemma models.
Create an issue on CTranslate2's GitHub repository, detailing the problems you've encountered, including the model version used and error messages. This may prompt developers to add support for Gemma models.
Consider using other optimization frameworks such as ONNX Runtime or TensorRT. These frameworks may have better support for Gemma models.
If you're familiar with model structures and PyTorch, consider writing a custom script to convert the model, ensuring correct dimension matching. This requires a deep understanding of Gemma model architecture and CTranslate2's expected input.
If other methods are not feasible, consider using PyTorch's native optimization techniques such as torch.compile(), quantization, or model pruning.
Keep a close eye on CTranslate2 updates, as they may add full support for Gemma models in future versions.
Contact Google or the Gemma model maintainers directly: consider asking the Gemma development team whether they have recommended optimization methods or known compatibility issues with CTranslate2.
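The suggestion above about ensuring correct dimension matching can be sketched without any framework-specific code. The helper below is hypothetical (its name and the tensor names are illustrative, not part of CTranslate2's API): it compares a checkpoint's tensor shapes, represented as plain tuples, against the shapes a target runtime expects, and reports mismatches in the same style as the error in this issue before any conversion is attempted.

```python
def find_shape_mismatches(checkpoint_shapes, expected_shapes):
    """Compare tensor shapes from a checkpoint against expected shapes.

    Both arguments map tensor names to shape tuples. Returns a list of
    (name, axis, actual_dim, expected_dim) for every axis that disagrees.
    Hypothetical helper -- not part of CTranslate2.
    """
    mismatches = []
    for name, expected in expected_shapes.items():
        actual = checkpoint_shapes.get(name)
        if actual is None:
            # Tensor missing entirely from the checkpoint.
            mismatches.append((name, None, None, expected))
            continue
        for axis, (a, e) in enumerate(zip(actual, expected)):
            if a != e:
                mismatches.append((name, axis, a, e))
    return mismatches


# Example: a projection whose last axis is 8192 where 7680 was expected,
# mirroring the error message in this issue (the tensor name is made up).
checkpoint = {"decoder.layer_0.ffn.weight": (1, 3584, 8192)}
expected = {"decoder.layer_0.ffn.weight": (1, 3584, 7680)}
for name, axis, actual, exp in find_shape_mismatches(checkpoint, expected):
    print(f"{name}: axis {axis} has dimension {actual} but expected {exp}")
```

Running a check like this over a converted model directory would localize exactly which weight disagrees, which is useful information to include when filing an issue.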
Sadly, the Gemma-2 model line is not officially supported by CTranslate2 yet. You can try another supported model. Checking.
I tried to use CTranslate2 as the inference framework for model inference, but it failed with the error below:
"axis 2 has dimension 8192 but expected 7680"
What I've done:
First, I had to convert the model to a CT2 model; because of the large model size, I used the quantization parameter to reduce the model file's size:
`converter.convert(output_dir, quantization="int8", force=True)`
Then I loaded the quantized model and ran inference, but unfortunately hit the error below:
"axis 2 has dimension 8192 but expected 7680"
How can I fix it?
The inference code snippet is as follows:
```python
import ctranslate2

try:
    # Load the quantized model as a Generator
    generator = ctranslate2.Generator("gemma-2-9b-it-ct2", device="cpu")
except Exception as e:
    print(f"Error during model loading or inference: {e}")
```
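Before the `ctranslate2.Generator(...)` call above, a quick pre-flight check can distinguish an incomplete conversion from a genuine architecture mismatch. The sketch below assumes that a converted CTranslate2 model directory normally contains `model.bin` and `config.json`; that file list is an assumption about typical converter output, not a documented contract, and the directory name is taken from the snippet above.

```python
import os


def check_ct2_model_dir(model_dir, required=("model.bin", "config.json")):
    """Return the list of expected files missing from a converted model
    directory; an empty list means the directory looks complete.

    The required-file list is an assumption about typical CTranslate2
    converter output, not a documented contract.
    """
    if not os.path.isdir(model_dir):
        return list(required)
    present = set(os.listdir(model_dir))
    return [f for f in required if f not in present]


missing = check_ct2_model_dir("gemma-2-9b-it-ct2")
if missing:
    print(f"Converted model directory is incomplete, missing: {missing}")
```

If the directory passes this check and the shape error still occurs, the mismatch is in the weights themselves, which points back at the converter's (lack of) support for the Gemma-2 architecture rather than a truncated download or failed conversion.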