
dunzhang/stella_en_1.5B_v5 Maximum Token Limit Set to 512 Despite Model Capabilities #396

Open
taoari opened this issue Sep 4, 2024 · 0 comments

taoari commented Sep 4, 2024

System Info

docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

Issue: Maximum Token Limit Set to 512 Despite Model Capabilities

Steps to Reproduce:

docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate

Observed Behavior:
The output always indicates:

Maximum number of tokens per request: 512

However, according to the MTEB leaderboard, this model should be able to handle up to 131,072 tokens.
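
One way to double-check the advertised limit (assuming the model exposes it via the standard max_position_embeddings field in its config.json):

curl -s https://huggingface.co/dunzhang/stella_en_1.5B_v5/raw/main/config.json | grep max_position_embeddings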

Suggested Improvement:
It would be beneficial to add a --max-input-length option to the CLI, allowing users to specify a custom token limit. I checked the current CLI options, and none appear to address the maximum input length.
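
For example, the reproduction command above could then become (note that --max-input-length is the proposed flag and does not exist yet):

docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate --max-input-length 131072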

Additional Details:
Even with the --auto-truncate flag, the following log is generated:

INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512

This behavior appears to be controlled by the following line of code: router/src/lib.rs#L199

tracing::info!("Maximum number of tokens per request: {max_input_length}");

Would it be possible to allow users to modify this limit via a CLI option?
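
A minimal sketch of how such an override could be wired into the router, assuming the existing clap-based CLI (all names below are invented for illustration and are not actual TEI code):

use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Proposed flag (hypothetical): override the per-request token
    /// limit that is otherwise derived from the model configuration.
    #[clap(long)]
    max_input_length: Option<usize>,
}

fn main() {
    let args = Args::parse();
    // `512` stands in for the value the router currently reads from the
    // model config; the flag, when set, would take precedence over it.
    let max_input_length = args.max_input_length.unwrap_or(512);
    println!("Maximum number of tokens per request: {max_input_length}");
}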

Expected behavior

--auto-truncate should truncate over-long inputs automatically instead of the server raising an error.
--max-input-length should be available as a CLI option.
