System Info
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Issue: Maximum Token Limit Set to 512 Despite Model Capabilities
Steps to Reproduce:
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate
Observed Behavior:
The output always indicates:
Maximum number of tokens per request: 512
However, according to the MTEB leaderboard, this model should be able to handle up to 131,072 tokens.
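For reference, a quick way to double-check the limit the router is actually enforcing (assuming the container started with the command above is listening on port 8081) is to query its /info endpoint, which reports the model id along with the configured limits, including a max_input_length field:
curl http://localhost:8081/info
The value reported there should match the 512 shown in the startup log.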
Suggested Improvement:
It would be beneficial to add a --max-input-length option to the CLI, allowing users to specify a custom token limit. I checked the current CLI options, and none appear to address the maximum input length.
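For illustration, usage of the proposed option could look like the following (hypothetical, since --max-input-length does not exist in the current CLI):
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate --max-input-length 131072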
Additional Details:
Even with the --auto-truncate flag, the following log is generated:
INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
This message appears to be emitted by the following line of code: router/src/lib.rs#L199
tracing::info!("Maximum number of tokens per request: {max_input_length}");
Would it be possible to allow users to modify this limit via a CLI option?
Expected behavior
--auto-truncate is expected to truncate over-length inputs automatically instead of an error being raised.
--max-input-length is expected to be available among the CLI options.