System Info
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Issue: Maximum Token Limit Set to 512 Despite Model Capabilities
Steps to Reproduce:
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate
Observed Behavior:
The output always indicates:
Maximum number of tokens per request: 512
However, according to the MTEB leaderboard, this model should be able to handle up to 131,072 tokens.
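For reference, a quick way to double-check the limit the router is actually enforcing (assuming the container started with the command above is listening on port 8081) is to query its /info endpoint, which reports the model id along with the configured limits, including a max_input_length field:
curl http://localhost:8081/info
The value reported there should match the 512 shown in the startup log.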
Suggested Improvement:
It would be beneficial to add a --max-input-length option to the CLI, allowing users to specify a custom token limit. I checked the current CLI options, and none appear to address the maximum input length.
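For illustration, usage of the proposed option could look like the following (hypothetical, since --max-input-length does not exist in the current CLI):
docker run --gpus all -p 8081:80 -v $HF_HOME/hub:/data ghcr.io/huggingface/text-embeddings-inference:1.5 --model-id dunzhang/stella_en_1.5B_v5 --auto-truncate --max-input-length 131072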
Additional Details:
Even with the --auto-truncate flag, the following log is generated:
INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
This message appears to be emitted by the following line of code: router/src/lib.rs#L199
tracing::info!("Maximum number of tokens per request: {max_input_length}");
Would it be possible to allow users to modify this limit via a CLI option?
Expected behavior
--auto-truncate is expected to truncate over-length inputs automatically instead of an error being raised.
--max-input-length is expected to be available among the CLI options.