OS: Windows 11
Rust version: cargo 1.75.0 (1d8b05cdd 2023-11-20)
Hardware: CPU AMD 6800HS
(text-generation-launcher --env didn't work)
Hi, I am trying to run a model locally on the CPU, since I only have an AMD GPU, which apparently is not yet supported.
text-embeddings-router --model-id dunzhang/stella_en_400M_v5 --port 8080
2024-10-25T21:52:54.872449Z INFO text_embeddings_router: router\src/main.rs:175: Args { model_id: "dun*****/******_**_***M_v5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-10-25T21:52:54.875192Z INFO hf_hub: C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hf-hub-0.3.2\src\lib.rs:55: Token file not found "C:\\Users\\user\\.cache\\huggingface\\token"
2024-10-25T21:52:54.875404Z INFO download_pool_config: text_embeddings_core::download: core\src\download.rs:38: Downloading `1_Pooling/config.json`
2024-10-25T21:52:54.875746Z INFO download_new_st_config: text_embeddings_core::download: core\src\download.rs:62: Downloading `config_sentence_transformers.json`
2024-10-25T21:52:54.875919Z INFO download_artifacts: text_embeddings_core::download: core\src\download.rs:21: Starting download
2024-10-25T21:52:54.876003Z INFO download_artifacts: text_embeddings_core::download: core\src\download.rs:23: Downloading `config.json`
2024-10-25T21:52:54.876215Z INFO download_artifacts: text_embeddings_core::download: core\src\download.rs:26: Downloading `tokenizer.json`
2024-10-25T21:52:54.876393Z INFO download_artifacts: text_embeddings_backend: backends\src\lib.rs:328: Downloading `model.safetensors`
2024-10-25T21:52:54.876567Z INFO download_artifacts: text_embeddings_core::download: core\src\download.rs:32: Model artifacts downloaded in 647.4µs
2024-10-25T21:52:54.886413Z INFO text_embeddings_router: router\src/lib.rs:206: Maximum number of tokens per request: 512
2024-10-25T21:52:54.886730Z INFO text_embeddings_core::tokenization: core\src\tokenization.rs:28: Starting 16 tokenization workers
2024-10-25T21:52:54.930092Z INFO text_embeddings_router: router\src/lib.rs:248: Starting model backend
Error: Could not create backend
Caused by: Could not start backend: GTE is only supported on Cuda devices in fp16 with flash attention enabled
It's demanding very specific GPU resources, even though I'm trying to run on the CPU.
I would expect the model to work :)
@Astlaan hi, this may be related to #375. For now, GTE is only supported on CUDA devices in fp16; support for the CPU version still needs to be implemented.
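For context, the error in the log is the kind of message produced by an up-front capability check when a backend is created. The following is only an illustrative Rust sketch with hypothetical names (`Device`, `DType`, `check_gte_support`) — it is not the actual text-embeddings-inference code — showing how gating a model on device, dtype, and flash attention yields exactly this failure on a CPU launch:

```rust
// Hypothetical sketch, NOT the real text-embeddings-inference implementation:
// a backend rejects unsupported device/dtype/attention combinations before
// loading the model, producing an error like the one in the log above.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Cpu,
    Cuda,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    Fp16,
    Fp32,
}

// GTE is accepted only on CUDA + fp16 + flash attention; everything else errors.
fn check_gte_support(device: Device, dtype: DType, flash_attention: bool) -> Result<(), String> {
    match (device, dtype, flash_attention) {
        (Device::Cuda, DType::Fp16, true) => Ok(()),
        _ => Err(
            "GTE is only supported on Cuda devices in fp16 with flash attention enabled"
                .to_string(),
        ),
    }
}

fn main() {
    // A CPU launch (as in this issue) fails the check.
    assert!(check_gte_support(Device::Cpu, DType::Fp32, false).is_err());
    // The supported configuration passes.
    assert!(check_gte_support(Device::Cuda, DType::Fp16, true).is_ok());
    println!("gate behaves as described");
}
```

Under this reading, the fix tracked by the linked issue would be adding a CPU code path for GTE rather than changing any launch flags on the user's side.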