
Error on calling an embedding model, error reading from server: EOF #3886

Open
EmanuelJr opened this issue Oct 20, 2024 · 3 comments
Labels: bug (Something isn't working), unconfirmed

Comments

@EmanuelJr

LocalAI version:
localai/localai:master-cublas-cuda12-ffmpeg

Environment, CPU architecture, OS, and Version:

  • K3S
  • RTX 3090
  • 2x Xeon 2680 V4

Describe the bug
Calling /embeddings for the model mixedbread-ai/mxbai-embed-large-v1 fails with: rpc error: code = Unavailable desc = error reading from server: EOF

To Reproduce
Download the model and use the following configuration:

name: mxbai-embed-large
backend: llama
embeddings: true
f16: true
parameters:
  model: mxbai-embed-large-v1-f16.gguf

I also tried with mmap: true, without f16: true, and some other variations.

Curl used:

curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "Your text string goes here",
  "model": "mxbai-embed-large"
}'
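The same request can also be issued from Python. A minimal sketch that builds the exact payload used in the curl command above; the base URL is the default LocalAI address and may differ in your deployment:

```python
import json
import urllib.request

# The same JSON body as the curl command above.
payload = {
    "input": "Your text string goes here",
    "model": "mxbai-embed-large",
}

def embed(base_url="http://localhost:8080"):
    """POST the payload to /embeddings and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(embed())
```

With the bug described here, this call fails with an HTTP 500 rather than returning an embedding.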

Expected behavior
The endpoint should return the embedding of the input text.

Logs

12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr llama_new_context_with_model: graph splits = 2
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":547,"message":"initializing slots","n_slots":1}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":556,"message":"new slot","slot_id":0,"n_ctx_slot":512}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"launch_slot_with_data","line":929,"message":"slot is processing task","slot_id":0,"task_id":0}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"update_slots","line":1827,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
12:09PM ERR Server error error="rpc error: code = Unavailable desc = error reading from server: EOF" ip=127.0.0.1 latency=3.287815242s method=POST status=500 url=/embeddings

Additional context

@EmanuelJr added the bug and unconfirmed labels on Oct 20, 2024
@etlweather

I get the same error with nomic-embed-text-v1.5.Q8_0.gguf and mxbai-embed-large-v1.q8_0.gguf (without the f16 parameter set).

I tried others. Basically, the only embedding model I have gotten working so far is MiniLM-L6-v2q4_0.bin with the bert-embeddings backend, and even that one fails with a 500 error if the input is too large.
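Not a fix for the crash, but a common client-side stopgap for the 500 on large inputs is to split the text into smaller pieces and embed each piece separately. A hypothetical sketch; the chunk size here is arbitrary, not a documented LocalAI limit:

```python
def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars characters,
    preferring to break on whitespace so words stay intact."""
    chunks = []
    while len(text) > max_chars:
        # Find the last space within the limit; fall back to a hard cut.
        cut = text.rfind(" ", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks

# Each chunk can then be sent to /embeddings as a separate "input".
```

Whether per-chunk embeddings are an acceptable substitute for a single document embedding depends on the use case.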

@EmanuelJr (Author)

@etlweather I did get it to work with the sentencetransformers backend; it's simple to set up, like the example in the docs. I'd still prefer to use the llama backend, though.
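For anyone landing here, a sentencetransformers configuration along the lines of the docs example looks roughly like this. This is a sketch; the name and model here are illustrative, not taken from this thread:

```yaml
name: text-embedding
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
```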

@etlweather

@EmanuelJr sentencetransformers would be fine; it just needs to accept a large input. But so far, all the ones I tried won't work either. They fail to load... I haven't had time to look further into this yet.
