
Error on calling an embedding model, error reading from server: EOF #3886

Open
EmanuelJr opened this issue Oct 20, 2024 · 3 comments
Labels: bug (Something isn't working), unconfirmed

Comments

@EmanuelJr

LocalAI version:
localai/localai:master-cublas-cuda12-ffmpeg

Environment, CPU architecture, OS, and Version:

  • K3S
  • RTX 3090
  • 2x Xeon 2680 V4

Describe the bug
Calling /embeddings for the model mixedbread-ai/mxbai-embed-large-v1 fails with: rpc error: code = Unavailable desc = error reading from server: EOF

To Reproduce
Download the model and use the following configuration:

name: mxbai-embed-large
backend: llama
embeddings: true
f16: true
parameters:
  model: mxbai-embed-large-v1-f16.gguf

I also tried with mmap: true, without f16: true, and some other variations.

Curl used:

curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "Your text string goes here",
  "model": "mxbai-embed-large"
}'
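The same request can also be issued from Python. A minimal sketch that builds the exact payload used in the curl command above; the base URL is the default LocalAI address and may differ in your deployment:

```python
import json
import urllib.request

# The same JSON body as the curl command above.
payload = {
    "input": "Your text string goes here",
    "model": "mxbai-embed-large",
}

def embed(base_url="http://localhost:8080"):
    """POST the payload to /embeddings and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(embed())
```

With the bug described here, this call fails with an HTTP 500 rather than returning an embedding.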

Expected behavior
The endpoint should return the embedding of the input text.

Logs

12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr llama_new_context_with_model: graph splits = 2
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":547,"message":"initializing slots","n_slots":1}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":556,"message":"new slot","slot_id":0,"n_ctx_slot":512}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"launch_slot_with_data","line":929,"message":"slot is processing task","slot_id":0,"task_id":0}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"update_slots","line":1827,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
12:09PM ERR Server error error="rpc error: code = Unavailable desc = error reading from server: EOF" ip=127.0.0.1 latency=3.287815242s method=POST status=500 url=/embeddings

Additional context

@EmanuelJr added the bug and unconfirmed labels on Oct 20, 2024
@etlweather

I get the same error with nomic-embed-text-v1.5.Q8_0.gguf and mxbai-embed-large-v1.q8_0.gguf (without the f16 parameter set).

I tried others. Basically, the only embedding model I have gotten working so far is MiniLM-L6-v2q4_0.bin with the bert-embeddings backend, and even that one fails with a 500 error if the input is too large.
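Not a fix for the crash, but a common client-side stopgap for the 500 on large inputs is to split the text into smaller pieces and embed each piece separately. A hypothetical sketch; the chunk size here is arbitrary, not a documented LocalAI limit:

```python
def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars characters,
    preferring to break on whitespace so words stay intact."""
    chunks = []
    while len(text) > max_chars:
        # Find the last space within the limit; fall back to a hard cut.
        cut = text.rfind(" ", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks

# Each chunk can then be sent to /embeddings as a separate "input".
```

Whether per-chunk embeddings are an acceptable substitute for a single document embedding depends on the use case.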

@EmanuelJr (Author)

@etlweather I did get it to work with the sentencetransformers backend; it's simple to set up, like the example in the docs. I'd still prefer to use the llama backend, though.
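For anyone landing here, a sentencetransformers configuration along the lines of the docs example looks roughly like this. This is a sketch; the name and model here are illustrative, not taken from this thread:

```yaml
name: text-embedding
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
```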

@etlweather

@EmanuelJr sentencetransformers would be fine; it just needs to accept a large input. But so far, all the ones I tried won't work either. They fail to load... I haven't had time to look further into this yet.
