From 2230786e33bbea182ffcddccbe33f190212731ac Mon Sep 17 00:00:00 2001
From: Georgi Gerganov
Date: Tue, 17 Dec 2024 16:12:15 +0200
Subject: [PATCH] server : update readme

ggml-ci
---
 examples/server/README.md | 42 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/examples/server/README.md b/examples/server/README.md
index 7cb4c43b3450e1..fab9555b3d1948 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -761,6 +761,8 @@ curl http://localhost:8080/v1/chat/completions \
 
 ### POST `/v1/embeddings`: OpenAI-compatible embeddings API
 
+This endpoint requires that the model uses a pooling type other than `none`.
+
 *Options:*
 
 See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -793,7 +795,45 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
 }'
 ```
 
-When `--pooling none` is used, the server will output an array of embeddings - one for each token in the input.
+### POST `/embeddings`: non-OpenAI-compatible embeddings API
+
+This endpoint supports `--pooling none`. With this option, the response contains the embeddings for every input token.
+Note that the response format is slightly different from `/v1/embeddings`: it has no `"data"` sub-tree, and the
+embeddings are always returned as a vector of vectors.
+
+*Options:*
+
+Same as the `/v1/embeddings` endpoint.
+
+*Examples:*
+
+Same as the `/v1/embeddings` endpoint.
+
+**Response format**
+
+```json
+[
+  {
+    "index": 0,
+    "embedding": [
+      [ ... embeddings for token 0 ... ],
+      [ ... embeddings for token 1 ... ],
+      [ ... ],
+      [ ... embeddings for token N-1 ... ]
+    ]
+  },
+  ...
+  {
+    "index": P,
+    "embedding": [
+      [ ... embeddings for token 0 ... ],
+      [ ... embeddings for token 1 ... ],
+      [ ... ],
+      [ ... embeddings for token N-1 ... ]
+    ]
+  }
+]
+```
 
 ### GET `/slots`: Returns the current slots processing state
 
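Because `/v1/embeddings` and `/embeddings` return differently shaped JSON, a client that talks to both may want to normalize them. The sketch below assumes the OpenAI-style shape (`"data"` list, one flat `"embedding"` vector per input) for `/v1/embeddings` and the vector-of-vectors shape documented in the patch for `/embeddings`. The helper name `per_input_vectors` is hypothetical, not part of the server.

```python
from typing import Any, Dict, List

# One embedding vector is a list of floats.
Vector = List[float]


def per_input_vectors(response: Any) -> Dict[int, List[Vector]]:
    """Map each input index to a list of embedding vectors.

    Hypothetical helper handling two response shapes:
      * `/v1/embeddings` (OpenAI-style): a dict whose "data" list holds one
        item per input, each with a single flat "embedding" vector.
      * `/embeddings`: a top-level list with one item per input, whose
        "embedding" is always a vector of vectors (one per token when the
        server runs with `--pooling none`).
    """
    if isinstance(response, dict):
        # OpenAI-compatible shape: wrap the single vector so both shapes match.
        return {item["index"]: [item["embedding"]] for item in response["data"]}
    # Non-OpenAI shape: "embedding" is already a list of vectors.
    return {item["index"]: item["embedding"] for item in response}
```

With `--pooling none`, `len(per_input_vectors(resp)[i])` then gives the number of tokens the server produced embeddings for in input `i`.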
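Since `/v1/embeddings` requires a pooling type other than `none` while `/embeddings` accepts `--pooling none`, a client can pick the endpoint from the pooling type the server was started with. This is a minimal sketch using only the Python standard library; the function name `embeddings_request`, its `pooling` parameter, and the use of an `"input"` field in the request body (taken from the OpenAI-style request) are assumptions for illustration. It builds the request without sending it.

```python
import json
import urllib.request
from typing import List, Union


def embeddings_request(base_url: str, texts: Union[str, List[str]],
                       pooling: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the embeddings endpoint
    that matches the server's pooling type (hypothetical helper)."""
    # `/v1/embeddings` needs a pooling type other than `none`;
    # `/embeddings` also works with `--pooling none`.
    path = "/embeddings" if pooling == "none" else "/v1/embeddings"
    # Assumed request body: the OpenAI-style "input" field.
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)` and decode the body with `json.load`.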