From 2230786e33bbea182ffcddccbe33f190212731ac Mon Sep 17 00:00:00 2001
From: Georgi Gerganov <ggerganov@gmail.com>
Date: Tue, 17 Dec 2024 16:12:15 +0200
Subject: [PATCH] server : update readme

ggml-ci
---
 examples/server/README.md | 42 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)
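
Note (not part of the commit): a minimal sketch of how a client might read the
`/embeddings` response format documented in this patch. It assumes the body is
a top-level JSON array of objects with `"index"` and `"embedding"` keys, as in
the README text below. The helper name `per_token_embeddings` and the sample
payload are hypothetical and exist only for illustration.

```python
import json


def per_token_embeddings(body: str) -> dict[int, list[list[float]]]:
    """Map each input's "index" to its list of per-token vectors."""
    items = json.loads(body)
    # Unlike /v1/embeddings, there is no "data" sub-tree: the top level is an array.
    if not isinstance(items, list):
        raise ValueError('expected a top-level JSON array (no "data" sub-tree)')
    return {item["index"]: item["embedding"] for item in items}


# Hypothetical response: one input, 2 tokens, 3-dimensional embeddings.
sample = '[{"index": 0, "embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}]'
vectors = per_token_embeddings(sample)
print(len(vectors[0]), len(vectors[0][0]))  # prints "2 3"
```

A caller would pass the raw response text from the server to the helper; the
HTTP request itself is left out so the sketch stays self-contained.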

diff --git a/examples/server/README.md b/examples/server/README.md
index 7cb4c43b3450e1..fab9555b3d1948 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -761,6 +761,8 @@ curl http://localhost:8080/v1/chat/completions \
 
 ### POST `/v1/embeddings`: OpenAI-compatible embeddings API
 
+This endpoint requires that the model uses a pooling type other than `none`.
+
 *Options:*
 
 See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-reference/embeddings).
@@ -793,7 +795,45 @@ See [OpenAI Embeddings API documentation](https://platform.openai.com/docs/api-r
   }'
   ```
 
-When `--pooling none` is used, the server will output an array of embeddings - one for each token in the input.
+### POST `/embeddings`: non-OpenAI-compatible embeddings API
+
+This endpoint supports `--pooling none`. When that pooling type is used, the response contains the embeddings for all input tokens.
+Note that the response format differs slightly from that of `/v1/embeddings`: it does not have the `"data"` sub-tree, and the
+embeddings are always returned as a vector of vectors.
+
+*Options:*
+
+Same as the `/v1/embeddings` endpoint.
+
+*Examples:*
+
+Same as the `/v1/embeddings` endpoint.
+
+**Response format**
+
+```json
+[
+  {
+    "index": 0,
+    "embedding": [
+      [ ... embeddings for token 0   ... ],
+      [ ... embeddings for token 1   ... ],
+      [ ... ],
+      [ ... embeddings for token N-1 ... ]
+    ]
+  },
+  ...
+  {
+    "index": P,
+    "embedding": [
+      [ ... embeddings for token 0   ... ],
+      [ ... embeddings for token 1   ... ],
+      [ ... ],
+      [ ... embeddings for token N-1 ... ]
+    ]
+  }
+]
+```
 
 ### GET `/slots`: Returns the current slots processing state