Add opensearch support #574

Merged
merged 5 commits into from
Oct 13, 2023

@nicoloboschi nicoloboschi commented Oct 12, 2023

Changes:

  • New datasource "opensearch" that supports OpenSearch v2 and AWS OpenSearch Serverless.
  • New asset "opensearch-index" that creates an index on OpenSearch. Mappings and settings are fully configurable.
  • "query-vector-db" can now query an opensearch datasource. The query object is fully configurable.
  • "vector-db-sink" can now write to an opensearch datasource. Only bulk writes are supported, with batch-size and flush-interval. All the bulk options are supported (e.g. refresh, timeout).
  • Added an example in the examples directory.
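
For readers unfamiliar with bulk writes, here is a minimal Python sketch of the batching idea behind batch-size and flush-interval. It is not the code from this PR: `BulkBuffer`, `build_bulk_body`, and the `send` callback are hypothetical names used for illustration. Only the NDJSON layout (an action line followed by a source line, with a trailing newline) follows the OpenSearch `_bulk` API. For simplicity the interval is checked only when a record arrives, whereas a real sink would more likely use a timer.

```python
import json
import time


def build_bulk_body(index_name, docs):
    """Render (doc_id, source) pairs as an OpenSearch _bulk NDJSON body."""
    lines = []
    for doc_id, source in docs:
        # Each document takes two lines: action metadata, then the source.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


class BulkBuffer:
    """Hypothetical buffer that flushes on batch size or elapsed interval."""

    def __init__(self, index_name, send, batch_size=10, flush_interval=1.0):
        self.index_name = index_name
        self.send = send  # callable receiving the NDJSON body (assumption)
        self.batch_size = batch_size
        self.flush_interval = flush_interval  # seconds
        self.pending = []
        self.last_flush = time.monotonic()

    def add(self, doc_id, source):
        self.pending.append((doc_id, source))
        too_many = len(self.pending) >= self.batch_size
        too_old = time.monotonic() - self.last_flush >= self.flush_interval
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Send whatever is buffered as one bulk request, then reset the timer.
        if self.pending:
            self.send(build_bulk_body(self.index_name, self.pending))
            self.pending = []
        self.last_flush = time.monotonic()


# Usage: with batch_size=2 the second record triggers a single bulk request.
sent = []
buf = BulkBuffer("my-index-1", sent.append, batch_size=2, flush_interval=60.0)
buf.add("k1", {"content": "hello", "embeddings": [0.1, 0.2, 0.3]})
buf.add("k2", {"content": "world", "embeddings": [0.4, 0.5, 0.6]})
```

Bulk options such as refresh or timeout would travel as request parameters next to this body, not inside it.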

Full RAG example

topics:
  - name: "insert-topic"
    creation-mode: create-if-not-exists
assets:
  - name: "os-index"
    asset-type: "opensearch-index"
    creation-mode: create-if-not-exists
    config:
        index-name: "my-index-1"
        datasource: "OSDatasource"
        settings: |
            {
                "index": {
                      "knn": true,
                      "knn.algo_param.ef_search": 100
                }
            }
        mappings: |
            {
                "properties": {
                      "content": {
                            "type": "text"
                      },
                      "embeddings": {
                            "type": "knn_vector",
                            "dimension": 3
                      }
                }
            }
pipeline:
  - id: write
    name: "Write"
    type: "vector-db-sink"
    input: "insert-topic"
    configuration:
      datasource: "OSDatasource"
      index-name: "my-index-1"
      bulk-parameters:
        refresh: "true"
      id: "key"
      fields:
        - name: "content"
          expression: "value.content"
        - name: "embeddings"
          expression: "value.embeddings"
---
topics:
  - name: "input-topic"
    creation-mode: create-if-not-exists
  - name: "result-topic"
    creation-mode: create-if-not-exists
pipeline:
  - id: read
    name: "read"
    type: "query-vector-db"
    input: "input-topic"
    output: "result-topic"
    configuration:
      datasource: "OSDatasource"
      query: |
        {
          "query": {
            "knn": {
              "embeddings": {
                "vector": ?,
                "k": 1
              }
            }
          }
        }
      fields:
        - "value.embeddings"
      output-field: "value.query-result"
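
The `?` in the query above is a placeholder that the entries under `fields` fill in. The Python sketch below shows one way such positional binding could work. It assumes placeholders are bound in order and JSON-encoded, which is an inference from the example and not a description of the actual LangStream implementation. `bind_placeholders` is a hypothetical helper.

```python
import json


def bind_placeholders(template, values):
    """Replace each bare `?` (outside JSON string literals) with a JSON value."""
    out = []
    used = 0
    in_string = False
    escaped = False
    for ch in template:
        if in_string:
            # Copy string contents verbatim, tracking escapes and the end quote.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "?":
            if used >= len(values):
                raise ValueError("more placeholders than values")
            out.append(json.dumps(values[used]))
            used += 1
        else:
            out.append(ch)
    if used != len(values):
        raise ValueError("more values than placeholders")
    # Parsing the result verifies that the bound query is valid JSON.
    return json.loads("".join(out))


QUERY_TEMPLATE = """
{
  "query": {
    "knn": {
      "embeddings": {
        "vector": ?,
        "k": 1
      }
    }
  }
}
"""

# The single value stands in for the record's value.embeddings field.
query = bind_placeholders(QUERY_TEMPLATE, [[0.1, 0.2, 0.3]])
```

The resulting dictionary has the shape of an OpenSearch k-NN search body, with the embedding in place of the placeholder.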


@eolivelli eolivelli left a comment

Overall LGTM.
Please add an example application, like the one for Solr.

}

private String getIndexName() {
String tableName =

Nit: indexName

@nicoloboschi nicoloboschi marked this pull request as ready for review October 13, 2023 10:41
@nicoloboschi nicoloboschi merged commit 7878593 into main Oct 13, 2023
9 checks passed
nicoloboschi added a commit that referenced this pull request Oct 13, 2023
nicoloboschi added a commit that referenced this pull request Oct 16, 2023
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024
3 participants