[vector-databases] Add support for Apache Solr #565

Merged
merged 9 commits into from
Oct 12, 2023

@eolivelli eolivelli commented Oct 11, 2023

Summary:

Solr resource configuration:

  - type: "vector-database"
    name: "SolrDataSource"
    configuration:
      service: "solr"
      user: "${secrets.solr.username}"
      password: "${secrets.solr.password}"
      host: "${secrets.solr.host}"
      port: "${secrets.solr.port}"
      collection-name: "documents"
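
The `${secrets.solr.*}` placeholders in the resource above are filled in from the application's secrets. As a rough illustration only (this is not LangStream's implementation; the `resolve_placeholders` helper and the sample secret values are hypothetical), the substitution can be sketched in Python like this:

```python
import re

# Hypothetical secret values, standing in for the application's real secrets.
SECRETS = {"solr": {"username": "solr-user", "password": "changeme",
                    "host": "localhost", "port": "8983"}}

# Matches placeholders of the form ${secrets.<id>.<key>}.
PLACEHOLDER = re.compile(r"\$\{secrets\.([\w-]+)\.([\w-]+)\}")

def resolve_placeholders(text: str, secrets: dict) -> str:
    """Replace every ${secrets.<id>.<key>} in text with the matching secret value."""
    def lookup(match):
        secret_id, key = match.group(1), match.group(2)
        return str(secrets[secret_id][key])
    return PLACEHOLDER.sub(lookup, text)

configuration = {
    "service": "solr",
    "host": "${secrets.solr.host}",
    "port": "${secrets.solr.port}",
    "collection-name": "documents",
}
resolved = {k: resolve_placeholders(v, SECRETS) for k, v in configuration.items()}
print(resolved)
```

Values without a placeholder, such as `service` and `collection-name`, pass through unchanged.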

Solr assets:
There is now a "solr-collection" asset that lets you manage a Solr collection.

This is an example:

assets:
  - name: "documents-table"
    asset-type: "solr-collection"
    creation-mode: create-if-not-exists
    deletion-mode: delete
    config:
      collection-name: "documents"
      datasource: "SolrDataSource"
      create-statements:
        - api: "/api/collections"
          method: "POST"
          body: |
            {
              "name": "documents",
              "numShards": 1,
              "replicationFactor": 1
            }
        - api: "/schema"
          body: |
            {
              "add-field-type": {
                "name": "knn_vector",
                "class": "solr.DenseVectorField",
                "vectorDimension": "1536",
                "similarityFunction": "cosine"
              }
            }
        - api: "/schema"
          body: |
            {
              "add-field": {
                "name": "embeddings",
                "type": "knn_vector",
                "stored": true,
                "indexed": true
              }
            }
        - api: "/schema"
          body: |
            {
              "add-field": {
                "name": "text",
                "type": "string",
                "stored": true,
                "indexed": false,
                "multiValued": false
              }
            }
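
To make the create-statements more concrete, here is a minimal Python sketch of how they could be turned into HTTP requests. It only plans the requests and does not send anything. Two things are assumptions, not taken from this PR: that a statement without `method` defaults to POST, and that an `api` path such as `/schema` is resolved relative to the collection (`/solr/<collection>/schema`). The `plan_requests` helper and the base URL are hypothetical.

```python
import json

# Two of the create-statements from the asset above, as plain Python data.
statements = [
    {"api": "/api/collections", "method": "POST",
     "body": '{"name": "documents", "numShards": 1, "replicationFactor": 1}'},
    {"api": "/schema",
     "body": '{"add-field": {"name": "embeddings", "type": "knn_vector", '
             '"stored": true, "indexed": true}}'},
]

def plan_requests(base_url: str, collection: str, statements: list) -> list:
    """Turn create-statements into (method, url, payload) tuples without sending them."""
    planned = []
    for statement in statements:
        api = statement["api"]
        # ASSUMPTION: paths under /api are absolute, anything else is per-collection.
        if api.startswith("/api/"):
            url = base_url + api
        else:
            url = f"{base_url}/solr/{collection}{api}"
        method = statement.get("method", "POST")  # ASSUMPTION: POST by default
        payload = json.loads(statement["body"])   # raises on malformed JSON
        planned.append((method, url, payload))
    return planned

requests_to_send = plan_requests("http://localhost:8983", "documents", statements)
for method, url, payload in requests_to_send:
    print(method, url, json.dumps(payload))
```

Parsing each body with `json.loads` before anything is sent is a cheap way to catch a malformed statement early.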

Querying Solr

To query Solr, use the "query-vector-db" agent as usual and pass the parameters this way (the "?" in the query is bound to the entry in "fields"):

  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "SolrDataSource"
      query: |
        {
          "q": "{!knn f=embeddings topK=10}?"
        }
      fields:
        - "fn:toListOfFloat(value.question_embeddings)"
      output-field: "value.related_documents"
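
As an illustration of what the query above resolves to, the following Python sketch substitutes the `?` placeholder with a vector literal, giving a query in the form Solr's `knn` query parser accepts (`{!knn f=<field> topK=<n>}[v1, v2, ...]`). How LangStream actually binds the `fields` values is not shown in this PR; the `render_knn_query` helper is hypothetical.

```python
def render_knn_query(template: str, vector: list) -> str:
    """Replace the single '?' placeholder with a Solr vector literal such as [0.1, 0.2]."""
    if template.count("?") != 1:
        raise ValueError("expected exactly one '?' placeholder")
    literal = "[" + ", ".join(str(float(x)) for x in vector) + "]"
    return template.replace("?", literal)

# A short vector for readability; with the schema above it would have 1536 entries.
query = render_knn_query("{!knn f=embeddings topK=10}?", [0.25, 0.5, 1])
print(query)
```

The vector passed at query time should have the same number of dimensions as the `vectorDimension` declared on the field type.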

Writing to Solr

You can use the "vector-db-sink" agent to write to Solr.
This is an example:

  - name: "Write to Solr"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "SolrDataSource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, value.chunk_id)"
        - name: "embeddings"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "text"
          expression: "value.text"
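
The field mappings above can be read as a function from one record value to one Solr document. This Python sketch mirrors the three expressions, assuming that `fn:concat` is plain string concatenation and that `fn:toListOfFloat` converts each element to a float. The `build_solr_document` helper and the sample record are hypothetical.

```python
def build_solr_document(value: dict) -> dict:
    """Apply the three field mappings from the sink configuration to one record value."""
    return {
        # fn:concat(value.filename, value.chunk_id)
        "id": f"{value['filename']}{value['chunk_id']}",
        # fn:toListOfFloat(value.embeddings_vector)
        "embeddings": [float(x) for x in value["embeddings_vector"]],
        # value.text
        "text": value["text"],
    }

record_value = {"filename": "doc.pdf", "chunk_id": 3,
                "embeddings_vector": [1, 0.5], "text": "hello"}
document = build_solr_document(record_value)
print(document)
```

Under these assumptions the id is the filename immediately followed by the chunk id, with no separator, so it stays unique per chunk as long as filenames are unique.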

@eolivelli eolivelli marked this pull request as ready for review October 11, 2023 19:50
@eolivelli

@alessandrobenedetti @dsmiley you may be interested in taking a look at (or reviewing) how we are using the Solr client and APIs.

@eolivelli eolivelli merged commit a0c8234 into main Oct 12, 2023
9 checks passed
@eolivelli eolivelli deleted the impl/solr branch October 12, 2023 00:06
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024