improve configurability of embedding and LLM model sources #169
Comments
What is the difference between a project and a table?
I was using "project" and "job_name" (the parameter in vectorize.table()) interchangeably. Maybe we should rename it to "project". A table can have multiple of the "jobs". Tables have column(s) that get transformed into embeddings using the
This is currently possible and we'd want to preserve it going forward. Currently the model provider is determined by the name of the model passed into the function call, and it works the same for embeddings and LLMs. For example, the prefix on the model name selects the provider, as in the hypothetical calls sketched below.
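A rough sketch of that prefix-based routing; the job names and the second model name are placeholders, not values from this thread:

```sql
-- hypothetical calls: the prefix before the '/' selects the provider
vectorize.table(
    table => 'mytable',
    job_name => 'project_hosted',
    transformer => 'openai/text-embedding-3-small'  -- routed to the OpenAI provider
);

vectorize.table(
    table => 'mytable',
    job_name => 'project_selfhosted',
    transformer => 'ollama/some-local-model'        -- routed to the Ollama provider
);
```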
I think we could do something like the below for multiple language models:

```sql
vectorize.table(
    table => 'mytable',
    job_name => 'project_persian',
    transformer => 'ollama/persian-model'
);

vectorize.table(
    table => 'mytable',
    job_name => 'project_arabic',
    transformer => 'ollama/arabic-model'
);
```
Images are not yet supported, but there are plans to implement them soon.
I suggest building a simple secret manager in Postgres with tools like pgcrypto or pgsodium. Here's the plan:
The table structure would store user-level secrets, so it doesn't have to be a super-user table. This way, each user can securely store and manage their own API keys. Also, since transformer and chat_model are similar (they're the same kind of resource but respond to different requests), we could set up a single table, model_resource, for both. A rough sketch of the secrets side is below. What do you think, @ChuckHend?
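As a minimal sketch of the secrets piece using pgcrypto (the schema, table, and column names are assumptions for illustration, not anything that exists in the extension today):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- user-level secret storage; each user manages their own API keys
CREATE TABLE vectorize.user_secrets (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    owner       TEXT NOT NULL DEFAULT current_user,
    secret_name TEXT NOT NULL,
    secret_val  BYTEA NOT NULL,  -- output of pgp_sym_encrypt
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (owner, secret_name)
);

-- store an API key encrypted with a symmetric passphrase
INSERT INTO vectorize.user_secrets (secret_name, secret_val)
VALUES ('openai_api_key', pgp_sym_encrypt('sk-...', 'a-passphrase'));

-- decrypt at read time
SELECT pgp_sym_decrypt(secret_val, 'a-passphrase') AS api_key
FROM vectorize.user_secrets
WHERE owner = current_user AND secret_name = 'openai_api_key';
```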
I like this. What would a model_resource row look like for OpenAI, since the base url and api key would apply to both an LLM type and an embeddings type, whereas some other providers might offer just embeddings, or just an LLM?
We have some difficulty here: some LLM providers restrict which embedding models can be used alongside them, so I need to think about it.
I think there are two ways to do this.

First way: using compatible_models as a column. We can add a compatible_models column to model_resource that lists the models it can be paired with, as sketched below.
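One possible shape for that first approach, assuming model_resource lives in the vectorize schema and is keyed by UUID (both assumptions):

```sql
-- hypothetical: store compatible model ids directly on model_resource
ALTER TABLE vectorize.model_resource
    ADD COLUMN compatible_models UUID[] NOT NULL DEFAULT '{}';
```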
The second way is making another table for model compatibility, for example (a DDL sketch follows the table):
| Column Name | Data Type | Description |
|---|---|---|
| model_id | UUID | Foreign key to model_resource (model) |
| compatible_model_id | UUID | Foreign key to model_resource (compatible) |
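A hypothetical DDL sketch of that second approach, assuming model_resource sits in the vectorize schema with a UUID primary key named id:

```sql
-- join table relating a model to the models it can be paired with
CREATE TABLE vectorize.model_compatibility (
    model_id            UUID NOT NULL REFERENCES vectorize.model_resource (id),
    compatible_model_id UUID NOT NULL REFERENCES vectorize.model_resource (id),
    PRIMARY KEY (model_id, compatible_model_id)
);
```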
@tavallaie, what are some examples of values that would go in that table?
In my design, there is no difference between embedding models and LLMs or even images and audio.
Ok, I think I might see where you are going with that. Can you provide an example of what that table might look like in your use case?
I am thinking of something like this:
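From the surrounding comments, the idea appears to be a single model_resource table that treats embedding, LLM, image, and audio models uniformly. A hypothetical sketch, with every column name assumed:

```sql
-- one registry for all model kinds; no separate tables per model type
CREATE TABLE vectorize.model_resource (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name           TEXT NOT NULL UNIQUE,  -- e.g. 'openai/text-embedding-3-small'
    base_url       TEXT NOT NULL,         -- e.g. 'https://api.openai.com/v1'
    api_key_secret TEXT                   -- reference into the user-level secrets table
);
```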
Thank you. Do the APIs need to change how they reference a model then, or how does this impact vectorize.table(), vectorize.init_rag(), and others?
I don't think we need to change them; because names are unique, we can look models up by name.
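For instance, assuming the hypothetical model_resource sketch above with a unique name column:

```sql
-- look a model up by its unique name rather than by id
SELECT * FROM vectorize.model_resource
WHERE name = 'ollama/persian-model';
```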
Ok cool, I like this. It'll be a fairly large code change I think. For the "OpenAI compatible" providers, it will probably be more performant to keep the existing hardcoded request/response handling. I think I'm on board with this overall design, btw. Some of it will end up being a fairly large code change; do you think we can break it up into a few smaller PRs?
Let's start with compatible providers, like adding a provider column to our model table. That way we cover the OpenAI and Ollama providers that most people use, and we can put our effort into supporting vLLM, self-hosted Ollama, LM Studio, etc.
Like the sketch below: when creating a job, we automatically decide which provider should be used.
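A hypothetical sketch of that incremental step, reusing the assumed model_resource table from above:

```sql
-- add a provider column so job creation can route without hardcoded URLs
ALTER TABLE vectorize.model_resource
    ADD COLUMN provider TEXT NOT NULL DEFAULT 'openai';  -- e.g. 'openai', 'ollama'

-- when a job is created, resolve the provider from the model chosen for the job
SELECT provider, base_url
FROM vectorize.model_resource
WHERE name = 'openai/text-embedding-3-small';
```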
So maybe providers and models are separate tables then? In the above, won't we end up with almost identical records since there is also 'openai/text-embedding-3-small'?
We have providers in Rust, as mentioned in #152, so maybe we can change those to be compatible with this model instead of hardcoding them.
Do you have any sense of the performance difference between using request/response mapping vs. hardcoding?
Not really; we should run a few tests.
Issue is WIP and will be further refined.
LLM and embedding model sources are currently defined in GUCs; e.g. `vectorize.openai_service_url = https://api.openai.com/v1` contains the base url for OpenAI. This implementation introduces at least two limitations: for example, a GUC can hold only one value, but project_a may want `vectorize.openai_service_url = https://api.openai.com/v1` while project_b wants `vectorize.openai_service_url = https://myapi.mydomain.com/v1`. The pg vectorize background worker reads the model source from the job's values in the vectorize.job table in order to know which GUC to use.

Proposal: move the GUCs into a table such as `vectorize.model_sources` that contains information such as the base url, schema, etc.

Considerations:
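One consideration is which columns such a table would need. A rough, hypothetical layout (none of these names are from the issue):

```sql
CREATE TABLE vectorize.model_sources (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    provider   TEXT NOT NULL,                   -- e.g. 'openai', 'ollama'
    base_url   TEXT NOT NULL,                   -- e.g. 'https://api.openai.com/v1'
    api_schema TEXT NOT NULL DEFAULT 'openai',  -- request/response schema the endpoint speaks
    UNIQUE (provider, base_url)
);
```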