Inconsistent Embedding Dimensions in Configuration and Code Usage #5412
Replies: 5 comments 1 reply
-
👀 @jaavid Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible.
-
Hey @jaavid! I'm here to help you with any bugs, questions, or contributions you need assistance with. Let's work together to solve this issue! Yes, an update is needed to resolve the inconsistency between the declared dimensions in the project configuration and the actual embedding model in use. The text-embedding-3-small model returns 1536-dimensional vectors by default, while the embedding code and database schema expect 1024 dimensions, so one side has to be changed to match the other.
-
Hi @dosubot, thank you for your prompt response! You've pinpointed the dimensional mismatch between the configuration and the OpenAI model well. Here's a summary of what I believe might help:

- **Database Schema:** The column in the public.embeddings table should be updated to vector(1536) instead of vector(1024) (see the migration sketch below this comment). Are there any backward compatibility concerns I should be aware of when updating the schema?
- **Configuration Files:** The embedding configuration file (embedding.ts) seems to hardcode the dimensionality, and I noticed references to test fixtures and sample input validations. Should I adjust the dimensions manually, or is there a recommended way to handle changes dynamically (perhaps via environment variables or config files)?
- **Possible PR:** I'm happy to submit a pull request to help with this issue! If there are contributing guidelines specific to schema migrations or testing for embeddings, please point me in the right direction.

Thanks again for the support! Looking forward to working on this further.
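For reference, a minimal migration sketch along the lines of the first point, assuming a Postgres + pgvector setup in which public.embeddings has an embeddings column of type vector(1024) without a NOT NULL constraint; the column name and the discard-then-re-embed approach are assumptions, not the project's actual migration:

```ts
import { Client } from "pg";

// Widen the pgvector column from 1024 to 1536 dimensions. Existing 1024-dim
// vectors cannot be widened in place, so they are discarded (USING NULL) and
// the affected rows must be re-embedded afterwards.
async function migrateEmbeddingsColumn(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query("BEGIN");
    await client.query(
      `ALTER TABLE public.embeddings
         ALTER COLUMN embeddings TYPE vector(1536)
         USING NULL::vector(1536)`,
    );
    await client.query("COMMIT");
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  } finally {
    await client.end();
  }
}

migrateEmbeddingsColumn().catch((error) => {
  console.error(error);
  process.exit(1);
});
```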
-
Updating the database schema to accommodate the new vector dimensions can indeed raise backward compatibility concerns. When you change the schema from vector(1024) to vector(1536), existing rows still hold 1024-dimensional vectors that cannot be cast to the new size, so you will need to clear and re-embed the existing data (or run a backfill job) as part of the migration.

Regarding the configuration files, if the dimensionality is hardcoded in embedding.ts, reading it from an environment variable or a central configuration value would avoid touching the code for future changes and keep the test fixtures in sync with a single source of truth.

For submitting a pull request, while I couldn't find specific contributing guidelines related to schema migrations or testing for embeddings in the repository, it's generally good practice to include tests that validate the new schema and ensure that embeddings are handled correctly with the updated dimensions. You might also want to document any changes you make to the configuration process.

If you decide to proceed with a pull request, include detailed documentation of your changes and any necessary migration steps for users. This will help maintainers and other contributors understand and integrate your changes more easily.
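A minimal sketch of the environment-variable approach; the EMBEDDING_DIMENSIONS variable name and the 1024 fallback are illustrative, not existing project settings:

```ts
// Read the embedding dimensionality from the environment instead of
// hardcoding it, falling back to the current default when unset.
const DEFAULT_EMBEDDING_DIMENSIONS = 1024;

export const embeddingDimensions: number = (() => {
  const raw = process.env.EMBEDDING_DIMENSIONS;
  if (!raw) return DEFAULT_EMBEDDING_DIMENSIONS;

  const parsed = Number.parseInt(raw, 10);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Invalid EMBEDDING_DIMENSIONS value: "${raw}"`);
  }
  return parsed;
})();
```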
-
@jaavid OpenAI's text-embedding-3-small supports custom output dimensions below its 1536-dimension default via the dimensions parameter. As their post notes, 1024 dims is a good balance, so it's fine to keep using 1024 here.
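A minimal sketch of that approach with the official openai Node SDK, which accepts a dimensions parameter for the text-embedding-3 models; the embed helper and the single-input call are simplifications, not the project's actual embedding code:

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Request 1024-dim vectors from text-embedding-3-small so the output matches
// an existing vector(1024) column; the default would be 1536 dimensions.
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    dimensions: 1024, // shorten from the 1536-dim default
  });
  return response.data[0].embedding;
}
```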
-
📦 Environment
Docker
📌 Version
1.45.6
💻 Operating System
Ubuntu
🌐 Browser
Chrome
🐛 Bug Description
There is an inconsistency between the declared dimensions in the project configuration and the actual embedding model in use. According to OpenAI's documentation, the text-embedding-3-small model outputs vectors of 1536 dimensions. However, the embedding-related code and database schema are set to handle 1024 dimensions, which leads to insertion errors in the embeddings table.
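For illustration, a small sketch (not part of the project) that surfaces the mismatch by comparing the length of a returned embedding with the dimension the schema expects; SCHEMA_DIMENSIONS mirrors the current vector(1024) column:

```ts
import OpenAI from "openai";

const SCHEMA_DIMENSIONS = 1024; // what the vector(1024) column accepts

async function checkDimensions(): Promise<void> {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: "dimension check",
  });
  const actual = data[0].embedding.length; // 1536 by default
  console.log(`model returns ${actual} dims, schema expects ${SCHEMA_DIMENSIONS}`);
}

checkDimensions().catch(console.error);
```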
📷 Recurrence Steps
No response
🚦 Expected Behavior
No response
📝 Additional Information
Does this need an update?