-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search with Context Similarity #2
Search with Context Similarity #2
Conversation
QA: Question Answering based on Retrieval Models Used:
|
This aligns with my expectations. Claude is really good at dealing with fine grained comprehension and working with code. I use it as my primary model over ChatGPT inside of my cursor IDE.
I haven't done extensive testing regarding context lengths but I generally use o1 for higher level more complex tasks. For example the most recent interesting use was when I bootstrapped both the "sync-configs-agent" tool as well as the "rpc-handler" tool in my github org. I give a detailed prompt using my voice to provide context, and then after I'll paste in context that's relevant. I'll have o1 preview do it's thing and be like 80-90% to completion. I use Claude for the remainder. I also know that mini has a larger usable context window but may be less capable than preview. So I would use it for "in between" tasks which I can pump in a ton of context from an already made codebase but still have large ish sweeping changes recommended. References: |
Hows this coming along? |
I'm currently trying to adjust the prompt to include the verbose parameter (v = 1). I've experimented with various prompting techniques, including words like "brevity," which typically help in reducing verbosity. However, none of these approaches seem to be effective with sonnet. The output either becomes too unimaginative, lacking creativity in using resources and context, or it fails to cut off entirely. The current version works well. I can merge it into main and think of a better prompt later. |
Okay merge and lets test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a pretty solid implementation
src/types/plugin-inputs.ts
Outdated
export const pluginSettingsSchema = T.Object({ | ||
model: T.String({ default: "o1-mini" }), | ||
openAiBaseUrl: T.Optional(T.String()), | ||
similarityThreshold: T.Number({ default: 0.1 }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain to me what the similarity threshold is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarity levels for the similarity search with issues and comments range from 0 to 1 (Unit Normalized), where 0 indicates the best match and 1 represents the farthest or worst match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from clarifying that this is calculating the difference with subtraction, so closer to 0 difference means more similar. Or to make it more intuitive, maybe reverse it.
1 is the most similar and 0 is least similar, so you want 90% similarity threshold (0.9) thats a lot more intuitive for a config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, Inverted the scale, the parameters would range from 0 to 1 instead now. If someone enters 0.9 it would mean 90% similar now.
src/types/gpt.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to llm.d.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed File
src/types/env.ts
Outdated
export const envSchema = T.Object({ | ||
OPENAI_API_KEY: T.String(), | ||
UBIQUITY_OS_APP_NAME: T.String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should default to "UbiquityOS"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the default value.
src/helpers/issue.ts
Outdated
* @returns The content of the README file as a string. | ||
*/ | ||
export async function pullReadmeFromRepoForIssue(params: FetchParams): Promise<string | undefined> { | ||
let readme = undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let readme = undefined; | |
let readme; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed Initialization to undefined.
src/handlers/ask-gpt.ts
Outdated
model, | ||
rerankedText, | ||
formattedChat, | ||
["typescript", "github", "cloudflare worker", "actions", "jest", "supabase", "openai"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Handling Ground Truths: They are indicating that the system uses “ground truths” — meaning predefined correct examples or comments that the system relies on for determining context. Even if the query (or comment) doesn’t provide enough context, the system tries not to make assumptions. For example, if the query asks about “types” in a code snippet without specifying a language, the system shouldn’t assume it’s referring to Python.
Hard coding these things is the wrong approach then. This needs to be dynamic in a new task.
src/handlers/ask-gpt.ts
Outdated
if (!text) { | ||
return ""; | ||
} | ||
return text.replace(/[^a-zA-Z0-9\s]/g, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You sure you want to remove formatting clues such as bullet point lists, and the syntax for images? You'll just be left with URLs
You're also removing the block quote indicator which certainly changes the meaning of the corpus (quoting somebody else doesn't mean you agree.)
This seems like the regex needs to be a lot more comprehensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this because it was only used with issues and comments. The goal was to eliminate just the emojis, as I thought they caused the LLM to produce strange Unicode values in the results. I believe the newer models don’t have this issue either, so it’s unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You marked my comments as "resolved" but didn't implement the requested changes.
Could you please clarify which changes I may have overlooked? |
if (answer && answer.content && res.usage) { | ||
return { answer: answer.content, tokenUsage: { input: res.usage.prompt_tokens, output: res.usage.completion_tokens, total: res.usage.total_tokens } }; | ||
} | ||
return { answer: "", tokenUsage: { input: 0, output: 0, total: 0 } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an empty string always seems like a bad idea. This seems to make more sense to throw an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws an error at the UI level, displaying the message No answer from OpenAI
. Sample
export interface CommentType { | ||
id: string; | ||
plaintext: string; | ||
markdown?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional seems wrong unless its an optimization to save tokens
query_text: query, | ||
query_embedding: embedding, | ||
threshold: threshold, | ||
max_results: 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ten optimal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are ten issues and ten comments. Voyage AI performs excellently in this regard, consistently providing relevant issues. I believe ten is sufficient given the extensive local context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are ten issues and ten comments. Voyage AI performs excellently in this regard, consistently providing relevant issues. I believe ten is sufficient given the extensive local context.
} | ||
|
||
/** | ||
* Asks GPT a question and returns the completions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to find and replace all GPT instances in the code base with LLM
model, | ||
rerankedText, | ||
formattedChat, | ||
["typescript", "github", "cloudflare worker", "actions", "jest", "supabase", "openai"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we haven't already: we should make another task for dynamic ground truths
const links: string[] = []; | ||
inputString = inputString.replace(/https?:\/\/\S+/g, (match) => { | ||
links.push(match); | ||
return `__LINK${links.length - 1}__`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong but i dont know the full context of how its used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It removes duplicate sentences and phrases from the context and works reasonably well, fitting nearly ~250K of context in o1-mini. However, a downside is the loss of context regarding references, as it retains links and only some punctuation.
* @param params - The parameters required to fetch the README, including the context with octokit instance. | ||
* @returns The content of the README file as a string. | ||
*/ | ||
export async function pullReadmeFromRepoForIssue(params: FetchParams): Promise<string | undefined> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixed feelings on this. They fall out of date so fast. Its useful reference but might be worth warning the LLM that theres a good chance that it is out of date information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They typically offer useful context about a repository, even if it's slightly outdated. This information can help users with their queries and provide some setup guidance.
import { Context } from "./types"; | ||
import { askQuestion } from "./handlers/ask-llm"; | ||
import { addCommentToIssue } from "./handlers/add-comment"; | ||
import { LogLevel, LogReturn, Logs } from "@ubiquity-dao/ubiquibot-logger"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this package still work? I thought we deleted it and rebranded to something like
@ubiquity-os/ubiquity-os-logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was installed before the purge. Will update this to the new logger version.
My last batch of comments is intended to be handled async because the pull is good enough to test in beta with |
@0x4007 This is the config I used
Locally the env file (.dev.vars) has to be configured with:
I can deploy it on the workers using my credentials if necessary. |
@sshivaditya2019 For the Supabase, does it need a brand new instance or does it share with https://github.com/ubiquity-os-marketplace/text-vector-embeddings/ ? Also the deployment script does not upload the |
It utilizes the same database as |
@sshivaditya2019 Sounds good, please poke me in telegram ( |
Resolves #50
@ubiquityos
gpt command #1Results for Database fetching backfilling: