Pull/Task Matching #64
Comments
Note The following contributors may be suitable for this task: sshivaditya2019
0x4007
@sshivaditya2019 is there an opportunity to use embeddings here? Intuitively I feel probably not, because a code diff is very different from an issue specification, but I do like that embeddings let us quantify a confidence threshold easily. It seems like it could be costly to review the diff on every push, read all the unassigned specs, and try to guess every time. I wonder if there is a cache strategy we can design somehow. I think we might also be able to get by with a mini model because I imagine that there isn't too much context to worry about.
Processing code diff embeddings can be quite challenging and isn't directly comparable to the text embeddings produced by other models. Instead, we could aggregate relevant information like the PR title, description, changed files, modified functions, and commit messages into a structured template. This template could then be used to generate embeddings, which can be searched against the existing issue database. Rather than automatically linking the PR to an issue, we could present the user with a list of potential issues that the changes may address.
I think this approach would be much more efficient and economical than directly comparing diffs across all the PRs and issues.
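A minimal sketch of that idea, assuming the embeddings themselves come from whatever text embedding model is chosen (not shown here). The `PullSummary` fields, `buildPullTemplate`, and `rankIssues` are hypothetical names for illustration, not an existing API:

```typescript
// Hypothetical shape of the aggregated pull request metadata (assumed field names).
interface PullSummary {
  title: string;
  description: string;
  changedFiles: string[];
  modifiedFunctions: string[];
  commitMessages: string[];
}

// Flatten the metadata into one structured text block to feed a text embedding model.
function buildPullTemplate(pull: PullSummary): string {
  return [
    `Title: ${pull.title}`,
    `Description: ${pull.description}`,
    `Changed files: ${pull.changedFiles.join(", ")}`,
    `Modified functions: ${pull.modifiedFunctions.join(", ")}`,
    `Commit messages: ${pull.commitMessages.join(" | ")}`,
  ].join("\n");
}

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IssueEmbedding {
  issueNumber: number;
  embedding: number[];
}

interface IssueCandidate {
  issueNumber: number;
  similarity: number;
}

// Rank stored issue embeddings against the pull embedding and keep the top few,
// so the user is shown a short list rather than an automatic link.
function rankIssues(pullEmbedding: number[], issues: IssueEmbedding[], limit = 3): IssueCandidate[] {
  return issues
    .map(({ issueNumber, embedding }) => ({
      issueNumber,
      similarity: cosineSimilarity(pullEmbedding, embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

The template text would be embedded once per push and compared only against the unassigned issues' stored embeddings, which keeps the per-push cost to one embedding call plus a cheap vector comparison.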
Let's give it a shot.
If there are unassigned tasks (priced issues) and a pull is opened against a specific repo, I imagine that by looking at the diff we might be able to reasonably guess what task the pull is associated with.
This will require some research to get the accuracy high enough for this to be useful but I'm fairly optimistic because it will not have so many task specifications to guess against.
Once this is quite accurate, it will be a very seamless experience: contributors just focus on opening pulls and UbiquityOS handles the rest, including all the necessary payouts.
I think that the configuration can have some type of confidence threshold, e.g. 50%, before it will try to associate. This will prevent newly opened pulls with barely any changes from being matched to unrelated tasks.
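As a rough sketch of how such a threshold could gate matching: the `MatchingConfig` shape, the `confidenceThreshold` key, and `selectMatch` below are assumed names for illustration, not existing UbiquityOS configuration.

```typescript
// Hypothetical plugin configuration; the key name is an assumption.
interface MatchingConfig {
  // Minimum confidence (0 to 1) required before a pull is associated with a task.
  confidenceThreshold: number;
}

const defaultConfig: MatchingConfig = { confidenceThreshold: 0.5 };

interface ScoredTask {
  issueNumber: number;
  confidence: number;
}

// Return the highest-confidence task if it clears the threshold, otherwise null.
function selectMatch(candidates: ScoredTask[], config: MatchingConfig = defaultConfig): ScoredTask | null {
  let best: ScoredTask | null = null;
  for (const candidate of candidates) {
    if (best === null || candidate.confidence > best.confidence) {
      best = candidate;
    }
  }
  if (best === null || best.confidence < config.confidenceThreshold) {
    return null;
  }
  return best;
}
```

A freshly opened pull with a tiny diff would typically score low against every task, so `selectMatch` returns `null` and nothing is associated until later pushes raise the score.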
After some more commits, it can start adding a comment under the pull explaining that the pull is likely to be associated with issue X (not sure if this requires the contributor to have installed UbiquityOS on their repo?). This should also only run if the pull is not already manually linked to an issue.
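The conditions above could be combined roughly as follows. This is only a sketch: `PullState`, `CommentPolicy`, the `minCommits` setting, and the comment wording are all assumptions, and actually posting the comment through the GitHub API is left out.

```typescript
// Hypothetical view of the pull's current state.
interface PullState {
  commitCount: number;
  // Issue number if the contributor already linked one manually, otherwise null.
  manuallyLinkedIssue: number | null;
}

interface BestMatch {
  issueNumber: number;
  confidence: number; // 0 to 1
}

// Assumed settings: "some more commits" is modeled as a minimum commit count.
interface CommentPolicy {
  confidenceThreshold: number;
  minCommits: number;
}

// Build the suggestion comment, or return null when no comment should be posted.
function buildSuggestionComment(pull: PullState, match: BestMatch, policy: CommentPolicy): string | null {
  // Never override a link the contributor made by hand.
  if (pull.manuallyLinkedIssue !== null) return null;
  // Wait until the pull has enough commits to be worth matching.
  if (pull.commitCount < policy.minCommits) return null;
  // Skip low-confidence guesses.
  if (match.confidence < policy.confidenceThreshold) return null;
  const percent = Math.round(match.confidence * 100);
  return `This pull is likely associated with #${match.issueNumber} (confidence: ${percent}%).`;
}
```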