
Pull/Task Matching #64

Open
0x4007 opened this issue Dec 31, 2024 · 4 comments

Comments

0x4007 (Member) commented Dec 31, 2024

If there are unassigned tasks (priced issues) and a pull is opened against a specific repo, I imagine that by looking at the diff we might be able to reasonably guess which task the pull is associated with.

This will require some research to get the accuracy high enough to be useful, but I'm fairly optimistic because there won't be many task specifications to guess against.

Once this is quite accurate, it will be a very seamless experience: contributors can just focus on opening pulls, and UbiquityOS handles the rest, including all the necessary payouts.

I think the configuration can include some type of confidence threshold, e.g. 50%, that must be met before it tries to associate a pull with a task. This will prevent newly opened pulls with barely any changes from being matched to unrelated tasks.
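A minimal sketch of the confidence gate, assuming each candidate issue already has a confidence score in [0, 1] (for example from an embedding similarity). `MatchConfig`, the `confidenceThreshold` key, and `pickAssociation` are hypothetical names, not an existing UbiquityOS configuration option.

```typescript
// Hypothetical plugin configuration: `confidenceThreshold` is an assumed key, not an existing option.
interface MatchConfig {
  confidenceThreshold: number; // e.g. 0.5 for the 50% threshold suggested above
}

// A priced, unassigned issue scored against the pull (score assumed to be in [0, 1]).
interface Candidate {
  issueNumber: number;
  confidence: number;
}

// Returns the highest-scoring candidate that clears the threshold, or null if none does.
function pickAssociation(candidates: Candidate[], config: MatchConfig): Candidate | null {
  let best: Candidate | null = null;
  for (const candidate of candidates) {
    if (candidate.confidence < config.confidenceThreshold) continue; // below threshold: ignore
    if (best === null || candidate.confidence > best.confidence) best = candidate;
  }
  return best;
}

const config: MatchConfig = { confidenceThreshold: 0.5 };
console.log(pickAssociation([{ issueNumber: 12, confidence: 0.31 }, { issueNumber: 40, confidence: 0.72 }], config));
// → { issueNumber: 40, confidence: 0.72 }
console.log(pickAssociation([{ issueNumber: 12, confidence: 0.31 }], config));
// → null
```

Returning `null` rather than the best sub-threshold guess is what keeps a barely-started pull from being associated with anything.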

After some more commits, it can start adding a comment under the pull (I'm not sure whether this requires the contributor to have installed UbiquityOS on their repo) explaining that the pull is likely associated with issue X. This should also only run if the pull is not already manually linked to an issue.
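The comment step could be gated roughly as follows. This is only a sketch: `PullState`, `SuggestionConfig`, the `minCommits` setting, and `buildSuggestionComment` are hypothetical, and fetching the pull's linked issues and commit count from GitHub is left out.

```typescript
// Hypothetical snapshot of the pull, gathered elsewhere (e.g. from webhook payloads).
interface PullState {
  isLinkedToIssue: boolean; // true if the contributor already linked an issue manually
  commitCount: number;
}

// Hypothetical setting: wait for "some more commits" before suggesting anything.
interface SuggestionConfig {
  minCommits: number;
}

interface Match {
  issueNumber: number;
  confidence: number; // assumed to be in [0, 1]
}

// Returns the comment body to post, or null when no comment should be posted.
function buildSuggestionComment(pull: PullState, match: Match | null, config: SuggestionConfig): string | null {
  if (pull.isLinkedToIssue) return null; // never second-guess a manual link
  if (pull.commitCount < config.minCommits) return null; // too early to guess
  if (match === null) return null; // nothing cleared the confidence threshold
  const percent = Math.round(match.confidence * 100);
  return `This pull is likely associated with #${match.issueNumber} (${percent}% confidence).`;
}

console.log(buildSuggestionComment({ isLinkedToIssue: false, commitCount: 4 }, { issueNumber: 40, confidence: 0.72 }, { minCommits: 3 }));
// → This pull is likely associated with #40 (72% confidence).
```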


ubiquity-os-beta bot commented Dec 31, 2024

Note

The following contributors may be suitable for this task:

sshivaditya2019

83% Match ubiquity-os-marketplace/text-vector-embeddings#7

0x4007

75% Match ubiquity/ubiquibot#917

0x4007 (Member, Author) commented Dec 31, 2024

@sshivaditya2019 is there an opportunity to use embeddings here? Intuitively I suspect not, because code diffs are very different from issue specifications, but I do like that embeddings make it easy to quantify a confidence threshold.

It seems like it could be costly to review the diff on every push, read all the unassigned specs, and try to guess every time. I wonder if we can design a caching strategy somehow.
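One possible caching strategy, offered as an assumption rather than a settled design: unassigned specs change far less often than pulls receive pushes, so each spec can be embedded once and reused until its text changes. This sketch keys a hypothetical in-memory `SpecEmbeddingCache` by issue number plus a SHA-256 hash of the spec text (via Node's `crypto.createHash`); a real plugin would likely need persistent storage instead.

```typescript
import { createHash } from "node:crypto";

// Caches one computed value (e.g. an embedding) per issue, keyed by issue number and
// invalidated automatically when the issue's specification text changes.
class SpecEmbeddingCache<T> {
  private readonly entries = new Map<number, { hash: string; value: T }>();

  get(issueNumber: number, specText: string, compute: (text: string) => T): T {
    const hash = createHash("sha256").update(specText).digest("hex");
    const cached = this.entries.get(issueNumber);
    if (cached !== undefined && cached.hash === hash) return cached.value; // spec unchanged: reuse
    const value = compute(specText); // spec is new or edited: compute once and store
    this.entries.set(issueNumber, { hash, value });
    return value;
  }

  // Drop an issue once it is assigned or closed, so it stops being a candidate.
  evict(issueNumber: number): boolean {
    return this.entries.delete(issueNumber);
  }
}

// Usage with a stand-in embedding function that counts how often it is called.
let embedCalls = 0;
const fakeEmbed = (text: string): number[] => {
  embedCalls++;
  return [text.length];
};
const cache = new SpecEmbeddingCache<number[]>();
cache.get(64, "Pull/Task Matching spec", fakeEmbed);
cache.get(64, "Pull/Task Matching spec", fakeEmbed); // cache hit, no new call
cache.get(64, "Pull/Task Matching spec (edited)", fakeEmbed); // spec changed, recomputed
console.log(embedCalls); // → 2
```

Using a `Promise` as `T` would let concurrent pushes share one in-flight embedding request, though a rejected promise would then stay cached until the spec changes or is evicted. The same hashing idea could also skip re-embedding the pull itself when a push leaves its summary unchanged.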

I think we might also be able to get by with a mini model, since I imagine there isn't much context to worry about.

shiv810 commented Jan 2, 2025

> @sshivaditya2019 is there an opportunity to use embeddings here? Intuitively I suspect not, because code diffs are very different from issue specifications, but I do like that embeddings make it easy to quantify a confidence threshold.

Processing code diffs into embeddings can be quite challenging, and those embeddings aren't directly comparable to the text embeddings produced by other models. Instead, we could aggregate relevant information, such as the PR title, description, changed files, modified functions, and commit messages, into a structured template. This template could then be used to generate embeddings, which can be searched against the existing issue database. Rather than automatically linking the PR to an issue, we could present the user with a list of potential issues that the changes may address.

PR Title: "Fix dark mode in PostAuthService"
PR Description: "..."
Changed Files: ["src/auth/postAuthService.js", "test/auth_test.js"]
Changed Functions: ["isDarkMode", "setDarkMode"]
Commit Messages: ["dark mode state handling", "state not being passed to the child"]
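A minimal sketch of turning pull metadata into the template above, assuming the fields have already been extracted (parsing a diff to find changed function names is out of scope here). `PullSummary`, `formatList`, and `buildPullTemplate` are hypothetical names; the output reuses the example's field labels so the same text can be sent to an embedding model.

```typescript
// Hypothetical pre-extracted pull metadata; the field set follows the template above.
interface PullSummary {
  title: string;
  description: string;
  changedFiles: string[];
  changedFunctions: string[];
  commitMessages: string[];
}

// Renders a string list as ["a", "b"], matching the example's formatting.
function formatList(items: string[]): string {
  return `[${items.map((item) => JSON.stringify(item)).join(", ")}]`;
}

// Builds the structured text that would be embedded and searched against issue specs.
function buildPullTemplate(pull: PullSummary): string {
  return [
    `PR Title: ${JSON.stringify(pull.title)}`,
    `PR Description: ${JSON.stringify(pull.description)}`,
    `Changed Files: ${formatList(pull.changedFiles)}`,
    `Changed Functions: ${formatList(pull.changedFunctions)}`,
    `Commit Messages: ${formatList(pull.commitMessages)}`,
  ].join("\n");
}

console.log(
  buildPullTemplate({
    title: "Fix dark mode in PostAuthService",
    description: "...",
    changedFiles: ["src/auth/postAuthService.js", "test/auth_test.js"],
    changedFunctions: ["isDarkMode", "setDarkMode"],
    commitMessages: ["dark mode state handling", "state not being passed to the child"],
  }),
);
```

JSON-quoting each value keeps quotes or newlines inside a description or commit message from breaking the one-field-per-line layout.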

I think this approach would be much more efficient/economical than directly comparing diffs across all the PRs and issues.
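To present a list rather than auto-linking, the pull's template embedding could be compared against each unassigned issue's embedding and the best few shown. This sketch uses plain cosine similarity over in-memory vectors as the score; `cosineSimilarity`, `rankCandidateIssues`, `topK`, and `minScore` are hypothetical, and a production setup would more likely query a vector store.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IssueEmbedding {
  issueNumber: number;
  vector: number[];
}

interface RankedIssue {
  issueNumber: number;
  score: number;
}

// Scores every unassigned issue, keeps those at or above minScore, best first, at most topK.
function rankCandidateIssues(pullVector: number[], issues: IssueEmbedding[], topK = 3, minScore = 0.5): RankedIssue[] {
  return issues
    .map((issue) => ({ issueNumber: issue.issueNumber, score: cosineSimilarity(pullVector, issue.vector) }))
    .filter((candidate) => candidate.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const ranked = rankCandidateIssues([0.9, 0.1, 0.0], [
  { issueNumber: 12, vector: [0.8, 0.2, 0.1] },
  { issueNumber: 40, vector: [0.0, 0.1, 0.9] },
]);
console.log(ranked);
```

The top score doubles as the confidence value that a threshold could be applied to before anything is posted.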

0x4007 (Member, Author) commented Jan 2, 2025

> I think this approach would be much more efficient/economical than directly comparing diffs across all the PRs and issues.

Let's give it a shot.


2 participants