Similar Issues Adjustments #25

Closed
0x4007 opened this issue Oct 1, 2024 · 22 comments · Fixed by #26

@0x4007
Member

0x4007 commented Oct 1, 2024

          > This issue seems to be similar to the following issue(s):

This ran extremely fast.

@sshivaditya2019:

  1. Can you make sure that this includes the www. prefix so it does not "reference" or "link back" to the similar issue? I'm also curious why it thinks it's a similar issue.
  2. Also, I think it should probably only check within the same repo, instead of the entire organization.
  3. Rounding the percentage seems useful. Fractions of a percent don't matter to us.
  4. Style adjustments:
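
Items 1 and 3 could look roughly like the sketch below. It assumes similarity scores arrive as values between 0 and 1; the function names are hypothetical and not taken from the plugin.

```typescript
// Hypothetical helper names; the plugin's real identifiers may differ.

/** Rewrites a github.com URL to www.github.com, as requested in item 1. */
function toNonReferencingUrl(url: string): string {
  return url.replace(/^https:\/\/github\.com\//, "https://www.github.com/");
}

/** Converts a 0..1 similarity score to a whole-number percentage string (item 3). */
function formatSimilarity(score: number): string {
  return `${Math.round(score * 100)}%`;
}

const exampleUrl = toNonReferencingUrl("https://github.com/ubiquity-os/plugin-template/issues/24");
const exampleLabel = formatSimilarity(0.7689);
```

URLs that already use another host are returned unchanged, since the pattern only matches the bare `github.com` prefix.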


Originally posted by @0x4007 in ubiquity-os/plugins-wishlist#52 (comment)

Contributor

ubiquity-os bot commented Oct 1, 2024

This issue seems to be similar to the following issue(s):

@0x4007
Member Author

0x4007 commented Oct 1, 2024

So this should be within the repo only. It appears to be network-wide, which is not very useful. The above comment is becoming recursive.

@shiv810
Collaborator

shiv810 commented Oct 1, 2024

@0x4007 One possible reason it's stacking could be a low match threshold, as almost all of them are between 75 and 80. Increasing the threshold to 85 might resolve this. I checked the embeddings for the issues, and they are indeed similar. We could use n-grams for text matching between similar issues, which might help fix this recursion problem.
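
A rough sketch of the two ideas in this comment: a stricter threshold and an n-gram text check. The threshold of 85 comes from the comment; the `Candidate` shape, the word-level n-grams, and the Jaccard overlap measure are assumptions made for illustration.

```typescript
interface Candidate {
  title: string;
  similarity: number; // percentage, e.g. 77
}

const MATCH_THRESHOLD = 85; // value proposed in the comment above

function aboveThreshold(candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => c.similarity >= MATCH_THRESHOLD);
}

/** Splits text into lower-cased word n-grams. */
function wordNgrams(text: string, n: number): string[] {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const grams: string[] = [];
  for (let i = 0; i + n <= words.length; i++) {
    grams.push(words.slice(i, i + n).join(" "));
  }
  return grams;
}

function unique(items: string[]): string[] {
  const out: string[] = [];
  for (const item of items) {
    if (out.indexOf(item) === -1) out.push(item);
  }
  return out;
}

/** Jaccard overlap of the distinct n-grams of two texts, in [0, 1]. */
function ngramOverlap(a: string, b: string, n: number): number {
  const gramsA = unique(wordNgrams(a, n));
  const gramsB = unique(wordNgrams(b, n));
  let shared = 0;
  for (const g of gramsA) {
    if (gramsB.indexOf(g) !== -1) shared++;
  }
  const union = gramsA.length + gramsB.length - shared;
  return union === 0 ? 0 : shared / union;
}
```

Two issues whose embeddings are close but whose wording shares few n-grams would score low on `ngramOverlap`, which is one way to filter out the stacked matches.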

@0x4007
Member Author

0x4007 commented Oct 1, 2024

a low match threshold as almost all of them are between 75 to 80.

I think the implementation needs adjustment because, in the context of every idea ever, perhaps the issues are alike, but within the context of a repo they are quite different. Maybe we should use a log function.

Applying a logarithmic function could be useful in this context to adjust the similarity scores, especially if you're trying to emphasize differences at certain ranges. Here's why:

  1. Score Distribution: If most of your similarity scores cluster between 75 and 80, applying a log function could spread out these values more evenly. This makes subtle differences more distinguishable, helping you avoid stacking similar issues that aren't truly alike.

  2. Context Sensitivity: Logarithmic scaling can help distinguish finer details when values are high, which aligns with your concern that issues might be similar in general but differ in specific contexts within the repo.

You might experiment with different bases (e.g., natural log or log base 10) to see how it impacts your thresholds and similarity evaluation. Applying this transformation before filtering or adjusting the threshold could yield a more nuanced matching process.
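
Before committing to a transform, it may be worth checking numerically what it does to equal gaps between scores. The sketch below assumes scores in the range 0 to 1. Because the logarithm is concave, a plain log shrinks a fixed gap as the scores rise; the `-ln(1 - s)` variant is an alternative introduced here purely for comparison and is not something proposed in this thread.

```typescript
/** Gap between two scores after applying a transform. */
function gapAfter(transform: (s: number) => number, low: number, high: number): number {
  return transform(high) - transform(low);
}

const plainLog = (s: number): number => Math.log(s);
// Hypothetical alternative that stretches scores as they approach 1.
const negLogComplement = (s: number): number => -Math.log(1 - s);

// The same raw gap of 0.05, measured in a lower and a higher band.
const plainLowBand = gapAfter(plainLog, 0.75, 0.8); // about 0.0645
const plainHighBand = gapAfter(plainLog, 0.9, 0.95); // about 0.0541
const altLowBand = gapAfter(negLogComplement, 0.75, 0.8); // about 0.2231
const altHighBand = gapAfter(negLogComplement, 0.9, 0.95); // about 0.6931
```

Whichever transform is chosen, the threshold has to be recomputed on the transformed scale, since 85 on the raw scale no longer means the same thing afterwards.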

@shiv810
Collaborator

shiv810 commented Oct 1, 2024

> a low match threshold as almost all of them are between 75 to 80.
>
> I think the implementation needs adjustment because, in the context of every idea ever, perhaps the issues are alike, but within the context of a repo they are quite different. Maybe we should use a log function.
>
> Applying a logarithmic function could be useful in this context to adjust the similarity scores, especially if you're trying to emphasize differences at certain ranges. Here's why:
>
>   1. Score Distribution: If most of your similarity scores cluster between 75 and 80, applying a log function could spread out these values more evenly. This makes subtle differences more distinguishable, helping you avoid stacking similar issues that aren't truly alike.
>   2. Context Sensitivity: Logarithmic scaling can help distinguish finer details when values are high, which aligns with your concern that issues might be similar in general but differ in specific contexts within the repo.
>
> You might experiment with different bases (e.g., natural log or log base 10) to see how it impacts your thresholds and similarity evaluation. Applying this transformation before filtering or adjusting the threshold could yield a more nuanced matching process.

I have now implemented cosine similarity followed by edit-distance ranking. I haven't done QA yet.
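
For readers following along, a two-stage approach of this kind can be sketched as below. This is not the code from the pull request: the `IssueRecord` shape, the use of titles for the edit-distance stage, and the function names are all assumptions.

```typescript
/** Cosine similarity of two equal-length vectors; returns 0 for a zero vector. */
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/** Levenshtein edit distance, computed row by row. */
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface IssueRecord {
  title: string;
  embedding: number[];
}

/** Stage 1: keep candidates above the cosine threshold. Stage 2: order them by edit distance. */
function rankSimilar(query: IssueRecord, others: IssueRecord[], threshold: number): IssueRecord[] {
  return others
    .filter((o) => cosineSimilarity(query.embedding, o.embedding) >= threshold)
    .sort((x, y) => editDistance(query.title, x.title) - editDistance(query.title, y.title));
}
```

The embedding stage narrows the field cheaply, and the edit-distance stage then prefers candidates whose wording is literally close to the query.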

Contributor

ubiquity-os bot commented Oct 1, 2024

! Fetching all pull requests failed!


@shiv810
Collaborator

shiv810 commented Oct 1, 2024

/start

@gentlementlegen
Member

/start

Contributor

ubiquity-os bot commented Oct 2, 2024

! You have reached your max task limit. Please close out some tasks before assigning new ones.

@gentlementlegen
Member

gentlementlegen commented Oct 2, 2024

So the fact that `/start` didn't respond is due to ubiquity-os/ubiquity-os-kernel#120. However, the cause of the failure to retrieve pull requests is unknown, so we should have a look.

In response, I enabled observability on Cloudflare and improved the logs. Let's see if it happens again.

Contributor

ubiquity-os bot commented Oct 2, 2024

@sshivaditya2019 the deadline is at Wed, Oct 2, 3:52 AM UTC

@0x4007
Member Author

0x4007 commented Oct 2, 2024

@gentlementlegen It is also possible that KV is hitting its limits, given all the recent warnings. In that case, I guess the requests would be rate limited with a 429 response.

@gentlementlegen
Member

It's a GitHub API request, so it is possible that the API got rate limited as well, but that is less likely because we are authenticated, as far as I know.
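
To tell these hypotheses apart in the improved logs, a check along the following lines might help. It assumes the failing response's status code and lower-cased headers are available to the caller; `classifyFailure` and the header-map shape are hypothetical. GitHub's REST API documentation states that exceeding the primary rate limit produces a 403 or 429 response with `x-ratelimit-remaining` set to 0.

```typescript
type FailureKind = "rate-limited" | "other";

// `headers` maps lower-cased header names to values (an assumption about the caller).
function classifyFailure(status: number, headers: { [name: string]: string | undefined }): FailureKind {
  if (status === 429) return "rate-limited";
  // A 403 only indicates rate limiting when the remaining quota is exhausted.
  if (status === 403 && headers["x-ratelimit-remaining"] === "0") return "rate-limited";
  return "other";
}
```

Logging the result next to the "Fetching all pull requests failed" message would show whether rate limiting is actually involved.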

@0x4007
Member Author

0x4007 commented Oct 2, 2024

Had a good result here, which is aligned with my vision. I am thinking of adjusting the UI/UX again.

Let's edit and append to the issue specification so that the UI is minimal. Below is the source code and at the bottom is the rendered form. Notice that for the footnote I prefix the numbers all with 0 so they don't collide with normal use. As it turns out, GitHub stringifies them and then uses them as a unique key, so it actually doesn't matter what we use for the ID (numbers, letters, emoji) as long as they match in the footnote.

###### Similar [^01^]

[^01^]: [Near Instant GitHub Actions Cold Boot Times](https://github.com/ubiquity-os/plugin-template/issues/24) 77%
Similar 1

Footnotes

  1. Near Instant GitHub Actions Cold Boot Times 77%
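
A sketch of how the text to append could be generated, following the footnote format shown above. The `SimilarIssue` shape and the function names are hypothetical, and similarity is assumed to be a value between 0 and 1.

```typescript
interface SimilarIssue {
  title: string;
  url: string;
  similarity: number; // 0..1
}

/** Zero-pads the footnote number (1 becomes "01"), mirroring the IDs in the example above. */
function footnoteId(index: number): string {
  return index < 10 ? `0${index}` : `${index}`;
}

/** Builds the block to append to the issue specification. */
function buildSimilarFootnotes(issues: SimilarIssue[]): string {
  const refs: string[] = [];
  const notes: string[] = [];
  issues.forEach((issue, i) => {
    const id = footnoteId(i + 1);
    refs.push(`[^${id}^]`);
    notes.push(`[^${id}^]: [${issue.title}](${issue.url}) ${Math.round(issue.similarity * 100)}%`);
  });
  return `\n###### Similar ${refs.join(" ")}\n\n${notes.join("\n")}\n`;
}
```

Passing the single issue from the example reproduces the two source lines shown above, preceded and followed by a newline.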

@shiv810
Collaborator

shiv810 commented Oct 2, 2024

> Had a good result here which is aligned with my vision. I am thinking to adjust the UI/UX again.
>
> Let's edit and append to the issue specification so that the UI is minimal. Below is the source code and at the bottom is the rendered form. Notice that for the footnote I prefix the numbers all with 0 so they don't collide with normal use. As it turns out, GitHub stringifies them and then uses them as a unique key, so it actually doesn't matter what we use for the ID (numbers, letters, emoji) as long as they match in the footnote.

I think this is an older version of the plugin. The newer plugin has an updated UI.

@0x4007
Member Author

0x4007 commented Oct 2, 2024

I know, I'm saying that we should actually do it differently. We should append this to the specification because it will look a lot cleaner.

@0x4007 closed this as completed in #26 on Oct 3, 2024

Contributor

ubiquity-os bot commented Oct 3, 2024

 [ 78.9625 WXDAI ] 

@sshivaditya2019
Contributions Overview
| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Task | 1 | 75 |
| Issue | Comment | 3 | 3.9625 |
| Review | Comment | 6 | 0 |
Conversation Incentives
Comment | Formatting | Relevance | Reward
@0x4007 One possible reason it's stacking could be a low match t…
3.29
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 61
  wordValue: 0.1
  result: 3.29
Relevance: 0.85 | Reward: 2.7965
I have implemented cosine similarity followed by edit distance r…
1.06
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.1
  result: 1.06
Relevance: 0.7 | Reward: 0.742
I think this a older version of the plugin. The newer plugin has…
1.06
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.1
  result: 1.06
Relevance: 0.4 | Reward: 0.424
Resolves #25 - Developed a new similarity search function, use…
1.5
content:
  content:
    p:
      score: 0
      elementCount: 1
    ul:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 1
  result: 1.5
regex:
  wordCount: 14
  wordValue: 0
  result: 0
Relevance: 0.7 | Reward: -
QA: Using Euclidean Distance. Using 90% similarity for match and …
20.37
content:
  content:
    p:
      score: 0
      elementCount: 5
    ul:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 3
    a:
      score: 5
      elementCount: 3
  result: 17.5
regex:
  wordCount: 23
  wordValue: 0.2
  result: 2.87
Relevance: 0.6 | Reward: -
It represents the straight-line distance between any two points …
11.29
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 115
  wordValue: 0.2
  result: 11.29
Relevance: 0.5 | Reward: -
I experimented with Manhattan distance (also known as Taxi Cab D…
13.2
content:
  content:
    p:
      score: 0
      elementCount: 2
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 79
  wordValue: 0.2
  result: 8.2
Relevance: 0.5 | Reward: -
Updated the new UI for similar issues message. It would not crea…
8.7
content:
  content:
    p:
      score: 0
      elementCount: 2
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 31
  wordValue: 0.2
  result: 3.7
Relevance: 0.8 | Reward: -
Sorry, I meant to say that it wouldn’t create a comment; instead…
2.66
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 21
  wordValue: 0.2
  result: 2.66
Relevance: 0.3 | Reward: -

 [ 68.5665 WXDAI ] 

@0x4007
Contributions Overview
| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Specification | 1 | 22.23 |
| Issue | Comment | 5 | 37.5955 |
| Review | Comment | 10 | 8.741 |
Conversation Incentives
Comment | Formatting | Relevance | Reward
> This issue seems to be similar to the following issue(s):…
7.41
content:
  content:
    p:
      score: 0
      elementCount: 7
    ol:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 4
    hr:
      score: 0
      elementCount: 2
    em:
      score: 0
      elementCount: 1
  result: 3
regex:
  wordCount: 86
  wordValue: 0.1
  result: 4.41
Relevance: 1 | Reward: 22.23
So this should be within repo only, it appears to be network wid…
2.98
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 24
  wordValue: 0.2
  result: 2.98
Relevance: 0.85 | Reward: 2.533
I think the implementation needs adjustment because in the conte…
4.21
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 36
  wordValue: 0.2
  result: 4.21
Relevance: 0.85 | Reward: 3.5785
@gentlementlegen It is also possible that KV is hitting its limi…
3.4
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 28
  wordValue: 0.2
  result: 3.4
Relevance: 0.7 | Reward: 2.38
Had a good [result here](https://github.com/ubiquity-os/plugin-t…
22.37
content:
  content:
    p:
      score: 0
      elementCount: 3
    a:
      score: 5
      elementCount: 2
    pre:
      score: 0
      elementCount: 1
    h6:
      score: 1
      elementCount: 1
  result: 11
regex:
  wordCount: 116
  wordValue: 0.2
  result: 11.37
Relevance: 0.9 | Reward: 21.233
I know, I'm saying that we should actually do it differently. We…
8.19
content:
  content:
    p:
      score: 0
      elementCount: 1
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 26
  wordValue: 0.2
  result: 3.19
Relevance: 0.9 | Reward: 7.871
```suggestionconst modifiedUrl = issue.node.…
0.25
content:
  content:
    pre:
      score: 0
      elementCount: 1
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 3
  wordValue: 0.1
  result: 0.25
Relevance: 0.9 | Reward: 0.225
```suggestionconst body = "\n###### Similar " + …
0
content:
  content:
    pre:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 0
  wordValue: 0.1
  result: 0
Relevance: 0.8 | Reward: -
Never shorten words in identifiers to reduce cognitive overhead.…
1.06
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.1
  result: 1.06
Relevance: 0.7 | Reward: 0.742
```suggestionconst footnoteLinks = [...Array(++f…
0
content:
  content:
    pre:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 0
  wordValue: 0.1
  result: 0
Relevance: 0.8 | Reward: -
```suggestion```
0
content:
  content:
    pre:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 0
  wordValue: 0.1
  result: 0
Relevance: 0.4 | Reward: -
```suggestionlet finalIndex = 0;```
0
content:
  content:
    pre:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 0
  wordValue: 0.1
  result: 0
Relevance: 0.6 | Reward: -
Why did you use Euclidean Distance? I don't have a lot of deep …
8.29
content:
  content:
    p:
      score: 0
      elementCount: 4
    a:
      score: 5
      elementCount: 1
    hr:
      score: 0
      elementCount: 1
  result: 5
regex:
  wordCount: 61
  wordValue: 0.1
  result: 3.29
Relevance: 0.6 | Reward: 6.974
4o:
0.1
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 1
  wordValue: 0.1
  result: 0.1
Relevance: 0.1 | Reward: 0.01
What does that mean? I can edit anybody's specification with col…
1.59
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 26
  wordValue: 0.1
  result: 1.59
Relevance: 0.4 | Reward: 0.636
I made changes here but why is there no build CI?
0.77
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 11
  wordValue: 0.1
  result: 0.77
Relevance: 0.2 | Reward: 0.154

 [ 3.596 WXDAI ] 

@gentlementlegen
Contributions Overview
| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Comment | 2 | 3.596 |
Conversation Incentives
Comment | Formatting | Relevance | Reward
So the fact that `/start` didn't respond is due to https…
2.92
content:
  content:
    p:
      score: 0
      elementCount: 2
    hr:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 53
  wordValue: 0.1
  result: 2.92
Relevance: 0.8 | Reward: 2.336
It's a GitHub API request so it is possible that the API got rat…
1.8
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 30
  wordValue: 0.1
  result: 1.8
Relevance: 0.7 | Reward: 1.26

@0x4007
Member Author

0x4007 commented Oct 3, 2024

Ah, this is still using the bad conversation rewards algorithm. It would be nice to push the new algorithm to main, @gentlementlegen.

@gentlementlegen
Member

@0x4007 It seems to use the updated version? What is wrong with it?

@0x4007
Member Author

0x4007 commented Oct 3, 2024

#25 (comment) caught my eye as being too high but I'm reviewing the statistics:

content:
  content: # strange there's content.content
    p:
      score: 0
      elementCount: 3
    a:
      score: 5 # looks like it's counting the footnotes as links. I only see three related to the footnotes (these should be hardcoded to be removed), but there is one unaccounted for that I can't find?
      elementCount: 2
    pre:
      score: 0
      elementCount: 1
    h6:
      score: 1
      elementCount: 1
  result: 11

regex:
  wordCount: 116
  wordValue: 0.2
  result: 11.37
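
The `result: 11` figure above is consistent with summing `score × elementCount` over the tags, and the reward of 21.233 reported for this comment is consistent with adding the formatting result to the word result scaled by the 0.9 relevance. The sketch below only reproduces that arithmetic; it is inferred from the numbers in the report and is not taken from the rewards plugin's source.

```typescript
interface TagStats {
  score: number;
  elementCount: number;
}

/** Sums score × elementCount over every tag in the formatting breakdown. */
function formattingResult(tags: { [tag: string]: TagStats }): number {
  let total = 0;
  for (const tag of Object.keys(tags)) {
    total += tags[tag].score * tags[tag].elementCount;
  }
  return total;
}

const contentResult = formattingResult({
  p: { score: 0, elementCount: 3 },
  a: { score: 5, elementCount: 2 },
  pre: { score: 0, elementCount: 1 },
  h6: { score: 1, elementCount: 1 },
});

// Inferred combination: formatting result plus relevance × word result.
const inferredReward = contentResult + 0.9 * 11.37;
```

Under this reading, each counted link contributes 5 points regardless of relevance, so every link that is wrongly attributed to footnotes inflates the reward directly.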

I think we have an unaddressed scenario in dealing with footnotes, so we'll need to make a new task. I also have a feeling that it is counting words within the code block, which it should not. If a tag's words are being ignored, that should be indicated in the analytics overview. It's also strange to me that it parsed the block as `pre` instead of `code`; perhaps that's because I didn't include the syntax highlighting header?

Regarding config, I think `wordValue` should probably default to 0.1, including for the author; not sure why it's 0.2!

  1. Ignore links related to footnotes
  2. Do not include footnotes in word count credit
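
One possible way to approach both items is to strip footnote syntax from the Markdown before links and words are counted. The sketch below assumes the scorer can pre-process the raw comment body; the function names are hypothetical, and the patterns cover both `[^1]` and the `[^01^]` style used in this thread.

```typescript
// Matches a footnote definition line such as "[^01^]: [Title](url) 77%".
const FOOTNOTE_DEFINITION = /^\[\^[^\]]+\]:.*$/gm;
// Matches an inline footnote reference such as "[^01^]".
const FOOTNOTE_REFERENCE = /\[\^[^\]]+\]/g;

/** Removes footnote definitions first, then any remaining inline references. */
function stripFootnotes(markdown: string): string {
  return markdown.replace(FOOTNOTE_DEFINITION, "").replace(FOOTNOTE_REFERENCE, "");
}

/** Counts whitespace-separated tokens. */
function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}
```

Because the definition lines are removed whole, the links inside them never reach the link counter, and their titles never reach the word counter.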

@0x4007
Member Author

0x4007 commented Oct 4, 2024

https://github.com/Dicklesworthstone/fast_vector_similarity some interesting algorithms that are relevant @sshivaditya2019
