-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add better sorting of completions #1165
Conversation
Wow, you weren't joking about the tests. From the English description, the algorithm sounds great. I may have time to experiment with this on the weekend. |
Yeah, that's why I wanted to make sure the algorithm would be accepted before working on the tests. From my experiments with this new sorting, I've found that it has always put what I consider the most relevant results at the top. The funny thing was that the find_matches(...) function was already storing all 3 of the parameters I used for sorting, and then ignoring everything but the item text at the end when creating the completions generator. That's what gave me the idea for the sorting algorithm. |
Let's let the PR hang around for some days and see if it attracts any objections. |
Ok. Maybe you'll have some time to experiment with it as well. I'm hoping it will attract some "likes"... |
Should the proposed change only apply when |
mycli/sqlcompleter.py
Outdated
|
||
def sorted_completions(matches): | ||
# sort by match point, then match length, then item text | ||
matches = sorted(matches, key=lambda m: (m[1], m[0], m[2].lower())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the sort is casefolding it might as well also do
m[2].lower().strip('`')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can certainly add that. Pretty sure I've seen someone else complain in another issue about the back tick messing with sorting.
I don't think so, I personally would rather have a completion at the top that best matches my search text regardless of the category. Have to tried it out yet? Maybe you will agree after some experimenting. I actually first tried doing the same sort in the find_matches(...) function, so per category, and the results were not ideal. |
I'm trying it now. It would seem that the config file lies when it claims
because when set to It is rather helpful to see the completions in the "smart" ordering when nothing has yet been typed, like maybe
|
Regarding the
and then you should see table and column names. It doesn't work if you use a DSN to select the db. By the way, I think it would be really useful to do fuzzy matching when not doing smart completion. Maybe this can be implemented as a setting? |
Why closed? |
Sorry, didn't realize I did that. I deleted a comment and maybe that closed it? How can I reopen? |
Accidentally closed. |
Your suggestion works! But we really need to fix that bug.
Yes, though one issue is that I adore settings, and @amjith really does not. For example for the entire functionality in this PR, I think it would be fantastic if
or any other property name that makes sense.
Right. I personally would also prefer to use your algorithm. But we have to think of the most general usecase and the default expectation. That is why I proposed to enable it outside of the |
Let me catch up on the PR. I'm not opposed to adding a config option but if the new completion behavior feels logical we can switch to it by default. Let me read through the proposal and try it out locally. |
I think this new sort is much better than the current default, so it seems like most people would prefer it. By the way, I just upgraded to the latest pip version of 1.27.2, and the sorting is now much worse than before. Now it appears to be returning results in alphabetical order, where as before the order was more by relevance (previous version I used was 1.26.1). So now in 1.27.2, when I do this: Also, there are other open issues I've seen where people complain about the fuzzy matching and want an option to turn it off because the fuzzy matches are often at the top of the results. My algorithm will also fix this since the fuzzy matches will always be below exact matches. |
I'm still testing things out. Notes so far, I like that |
Ok, I see the issue. The find_files() method is returning a Completion generator and the code is expecting an array of match tuples. Any idea what happened to the sorting in the latest version (1.27.1) ? Why is it now alphabetical? |
Still waiting on feedback for this one... |
I've tested it locally and I'm fine with the ordering. Happy to merge if you can fix the tests. |
Cool, I'll work on fixing the tests. |
Is there a way to run tests individually? I'm having a bit of trouble understanding the test setup. |
We just use For example:
You can also specify partial names to select multiple tests.
|
Cool, thanks! I don't use python that much, so this is all pretty new to me. First test fixed... |
I've fixed all of the unit tests, but when I pushed my changes, I get an email saying the "PR run failed ...". How do I fix this? |
Looks like the error is related to the test command, which is not directly related to this pull request. "The test command is disabled and references to it are deprecated." What's the status of this pull request? |
What happens if you change the line |
Better, but why are most of these checks getting cancelled? Also, the one that failed had no errors: 8 features passed, 0 failed, 0 skipped |
First pytest is run, then behave is run. Then the CI run exits with a failure if either one failed. In this case, pytest is showing failures:
I believe the other CI runs in the matrix were canceled as soon as one failed. |
I realized that the reason one of the tests is failing is because my sort algorithm is comparing lower cased matches, which doesn't work for the special commands starting with I feel like there shouldn't even be empty string completion, especially since it isn't really empty string, but rather blank string and requires entering something like a space to trigger it. It's also strange that when I run |
Still waiting to hear back on how to handle this one failing unit test. Doesn't seem worth complicating the sort method just to support these special commands, and fix a single unit test. |
Pretty disappointed not to get any feedback on this, I spent a lot of time trying to fix all these unit tests and I think it's really close. I wanted to contribute to this project, but I guess I can just use my own branch. |
Sorry @schmeic. Life got ridiculous for the past few months. I'm taking a look now. |
def sorted_completions(matches): | ||
# sort by match point, then match length, then item text | ||
matches = sorted(list(matches), key=lambda m: | ||
(m[1], m[0], m[2].lower().strip('`'), m[2].startswith('`'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this strip
is causing the tests to fail. The backticks are used to escape the table name if it happens to be a keyword. For instance if someone names the table table
then it going to conflict with the keyword since table
is a keyword. This adds backticks around it to escape the string to avoid the conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment by @rolandwalker on June 13. The idea is that a table/field surrounded by backticks should match the same one without backticks. I've tried the change the tests to work with this.
@schmeic I went down a rabbit hole here. The reason we don't sort the completion candidates is because we have the keywords listed in a curated order that is based on popularity instead of lexicography. But I went back to your original motivation for this PR. When you type I'll open a new PR and tag you on it so you could take it for a spin and let me know what you think. |
The FROM coming after FROM_UNIXTIMESTAMP was just one example of what I thought was wrong with the completions. Why should columns be at the top? There are plenty of times when I'm looking for a keyword, and don't want to see all those unsorted column names at the top. I think the completions are a lot more intuitive when everything is sorted and works the same. |
I understand that. That's why I said my PR is a superficial fix.
Imagine this scenario: I think the current implementation is a more human centric approach with UX. In fact, the keywords are ordered (I think I mentioned this in my previous comment as well) to make sure the most common keywords are listed at the top. This is also meant to facilitate a user to pick things from the completion menu rather than continue to type to get to their desired keyword. My goal was to ensure we can provide a user friendly UX for people who pick from a menu without compromising the efficiency of someone who likes to type (or knows the schema full well). But you pointed out a very valid flaw with the FROM and FROM_UNIXTIMESTAMP that was annoying for a user who prefers to type rather than use arrows. So I decided to fix that tactically. I hope my counterpoint convinces you of the reason for sticking with the current approach. As always I'm open to revisiting the design if there are compelling reasons to do so. |
Ok, I guess I see your point about "discovering the schema", but I would argue that if you want to do that, use: Once you start typing any characters, the current approach will typically provide worse results than my approach, especially when you want a keyword instead of a column. So, I still feel like my approach is better, since it will provide more relevant results the majority of the time. It seems pretty inefficient to try and discover the schema using autocomplete. As far as your point about keyword ordering, they may be listed with more common ones at the top, but what if I'm looking for a less common one? My approach will find it much more efficiently. However, I think the keyword search should be doing fuzzy searching when using my approach. Currently, when I search for |
|
I've updated #1175 to enable fuzzy matching for keywords. Can you give it a try? You can install it as follows:
|
Sounds like you prefer good discovery over relevant results when text is typed. I want to find the completion I'm looking for with the fewest keystrokes. I'm not sure if it's possible to have both, let me know if you have any ideas on how to make it work. I actually added fuzzy keyword matching on my branch, it works better. |
I'm not doing a good job of explaining myself. I value both I think we can achieve the outcome you're looking for without compromising discoverability. Let me explain it slightly differently. I have a users table with columns that have similar names When I type But what I don't want to see is I think enabling I am sorry this PR is not getting merged but I want to thank you for prompting me to do better with the UX. |
Just to be clear, the root of my complaint was not listing FROM_UNIXTIME before FROM, that was just one example of what I considered to be undesirable completion results. I want to move the completion term that I'm looking for to the top with the minimum amount of keystrokes, and prioritizing column names over other completions hinders that. I wish we could have had this discussion before you told me to fix all the unit tests and you would accept the pull request. I wasted a lot of time working on them. |
Description
Sort completions by match point, then match length, then item text.
Note that match length is equal to the length of the search text for non fuzzy matches, and equal to the length of the match group for fuzzy matches.
This means that matches that start with the search text will be displayed first, followed by matches that contain the search text, and then fuzzy matches.
Sorting by item text when both match point and match length are the same for matched completions ensures that completions that start with the search text will be in alphabetical order. For example, searching for fr will display FROM before FROM_UNIXTIME.
Checklist
changelog.md
.AUTHORS
file (or it's already there).