Issue with strokes that are letter combinations ("re", "ni", etc) #174
Comments
Here is the list using a simple parser that looks for entries that have the same number of strokes as letters, where each stroke has
I have the changes locally to remove this from |
Thanks for highlighting the issue here @JRJurman, as well as writing it up with a potential solution. Thanks as well for the script you made to efficiently make the adjustments. Great work! I don't think you're missing much about steno here. This is definitely an issue. The only other steno factor is how the glue operator works (notes below). Otherwise, there are Typey Type-specific considerations (notes below). I think the best approach for now is to:
Other Notes
I think the intention for the The main Plover dictionary itself defines things like the fingerspelled "i" character using the glue operator, for example,
Specific to Typey Type, the To ensure the glue operator works correctly to make the fingerspelled letters attach to previous letters, I think we'd need to remove fingerspelled words like That would have knock on effects. Off the top of my head, that might include:
Most of these factors are problems that should be solved in the Typey Type app code or static lesson generator, which would then make it more practical to remove these entries here. Alternatively, if we updated all of these entries to use glue operators, e.g. if
… but that would add a maintenance overhead and duplication. Is it worth the effort for handling rare edge cases? |
@JRJurman what do you think? @paulfioravanti, I'd also appreciate your take here, having worked with the dictionaries a lot and bumped into some of the edge cases, including fingerspelled entries in |
It occurred to me after writing this up that this was in part used for typey-type, and so removing those would have some side-effects (although, clearly not as detailed as you've described here 😄) When I started using this, I really wanted to ditch the I think clarifying which dictionary people should start out with (and which ones are good for plover vs other consumers) would be super valuable. This would make the additional dictionaries easier to introduce, A couple of general thoughts:
|
I just re-read what you wrote about having words in Plover's lookups... I'm not entirely sure that would be worth the effort... If a lookup is just going to tell me that I should fingerspell it, I almost feel like that should be part of plover (to display the strokes for any word as fingerspelling). |
You're not the first person to do this, which shows a growing use case for people choosing the Typey Type set over Plover's, which would suggest it's worth improving this experience. For a couple years it was only me using this set so I didn't need to think much about this. You might want to turn off some fingerspelling dictionaries (e.g. just use
Me too. I also wish Plover supported TOML dictionaries or possibly even YAML, which support comments. I'd like to have comments per entry. I've mostly resorted to commenting via commit messages, which is not terribly accessible to the non-developer part of the Plover community.
This is an interesting idea but I do think it will be too disruptive to have multiple folders. At least at this point in time. Partly because of scripts, partly because of navigating commit history (e.g. commenting entries by commit messages), partly existing links to dictionaries, and partly the overlap in dictionaries (I might want the "vim" dictionary for "coding" and for "dictation"). I'd also like Plover to support "sets of dictionaries" too. For example, switching between "coding dictionaries" and "dictation dictionaries" or whatever. @JRJurman, what if we added a "recommended dictionaries" section to the README under "Dictionaries"? Recommendations might link to the section of the README with the relevant blurb. |
@JRJurman, I'm curious how you found this repo and came to the decision to use these dictionaries? To help me understand more about how people find and use Plover so I can make Typey Type better and that sort of thing. |
I'm just on the plover discord and noticed you had posted it - although I'm forgetting when now... I had bookmarked it as something to switch over to eventually, and just got around to trying it out this weekend. As far as updating the README, I think that's perfect 👍 |
Looking now it appears that
Maybe I'm still confused on which dictionaries should be recommended by default, but the list is getting narrower 😮 |
I'm tempted to use my script to make a fingerspell-less version of the 10000 words, but I realize that isn't great for other people who want to use this repo for dictionaries... |
Just chiming in: I would agree that at this point it might be worth updating the README, rather than create any extra dictionaries. I haven't actually encountered these kinds of problems yet as I am only using the out-of-the-box Plover dictionaries, and using them to inform issues/PRs I've been submitting (I'll probably switch over to using the dictionaries in this repo once I've hit 100% completion on Typey-Type). |
Thanks @JRJurman for the extra details and notes, and thanks @paulfioravanti for adding your voice to this. I've updated the README here: https://github.com/didoesdigital/steno-dictionaries/blob/master/README.md#how-to-use-these-dictionaries. Does this make it clear enough which dictionaries to use to avoid hitting these issues? |
@didoesdigital looks great! I will point out that the condensed-strokes (which you recommend for lookups) does include some of these same issues. I'm not entirely convinced that this is future-proofed in a way that others won't suffer the same issue, but that might be worth evaluating with a more specific use-case. For now, the suggested dictionaries are clear and make sense. |
Good catch, @JRJurman! Updated:
|
Oh, that won't help. I need to re-think that advice. Maybe just, "turn it on for lookups"? |
Is there a way to just have a dictionary for lookups? I was thinking last night that if this were an option in Plover, it would solve a lot of issues... If it is possible, then I think that would be good. It would also be nice (but certainly not required) if we could annotate in the README, next to the dictionary link, something like *: dictionary contains fingerspellings, so it should be used as a lookup only |
I can also make a script to just tell us if a dictionary has a fingerspelling in it, so I can get you the full list of all dictionaries with fingerspellings |
Okay, did a quick reworking of my script, and now you can test for fingerspellings (or at least, something that looks like a fingerspelling).
npm i -g steno-scripts
for dictionary in dictionaries/*.json; do test-fingerspellings "$dictionary"; done > fingerspellings_log.txt
Here is the output: https://gist.github.com/JRJurman/57fc4d57efbac5195f4f1c595633e13a
It appears the following dictionaries have fingerspellings
Some of these are single-letter strokes, which I don't believe actually cause the described issue... I can double check, but if that's the case I can modify my script to not catch these. NOTE: it occurs to me that the fingerspelling dictionaries aren't caught here; that is because I'm checking the length of the string against the number of strokes. Since the fingerspelling dictionaries almost always have some curly braces, these get counted in the string length, so the two numbers don't match. |
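For readers following along, the length check described above can be sketched roughly like this. It is a minimal Python illustration, not the actual steno-scripts code: the function name `looks_like_fingerspelling` and the sample entries are made up for the example, and skipping single-stroke entries is an assumption based on the note above.

```python
def looks_like_fingerspelling(outline, translation):
    """Heuristic only: one stroke per character of the translation."""
    strokes = outline.split("/")
    # Assumption: single-stroke entries are skipped, since a lone letter
    # does not split a fingerspelled word the way "ni" or "re" does.
    return len(strokes) > 1 and len(strokes) == len(translation)


# Illustrative entries only; they are not copied from any dictionary in this repo.
sample = {
    "TPH*/*EU": "ni",          # 2 strokes, 2 characters: flagged
    "TPH*/*EU/KR*": "{&nic}",  # braces and "&" make the string 6 long: not flagged
    "TPHEUS": "nice",          # 1 stroke, 4 characters: not flagged
}

flagged = {o: t for o, t in sample.items() if looks_like_fingerspelling(o, t)}
print(flagged)
```

The second sample entry shows why entries written with curly braces slip past the check: the braces count toward the string length, so the stroke count no longer matches.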
Neat! This is great, @JRJurman . I had a look through your other steno-scripts and there's a lot of handy stuff in there. Well done! I've updated the README again. I'm also making some progress on the Typey Type issue to look up briefs on the fly instead of using static files so it might reduce the need for certain dictionaries to have fingerspelling entries. I'll keep this issue open for now as we chip away at it from different angles. |
Summary
In the current
top-10000-project-gutenberg-words.json
there are several multi-letter strokes that are saved as a single word:
However, this makes fingerspelling any words with these letters have a space in them.
For example, if I wanted to fingerspell the word "niceties", the word would appear as
I'm fairly new to steno, but I believe the expected output would be that fingerspelled words are not interrupted with a space, so it should just appear as
Again, I'm fairly new, so if I'm completely missing something feel free to correct me and close the issue 😄
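To make the spacing problem concrete, here is a small Python sketch of a simplified spacing model. It assumes only that pieces written as `{&x}` attach to a neighbouring `{&y}` piece without a space, while plain word entries are always separated by a space. The `render` function and the sample translation lists are hypothetical and are not Plover's real formatting code.

```python
import re

# Matches a translation that consists of a single glue piece, e.g. "{&n}".
GLUE = re.compile(r"^\{&(.+)\}$")


def render(translations):
    """Join translations with spaces, except between two adjacent glued pieces."""
    out = ""
    prev_glued = False
    for t in translations:
        match = GLUE.match(t)
        text = match.group(1) if match else t
        glued = match is not None
        if out and not (glued and prev_glued):
            out += " "
        out += text
        prev_glued = glued
    return out


# Every letter arrives as a glued piece.
all_glued = ["{&n}", "{&i}", "{&c}", "{&e}"]
# A multi-stroke dictionary entry turns the first two strokes into the plain word "ni".
with_word_entry = ["ni", "{&c}", "{&e}"]

print(render(all_glued))        # nice
print(render(with_word_entry))  # ni ce
```

In this model, the plain "ni" entry is not a glued piece, so the letters fingerspelled after it start a new word.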
Potential Solution
I'd hate to be misleading and remove some of the
10000
words in the gutenberg words list, but I think the solution might be to cut out any strokes that are just fingerspellings (if you're already fingerspelling each letter out, then I imagine most people are expecting to add a space anyways).
I can make a query and a PR that replaces these in the dictionaries (specifically in the top-10000, or others as well). There might be some that need review (like
"Holt"
, which is capitalized), but we could go over those individually in the PR review.
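As a rough idea of what such a query could look like, here is a Python sketch that separates fingerspelled entries from the rest so the removed ones can be reviewed in a PR. Everything in it is an assumption made for illustration: the helper names, the partial letter-to-stroke table (believed to match Plover's default fingerspelling strokes, but worth checking), and the sample entries. It is stricter than a pure length check because each stroke must match the stroke for the corresponding letter.

```python
import json

# Partial table for illustration; assumed to match Plover's default
# fingerspelling strokes. A real run would need all 26 letters.
FINGERSPELLING = {"n": "TPH*", "i": "*EU", "r": "R*", "e": "*E"}


def is_fingerspelled(outline, translation, table=FINGERSPELLING):
    """True when every stroke is the fingerspelling stroke for its letter."""
    strokes = outline.split("/")
    if len(strokes) < 2 or len(strokes) != len(translation):
        return False
    # Lower-casing means capitalized words are still caught and can be reviewed.
    return all(table.get(ch.lower()) == s for ch, s in zip(translation, strokes))


def strip_fingerspelled(dictionary):
    """Return (kept, removed) so the removed entries can be checked by hand."""
    kept, removed = {}, {}
    for outline, translation in dictionary.items():
        target = removed if is_fingerspelled(outline, translation) else kept
        target[outline] = translation
    return kept, removed


# Illustrative entries only; not copied from the real dictionary.
example = json.loads('{"TPH*/*EU": "ni", "R*/*E": "re", "RE": "re"}')
kept, removed = strip_fingerspelled(example)
print(json.dumps(removed, indent=2))
```

Returning the removed entries separately, rather than deleting them silently, keeps the manual review step described above possible.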