[Feature request] Add vocabulary prediction #95

Open
max-hk opened this issue Sep 10, 2019 · 3 comments

Comments

@max-hk

max-hk commented Sep 10, 2019

It would be helpful if ibus-cangjie could predict the next word (or the next few words) while users are typing.

There are many free Chinese vocabulary lists on the Web, licensed under CC-BY-SA or BSD. You can find them at the link below:
https://chromium.googlesource.com/chromium/deps/icu46/+/e49b610806e6ba6063384ffd7f45d5b7cd561e65/source/data/brkitr/README.chromium

You can also use the pre-built list from the Chromium team, which combines all the lists from the link above and is licensed under an MIT-like license:
https://chromium.googlesource.com/chromium/deps/icu46/+/e49b610806e6ba6063384ffd7f45d5b7cd561e65/source/data/brkitr/cjdict.txt
...or an updated version of the combined list maintained by Unicode:
https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/dictionaries/cjdict.txt

The Android Pinyin IME repository also contains a vocabulary list (Simplified Chinese only):
https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/jni/data/rawdict_utf16_65105_freq.txt
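A minimal Python sketch of how one of these lists could drive prediction, using the common "association" approach: after the user commits a character, suggest the remainders of dictionary words that start with it, best-ranked first. It assumes a tab-separated `word<TAB>number` layout with `#` comment lines; whether that number is a frequency (higher is better) or a cost (lower is better), and the file encoding (the Android list is UTF-16 with extra columns), would need to be checked against each actual file. `load_vocabulary` and `predict` are hypothetical names, not part of ibus-cangjie.

```python
from collections import defaultdict
from typing import Iterable


def load_vocabulary(
    lines: Iterable[str], higher_is_better: bool = True
) -> dict[str, list[tuple[str, float]]]:
    """Index multi-character words by their first character.

    Assumption: each data line is `word<TAB>number`; lines starting with
    '#' are comments, and a missing number is treated as weight 0.
    """
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        word = parts[0]
        weight = float(parts[1]) if len(parts) > 1 else 0.0
        if not higher_is_better:
            # Negate costs so that sorting descending still puts the best first.
            weight = -weight
        if len(word) > 1:  # a single character has no continuation to suggest
            index[word[0]].append((word, weight))
    # Sort each bucket once at load time, best candidate first.
    for bucket in index.values():
        bucket.sort(key=lambda item: item[1], reverse=True)
    return dict(index)


def predict(
    index: dict[str, list[tuple[str, float]]], committed: str, limit: int = 5
) -> list[str]:
    """Suggest continuations for the last character the user committed."""
    if not committed:
        return []
    candidates = index.get(committed[-1], [])
    # Return only the part of each word that the user has not typed yet.
    return [word[1:] for word, _ in candidates[:limit]]


if __name__ == "__main__":
    vocab = load_vocabulary(["# sample", "中文\t10", "中國\t50", "日本\t5"])
    print(predict(vocab, "我愛中"))
```

Ranking by a single per-word number keeps lookups cheap; a real engine would likely also consider the preceding context, which a flat word list like these does not provide.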

@bochecha
Member

Thanks for the issue.

However, this is already tracked as #4, so this would have been better as a comment there.

That said, since your comment provides more data, I'm going to close the other one and keep this one. 😉

@max-hk
Author

max-hk commented Sep 10, 2019

@bochecha Thanks

@mbridon
Contributor

mbridon commented Jul 28, 2023

Hi @max-hk, sorry for never giving any news. This is a very interesting feature we've always wanted!

However, due to unforeseen health issues I haven't been able to give this any thought for about 3 years... 😭

I'm trying to get back to this slowly though 😄

What would definitely help me, however, is the data from Chromium in source form, so that I can make use of it.

The license they use seems to be CC-BY-SA; is that correct? If so, I think (but I'm not a lawyer and this is not legal advice) it should be compatible with using it in ibus-cangjie, but probably only if we get the data in source form rather than binary form (so we can make modifications and share them back with Chromium, as CC-BY-SA both allows and requires).

So rest assured you helped a lot with finding this and we totally want to make good use of it 😁

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants