POS tagging #34
Comments
Both embeddings are trained using the Word2Vec model from gensim. Details of the corpus are here: https://github.com/ckiplab/ckiptagger/wiki/Corpora
Thanks!
On March 21, 2021 at 10:01 PM, Mu Yang wrote:
Both embeddings are trained using the Word2Vec model from gensim.
Here is the detail of the corpus: https://github.com/ckiplab/ckiptagger/wiki/Corpora
On https://github.com/ckiplab/ckiptagger/wiki/Chinese-README, I followed the POS tagging link (./data/model_ner/pos_list.txt -> "POS tag list, see Wiki / Technical Report no. 93-05"). It mentions that there is an electronic dictionary that includes each word's part of speech (詞性). How can I get access to it? Thanks.
I tried the following example as input and got this tagging:
這些(Neqa) 語辭(Na) 都(D) 含有(VJ) 高(VH) 調音(VA)
With a customized dictionary, it was able to tag 高調音 as Na.
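from ckiptagger import WS, construct_dictionary

ws = WS("./data")  # assumes the model files are in ./data
sentence_list = ["這些語辭都含有高調音"]

# Candidate words and their weights; the "" key and the "啦" weight
# are kept exactly as in my original snippet.
word_to_weight = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}
dictionary = construct_dictionary(word_to_weight)
word_sentence_list = ws(sentence_list, recommend_dictionary=dictionary)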
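For anyone wondering why the empty-string key and the non-numeric weight in the mapping above do not seem to cause trouble, here is a rough pure-Python sketch of the kind of filtering a dictionary builder could do. This is not ckiptagger's own code: the helper name `build_length_grouped_dictionary` is hypothetical, and the behavior (skip empty words, skip weights that cannot be converted to a number, group the rest by word length) is my assumption about what happens, not something confirmed in this thread.

```python
def build_length_grouped_dictionary(word_to_weight):
    """Hypothetical stand-in for a dictionary builder (not ckiptagger's API).

    Assumption: entries with an empty word or a weight that cannot be
    converted to float are skipped; the rest are grouped by word length.
    """
    grouped = {}
    for word, weight in word_to_weight.items():
        # Skip keys that are not non-empty strings, e.g. "".
        if not isinstance(word, str) or not word:
            continue
        # Skip weights that are not numeric, e.g. "啦".
        try:
            weight = float(weight)
        except (TypeError, ValueError):
            continue
        grouped.setdefault(len(word), {})[word] = weight
    # Return (length, {word: weight}) pairs ordered by word length.
    return sorted(grouped.items())


word_to_weight = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}

dictionary = build_length_grouped_dictionary(word_to_weight)
for length, words in dictionary:
    print(length, words)
```

Under these assumptions, 高調音 ends up in the length-3 group with weight 1.0, while the `""` and `"來亂的"` entries are dropped.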
Is there any code or paper describing how the data (token_list.npy, vector_list.npy, model_pos, etc.) were trained or created?
Thanks.