OntoNotes 5.0 Chinese NER dataset

Notes:

The data comes from OntoNotes Release 5.0 (LDC2013T19).
The corpus contains 18 named entity types (including 7 value types).
The train/dev/test split follows http://conll.cemantix.org/2012/download/ids/

Statistics

#doc	#sent	#word
1911	48K	988K

Genre	#train	#dev	#test
BC	7862	2239	885
BN	8149	949	985
MZ	3988	362	451
NW	3569	425	516
TC	7510	1129	643
WB	6479	1113	813
#sum	37557	6217	4293
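
The #sum row can be re-derived from the per-genre rows. A small Python sketch that uses only the numbers in the table above; the grand total works out to about 48K, which matches the #sent figure and suggests (though the README does not say so explicitly) that the per-genre counts are sentences:

```python
# Per-genre counts as (train, dev, test), copied from the table above.
COUNTS = {
    "BC": (7862, 2239, 885),
    "BN": (8149, 949, 985),
    "MZ": (3988, 362, 451),
    "NW": (3569, 425, 516),
    "TC": (7510, 1129, 643),
    "WB": (6479, 1113, 813),
}


def split_totals(counts):
    """Sum each column (train, dev, test) across all genres."""
    return tuple(sum(column) for column in zip(*counts.values()))


train, dev, test = split_totals(COUNTS)
print(train, dev, test)    # 37557 6217 4293
print(train + dev + test)  # 48067
```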

Description

GENRE = {bc bn mz nw tc wb} (broadcast conversation, broadcast news, magazine, newswire, telephone conversation, web data)
SPLIT = {train dev test}

{GENRE}.{SPLIT}.id: document IDs
{GENRE}.{SPLIT}.char: character-level annotated data
{GENRE}.{SPLIT}.txt: word-level annotated data
{GENRE}.{SPLIT}.raw.txt: raw sentence-level data
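
The naming scheme above expands to 6 genres x 3 splits x 4 file kinds, i.e. 72 file names. A minimal Python sketch that enumerates them; the `expected_files` helper, its `data_dir` argument, and the assumption that all files sit in a single flat directory are illustrative and not confirmed by this README:

```python
from itertools import product
from pathlib import Path

GENRES = ["bc", "bn", "mz", "nw", "tc", "wb"]
SPLITS = ["train", "dev", "test"]
# File suffixes listed in the description above.
SUFFIXES = ["id", "char", "txt", "raw.txt"]


def expected_files(data_dir="."):
    """Return every {GENRE}.{SPLIT}.{suffix} path under data_dir.

    data_dir is a hypothetical location; the flat layout is an assumption.
    """
    root = Path(data_dir)
    return [
        root / f"{genre}.{split}.{suffix}"
        for genre, split, suffix in product(GENRES, SPLITS, SUFFIXES)
    ]


files = expected_files()
print(len(files))     # 72
print(files[0].name)  # bc.train.id
```

Filtering this list with `Path.exists()` is one way to check which files are actually present in a local copy.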

LindgeW/Ontonotes5.0-Chinese-NER

Folders and files

Latest commit

History

Repository files navigation

Ontonotes5.0 Chinese NER dataset

Statistics

Description

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages