Text or sentences? #13363
shashko-a
started this conversation in
Help: Best practices
-
Hi! The NER model in spaCy mostly looks at local context. For annotators as well, it's usually sufficient to see the local context to do NE annotation, so I think the granularity of a single sentence will probably work best. Either way, if the sentences are independent and don't come from the same original document, I definitely wouldn't merge them into a single annotation/document, as this may actually confuse ML models trained on such data. Hope that helps!
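To make the per-sentence layout concrete, here is a minimal sketch in plain Python. The sentences, the `COLOR` label, and the `to_records` helper are all made up for illustration, and the record shape (`text` plus an `entities` list of `[start, end, label]`) is an assumption about what your annotator exports, so adjust it to your actual JSON.

```python
import json

# Hypothetical per-sentence annotations: (text, [(start, end, label), ...]).
# The sentences and the "COLOR" label are invented for this sketch.
annotated = [
    ("She bought a red bicycle.", [(13, 16, "COLOR")]),
    ("The walls were painted pale blue.", [(23, 32, "COLOR")]),
    ("No colour is mentioned here.", []),
]


def to_records(items):
    """Return one record per sentence, checking each span against its own text."""
    records = []
    for text, spans in items:
        for start, end, label in spans:
            # Offsets are relative to this sentence only, not to a merged string.
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"span {(start, end, label)} does not fit {text!r}")
        records.append({"text": text, "entities": [list(span) for span in spans]})
    return records


records = to_records(annotated)
print(json.dumps(records, indent=2))
```

Each record can then become its own training document, so unrelated sentences never share context. Sentences without any entity are kept as well, with an empty `entities` list.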
-
I'm trying to train a spaCy model with new data to find colors.
I collected a few hundred independent sentences, and during annotation (I used https://tecoholic.github.io/ner-annotator/) I ran into a question: what's the best way to annotate the data and train spaCy with it?
Should I put all my sentences into one giant string and get one "entities" block in my JSON from the annotator, or would it be better to separate each sentence and get a JSON structure like "sentence 1 - its entities, sentence 2 - its entities, ..."?
Thanks!