>>> from nltk.tokenize import sent_tokenize
>>> text = " This is a sentence.Laugh Out Loud. Keep coding. No. Yes! True! ohh!ya! me too. "
>>> sent_tokenize(text)
[' This is a sentence.Laugh Out Loud.', 'Keep coding.', 'No.', 'Yes!', 'True!', 'ohh!ya!', 'me too.']
I tried the same example in Python with NLTK and it gives the same output. Should we treat NLTK as the benchmark, or should we split those sentences anyway?
julia> WordTokenizers.split_sentences(" This is a sentence.Laugh Out Loud. Keep coding. No. Yes! True! ohh!ya! me too. ")
7-element Array{SubString{String},1}:
" This is a sentence.Laugh Out Loud."
"Keep coding."
"No."
"Yes!"
"True!"
"ohh!ya!"
"me too."
I observed that when there is no space after the delimiter (admittedly, such a sentence is grammatically incorrect), the text is not split into two separate sentences (e.g. ".Laugh Out Loud." and "ohh!ya!"). Can this be considered an issue?
Originally posted by @RohitPingale in #32 (comment)
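For reference, the no-space case could be handled with a lookbehind/lookahead regex split. Below is a minimal Python sketch, purely illustrative: it is not WordTokenizers' or NLTK's actual algorithm, and it would over-split abbreviations such as "e.g." or "U.S.A.".

```python
import re

def naive_split(text):
    # Illustrative only: split after ., !, or ? whenever optional whitespace
    # is followed by a letter. This splits "sentence.Laugh" and "ohh!ya!"
    # but wrongly splits abbreviations like "e.g. this".
    parts = re.split(r'(?<=[.!?])(?=\s*[A-Za-z])', text)
    return [p.strip() for p in parts if p.strip()]

print(naive_split(" This is a sentence.Laugh Out Loud. ohh!ya! "))
# ['This is a sentence.', 'Laugh Out Loud.', 'ohh!', 'ya!']
```

A real splitter would combine a rule like this with an abbreviation list or a trained model (as NLTK's Punkt does), which is presumably why both libraries err on the side of not splitting at an unspaced delimiter.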