>>> from nltk.tokenize import sent_tokenize
>>> text = " This is a sentence.Laugh Out Loud. Keep coding. No. Yes! True! ohh!ya! me too. "
>>> sent_tokenize(text)
[' This is a sentence.Laugh Out Loud.', 'Keep coding.', 'No.', 'Yes!', 'True!', 'ohh!ya!', 'me too.']
I tried the same example in Python with NLTK and it gives the same output. Should we treat NLTK as the benchmark, or should we split those sentences anyway?
julia> WordTokenizers.split_sentences(" This is a sentence.Laugh Out Loud. Keep coding. No. Yes! True! ohh!ya! me too. ")
7-element Array{SubString{String},1}:
" This is a sentence.Laugh Out Loud."
"Keep coding."
"No."
"Yes!"
"True!"
"ohh!ya!"
"me too."
I observed that when there is no space after the delimiter (admittedly, such a sentence is grammatically incorrect), the text is not split into two separate sentences (e.g. ".Laugh Out Loud." and "ohh!ya!"). Can this be considered an issue?
Originally posted by @RohitPingale in #32 (comment)
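For reference, the no-space case could be handled with a lookbehind/lookahead regex split. Below is a minimal Python sketch, purely illustrative: it is not WordTokenizers' or NLTK's actual algorithm, and it would over-split abbreviations such as "e.g." or "U.S.A.".

```python
import re

def naive_split(text):
    # Illustrative only: split after ., !, or ? whenever optional whitespace
    # is followed by a letter. This splits "sentence.Laugh" and "ohh!ya!"
    # but wrongly splits abbreviations like "e.g. this".
    parts = re.split(r'(?<=[.!?])(?=\s*[A-Za-z])', text)
    return [p.strip() for p in parts if p.strip()]

print(naive_split(" This is a sentence.Laugh Out Loud. ohh!ya! "))
# ['This is a sentence.', 'Laugh Out Loud.', 'ohh!', 'ya!']
```

A real splitter would combine a rule like this with an abbreviation list or a trained model (as NLTK's Punkt does), which is presumably why both libraries err on the side of not splitting at an unspaced delimiter.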