Pride, Prejudice by @hugovk #130
Comments
Ha, this is great! 60% reduced Pride and Prejudice is still totally readable. Too bad the summarizer took out all the damns. |
"Remove honorifics (Mr., Mrs., Miss, Dr.)" 😱 How can I then tell the "Bennet"s apart?! |
@janelleshane Cliff-notes are also readable. |
Great |
@henrikh @danesparza Yep, I did realise that but unfortunately they just had to go to reduce the word count :) I should have replaced "Mrs. Bennet" with her maiden name, "Gardiner"! |
Sometimes you will see major characters referred to with a shortened version of the name after introduction. I would suggest calling Mrs. Bennet Mrs. B, Mr. Bennet Mr. B. You don't remove honorifics and reduce word count but you reduce character count. |
Actually considering the patriarchy Mr. B can just be B. on edit: Ms can be used in place of Mrs. in modern times of course. |
@bryanrasmussen Word count is all that matters :) |
Not if your last name is Hugo, and your first Victor!
|
:) See https://news.ycombinator.com/item?id=15823499 for more discussion. |
@henrikh you'd have to make do with context, I suppose, but that's not all that different than the base text because only the eldest daughter is addressed by only her surname ("Miss Bennet") whereas the younger daughters are addressed with either their first or full names ("Miss Elizabeth" / "Miss Elizabeth Bennet"). I haven't read Pride and Prejudice in a while, are there any examples where the reader must discern identity (among Bennets or any other family) from context? |
@philsnow As far as I recall, Elizabeth is actually referred to as "Miss Bennet" when addressed directly by Mr Darcy and Mr Wickham -- but, of course, in those situations there would be no doubt 😉 |
Pride, Prejudice
Generated output
What it does
The problem isn't generating over 50,000 words. The problem is existing books are too long. Pride and Prejudice is 130,000 words, Moby Dick is 215,136 words (or 215,136 meows). And we all know 50,000 is the gold standard for a novel! So how can we reduce the word count?
These tactics reduce Pride and Prejudice by about 15% to 111,000 words.
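The only tactic named in this thread is "Remove honorifics (Mr., Mrs., Miss, Dr.)". As a rough illustration, here is a minimal sketch of that one step. The regular expression and the function names `remove_honorifics` and `word_count` are assumptions made for this example, not code from the linked repository.

```python
import re

# Honorifics quoted in the comments above. The pattern itself is an
# assumption for illustration, not the project's actual code.
HONORIFIC_RE = re.compile(r"\b(?:Mr|Mrs|Miss|Dr)\b\.?\s*")


def remove_honorifics(text: str) -> str:
    """Drop each honorific, its optional full stop, and trailing whitespace."""
    return HONORIFIC_RE.sub("", text)


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


sample = "Mr. Bennet and Mrs. Bennet met Miss Bingley and Dr. Smith."
reduced = remove_honorifics(sample)
print(word_count(sample), "->", word_count(reduced))
print(reduced)
```

Each removed honorific saves exactly one word, which is why this tactic lowers the word count while making the Bennets harder to tell apart.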
Next we work out the ratio of 50k to the number of words we have, count how many sentences we have, and work out how many sentences to keep so the total approaches 50k. Then we use a text summariser to chop out the dead wood.
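The arithmetic in that step can be sketched as follows. This is a hedged illustration only: the naive sentence splitter and the names `split_sentences` and `target_sentence_count` are assumptions, and the real script may count words and sentences differently.

```python
import re

TARGET_WORDS = 50_000  # the "gold standard" word count mentioned above


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def target_sentence_count(text: str, target_words: int = TARGET_WORDS) -> int:
    """Estimate how many sentences to keep so the text approaches target_words."""
    words = len(text.split())
    sentences = len(split_sentences(text))
    if words <= target_words:
        return sentences  # already short enough, keep everything
    ratio = target_words / words  # fraction of the text we can keep
    return max(1, round(sentences * ratio))


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
print(target_sentence_count(text, target_words=6))
```

The resulting number is what you would hand to a summariser as the number of sentences to retain. This assumes sentences are of roughly similar length, so the final word count only approximates the target.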
How to do it
Run:
Example:
This produces output.txt before the summariser, and output2.txt after the summariser.
Works at least on macOS High Sierra with Python 3.6.3.
Example
Here's a diff of Pride and Prejudice and the first pass output.txt:
Source code
https://github.com/hugovk/NaNoGenMo-2017/tree/master/03-reducifier