Can I use it for character generation? #8

Open
aletote opened this issue Mar 9, 2019 · 1 comment

aletote commented Mar 9, 2019

I guess all I need to do is put spaces between the characters in the dataset text file?

Owner

26medias commented Mar 9, 2019

You would only need to update the way the text file is tokenized in the tokenize() method on line 27:
https://github.com/26medias/context-aware-markov-chains/blob/master/cmarkov.js#L27

cues are the sentence splitters.
tokens is the word splitter.

If you change var tokens = text.split(' '); to var tokens = text.split('');, you would split the text into chars.
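
To make the difference concrete, here is a minimal sketch of the two splits on a made-up string. The sample text and variable names are only for illustration and are not taken from cmarkov.js:

```javascript
// Hypothetical sample text, not from the repository's dataset.
const sample = 'the cat sat';

// Word-level split: one token per space-separated word.
const wordTokens = sample.split(' ');

// Character-level split: an empty separator yields one entry per
// UTF-16 code unit, so the spaces become tokens too.
const charTokens = sample.split('');

console.log(wordTokens);        // [ 'the', 'cat', 'sat' ]
console.log(charTokens.length); // 11
```

Note that split('') works on UTF-16 code units, so characters outside the Basic Multilingual Plane (many emoji, for example) would be cut in half. Array.from(text) splits by code point instead, if that matters for your dataset.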

However, it probably won't output anything of value, if anything.

The algorithm works by mapping the structure of the sentences: the positions of the verbs, adjectives, subjects, and so on. This is how it learns and reproduces the general style of the training text.
If you split into chars instead of words, the POS (Part of Speech) tagging won't work and it won't be able to learn any style, so it probably won't be able to output much.

The text generation is based on statistics rather than machine learning. During training, a graph is built that maps the relationships between words, and that graph is then used to generate the text. The output only makes sense because the POS tagging lets it rebuild a sentence with a generally proper structure. Without the POS tagging, the output will probably be nonsense.
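
As a rough illustration of the "graph of word relationships" idea, here is a bare-bones word-level Markov chain. This is a simplified sketch, not the code in cmarkov.js: it has no POS tagging and no sentence cues, and the names buildGraph and generate are made up for the example.

```javascript
// Build a graph: each word maps to the list of words observed right after it.
function buildGraph(tokens) {
  const graph = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    if (!graph.has(tokens[i])) graph.set(tokens[i], []);
    graph.get(tokens[i]).push(tokens[i + 1]);
  }
  return graph;
}

// Walk the graph from a start word, picking one observed successor at each step.
// `rng` is injectable so the walk can be made deterministic; it defaults to Math.random.
function generate(graph, start, maxWords, rng = Math.random) {
  const out = [start];
  let current = start;
  // Stop at the word limit or when the current word has no recorded successor.
  while (out.length < maxWords && graph.has(current)) {
    const successors = graph.get(current);
    current = successors[Math.floor(rng() * successors.length)];
    out.push(current);
  }
  return out.join(' ');
}

// Hypothetical training text, split into words.
const graph = buildGraph('the cat sat on the mat'.split(' '));
console.log(generate(graph, 'the', 6));
```

With nothing but successor statistics, the walk has no notion of sentence structure, which is why a plain chain like this (or a character-level variant of it) tends to drift into nonsense.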

I would suggest looking at an LSTM instead; it will produce much better results.
https://github.com/tensorflow/tfjs-examples/tree/master/lstm-text-generation
