Cross product? #40
The repo cross-product includes code and an example text of some 60,539 words, made by combining Moby Dick, The History of Tom Jones, a Foundling, and Middlemarch. An early passage reads,
The mechanism produces a text with as many sentences as the input that has the fewest. Each output sentence is as long as the shortest sentence at that position among the inputs, and takes its words from each input in turn.
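A minimal sketch of that mechanism, to make the rule concrete. It is not the repo's code: the function names `split_sentences` and `squash` are hypothetical, the sentence splitter is deliberately naive, and "alternately" is read here as "word j of the output is word j of input j mod k", which is one plausible interpretation among several.

```python
import re


def split_sentences(text):
    """Naively split text into sentences, each a list of words.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.split() for part in parts if part]


def squash(texts):
    """Combine several texts sentence by sentence.

    The output has as many sentences as the input with the fewest, and
    each output sentence is as long as the shortest input sentence at
    that position. Word j is taken from input number j mod k.
    """
    per_text = [split_sentences(t) for t in texts]
    result = []
    # zip stops at the input with the fewest sentences.
    for group in zip(*per_text):
        length = min(len(sentence) for sentence in group)
        words = [group[j % len(group)][j] for j in range(length)]
        result.append(" ".join(words))
    return result
```

With the inputs `"Call me Ishmael today. The sea was calm."` and `"An author ought to consider. Miss Brooke had beauty indeed. Extra sentence here."`, this returns two sentences of four words each; the third sentence of the second input is dropped because the first input has nothing at that position.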
(I imagine this or something like it has been done before.)
The code takes local text files as inputs. It might be nice to retrieve texts from Project Gutenberg over the network, which would be a chance to get familiar with PG's machine-readable metadata.
This change allows the use of Project Gutenberg text numbers as inputs, caching metadata and text files. The program is now somewhat more error-prone. There is no cache invalidation. I went down the wrong path at first, beating my head against XPath and lxml until I realized that the catalog file hadn't been updated since 2014. The current catalog, a CSV file, is much easier to deal with (though I'm not using it at the moment), but the head-beating was useful, as I still had to handle the individual works' RDF files.
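For illustration, caching by text number with no invalidation might look roughly like the sketch below. Everything here is an assumption rather than the repo's code: the names `cached_text` and `fetch_url`, the cache layout, and the URL pattern, which may not match what the program actually requests. The fetcher is a parameter so the caching logic can be exercised without the network.

```python
from pathlib import Path
from urllib.request import urlopen

# Assumed URL pattern for a work's plain-text file; not taken from the repo.
TEXT_URL = "https://www.gutenberg.org/cache/epub/{n}/pg{n}.txt"


def fetch_url(url):
    """Download a URL and return the raw bytes."""
    with urlopen(url) as response:
        return response.read()


def cached_text(number, cache_dir="cache", fetch=fetch_url):
    """Return the text for a Project Gutenberg number, using a local cache.

    The file is downloaded only if it is not already cached. There is no
    invalidation: once a file exists, it is never fetched again.
    """
    path = Path(cache_dir) / f"pg{number}.txt"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(TEXT_URL.format(n=number)))
    return path.read_text(encoding="utf-8")
```

The same pattern would apply to the per-work RDF metadata files, with a different URL template and a parsing step after the read.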
Almost any result is fun:
Another sample output, of about 65,526 words, was produced by squashing War and Peace, Crime and Punishment, and Anna Karenina:
from which
I've also added some input validation; I think I'll call this done.
This is a late entry from the half-bakery, a rough idea about squashing multiple texts together. It is, I think, similar to but not the same as an idea @rebeccacremona has mentioned. I have in mind some pseudo-mathematical ideas, along with phrases like "cross product" and "convolution", though I doubt this will be any of those.