Once Upon A Time/Happily Ever After, a semantic search collage #29

Open
arnicas opened this issue Nov 30, 2024 · 0 comments

NaNoGenMo 2024 -- Once Upon a Time/Happily Ever After

A story constructed from sentences in Project Gutenberg books, using embedding similarity search.

Overview

  1. Take the top 100 books on Project Gutenberg and filter out the non-English ones, the Victorian porn, and the dupes.

  2. Strip the front and end matter, then clean up or remove problem sentences (too long, too short, Roman numerals, etc.).

  3. Embed each sentence using a small embedding model.

  4. Put data into a vector database (Chroma).

  5. From a start and an end sentence:
    "Once upon a time, in a land far away..."
    and "They lived happily ever after."

    Search for nearest neighbors. Pick the closest sentence, then repeat, using each picked sentence as the next query.

  6. Generate each half of the text this way, to a word count of 25,000 words each.

  7. Reverse the order of the ending sentences, so they end with the "Happily Ever After" part.

  8. Use interpolation to generate 10 sentences of merger text that starts from the last sentence of
    the first half and ends at the first sentence of the second half (after reversing their order).

  9. Write them all to a markdown file, with the source title, author, distance score, and a code saying which section it is (start, end, interp).
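The chaining step (search, pick the closest, use it as the next query) can be sketched roughly as below. This is a minimal illustration, not the project's code: it uses plain Python and made-up 2-d vectors in place of Chroma and a real embedding model, and the names `cosine_distance`, `greedy_chain`, and `max_words` are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def greedy_chain(seed_vec, sentences, vectors, max_words):
    """Hop to the nearest unused sentence until the word budget is reached."""
    used, chain, words = set(), [], 0
    query = seed_vec
    while words < max_words and len(used) < len(sentences):
        # Nearest unused sentence to the current query vector.
        dist, idx = min(
            (cosine_distance(query, vectors[i]), i)
            for i in range(len(sentences)) if i not in used
        )
        used.add(idx)  # never reuse a sentence
        chain.append((sentences[idx], dist))
        words += len(sentences[idx].split())
        query = vectors[idx]  # the picked sentence becomes the next query
    return chain

# Toy data, invented for illustration only.
sentences = ["a far land", "a calm sea", "the steel of justice"]
vectors = [[1.0, 0.1], [0.8, 0.6], [0.0, 1.0]]
chain = greedy_chain([1.0, 0.0], sentences, vectors, max_words=5)
print([s for s, _ in chain])
```

For the ending half, the same chain would start from the "happily ever after" seed and then be reversed, for example with `list(reversed(chain))`.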

Output is in output.md.
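
One way to produce entries in the format shown in the samples below is a small formatting helper. This is only a sketch: `format_entry` is a hypothetical name, and the layout is inferred from the sample lines rather than taken from the project code.

```python
def format_entry(sentence, author, title, distance, code):
    """Render one sentence followed by its indented attribution line."""
    return (
        f"{sentence}\n\n"
        f"        --{author}, {title}, distance:{distance:.3f}, code:{code}\n"
    )

entry = format_entry(
    "And they lived happily ever after.", "narrator", "nanogenmo", 0.0, "input"
)
print(entry)
```

Joining such entries with blank lines and writing the result to a file would give one markdown document for the whole story.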

Samples

The start:

Once upon a time, in a faraway land...

        --narrator, nanogenmo, distance:0.000, code:input

There—for with your leave, my sister, I will put some trust in preceding navigators—there snow and frost are banished; and, sailing over a calm sea, we may be wafted to a land surpassing in wonders and in beauty every region hitherto discovered on the habitable globe.

        --Shelley, Mary Wollstonecraft, Frankenstein; Or, The Modern Prometheus, distance:0.502, code:start
        
It went with me on my sea-shore walks, and rambles into the country, whenever—which was seldom and reluctantly—I bestirred myself to seek that invigorating charm of Nature, which used to give me such freshness and activity of thought the moment that I stepped across the threshold of the Old Manse.

        --Hawthorne, Nathaniel, The Scarlet Letter, distance:0.503, code:start

At last, after rambling several days about the country, during which the fields afforded me the same bed and the same food which nature bestows on our savage brothers of the creation, I at length arrived at this place, where the solitude and wildness of the country invited me to fix my abode.

        --Fielding, Henry, History of Tom Jones, a Foundling, distance:0.510, code:start

Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

        --Melville, Herman, Moby Dick; Or, The Whale, distance:0.514, code:start

The end shows the pull of the "happily ever after" seed -- almost all Brothers Grimm after the first fortuitous match in Little Women.

There they found their child, now grown up to be comely and fair; and after all their troubles they lived happily together to the end of their days.

        --Grimm, Jacob, Grimm, Wilhelm, Grimms' Fairy Tales, distance:0.339, code:end

And then the prince and Briar Rose were married, and the wedding feast was given; and they lived happily together all their lives long.

        --Grimm, Jacob, Grimm, Wilhelm, Grimms' Fairy Tales, distance:0.335, code:end

They were very happy, even after they discovered that they couldn't live on love alone.

        --Alcott, Louisa May, Little Women, distance:0.313, code:end

And they lived happily ever after.

        --narrator, nanogenmo, distance:0.000, code:input

The interpolation part finally steers it away from a stretch stuck on Henry Fielding (which takes over the end of the "start" part because of the name Mr Allworthy):

For such was the compassion which inhabited Mr Allworthy's mind, that nothing but the steel of justice could ever subdue it.

        --Fielding, Henry, History of Tom Jones, a Foundling, distance:0.395, code:interp_3

Yet the force of truth did of itself flash into mine eyes, and I turned away my panting soul from incorporeal substance to lineaments, and colours, and bulky magnitudes.

        --Augustine, Saint, Bishop of Hippo, The Confessions of St. Augustine, distance:0.237, code:interp_4

All the truth of my position came flashing on me; and its disappointments, dangers, disgraces, consequences of all kinds, rushed in in such a multitude that I was borne down by them and had to struggle for every breath I drew.

        --Dickens, Charles, Great Expectations, distance:0.380, code:interp_5
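
The write-up does not say how the interpolation is computed. One plausible reading, sketched below as an assumption, is linear interpolation between the two boundary embeddings, picking the nearest unused sentence at each evenly spaced point. The names `interpolate_path` and `steps`, the 1-based `interp_k` labels, and the toy vectors are all illustrative.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def interpolate_path(start_vec, end_vec, vectors, steps):
    """Pick one unused sentence index near each point between two vectors.

    Assumes len(vectors) >= steps, so there is always an unused candidate.
    """
    used, path = set(), []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # fraction of the way from start to end
        target = [(1 - t) * s + t * e for s, e in zip(start_vec, end_vec)]
        dist, idx = min(
            (cosine_distance(target, vectors[i]), i)
            for i in range(len(vectors)) if i not in used
        )
        used.add(idx)
        path.append((idx, f"interp_{k}", dist))
    return path

# Toy 2-d vectors, invented for illustration only.
vectors = [[1.0, 0.2], [0.7, 0.7], [0.2, 1.0], [0.9, 0.1]]
path = interpolate_path([1.0, 0.0], [0.0, 1.0], vectors, steps=3)
print([idx for idx, _, _ in path])
```

Because each target depends only on the two endpoints and not on the previous pick, a path like this cannot get stuck on one book's proper nouns the way the greedy chain can.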

Issues

I don't like the top 100 list :) I had no idea there would be so much non-fiction in there, but I suppose I should have expected Nietzsche and the Bible. I would prefer to do it with top 100 fiction only, and maybe more writers I like.

After looking at the output, I think it needs some visualization elements (distances between sentences, tallies of the writers that showed up most) and some handling of proper nouns (strip them out before embedding), but deadline. There is logic to prevent repeated sentence use, but not repeated book use, and there's a whole segment from one Fielding book near the middle. This is due to proper nouns biasing the search.
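
A per-book cap is one possible fix for the repeated-book problem, and the same counter would double as the tally of sources. The sketch below is hypothetical and not part of the project: `pick_with_book_cap`, `cap`, and the candidate tuples (with invented distances and truncated sentences) are all illustrative.

```python
from collections import Counter

def pick_with_book_cap(candidates, book_counts, cap):
    """Return the closest candidate whose book has been used fewer than `cap` times."""
    for distance, sentence, book in sorted(candidates):
        if book_counts[book] < cap:
            book_counts[book] += 1
            return distance, sentence, book
    return None  # every candidate's book is already at the cap

# Pretend one book has already been used three times.
book_counts = Counter({"Tom Jones": 3})
candidates = [
    (0.39, "For such was the compassion...", "Tom Jones"),
    (0.41, "All the truth of my position...", "Great Expectations"),
]
choice = pick_with_book_cap(candidates, book_counts, cap=3)
print(choice[2])
print(book_counts.most_common(1))
```

A global cap is the bluntest version; a cap on consecutive picks from the same book would allow a source to return later in the story.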

I missed a few cleaning rules, like initial numbers.
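
Cleaning rules of this kind could be expressed as a small filter like the one below. This is a sketch under assumptions: the thresholds `min_words` and `max_words` and both regular expressions are illustrative guesses, not the rules the project actually used.

```python
import re

ROMAN = re.compile(r"^[IVXLCDM]+\.?$")         # text that is only a Roman numeral
LEADING_NUMBER = re.compile(r"^\d+[.)]?\s+")   # e.g. "12. " or "3) " at the start

def clean_sentence(text, min_words=4, max_words=80):
    """Return a cleaned sentence, or None if it should be dropped."""
    text = LEADING_NUMBER.sub("", text.strip())
    if ROMAN.match(text):
        return None
    n = len(text.split())
    if n < min_words or n > max_words:
        return None
    return text

print(clean_sentence("12. And they lived happily together all their lives long."))
print(clean_sentence("XIV."))
print(clean_sentence("Yes."))
```

Note that the leading-number rule as written would also strip a sentence that legitimately begins with a number, so it may need tightening for real text.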

More structure would help the story too, but I wanted to do something simple.

Code files

Code will be cleaned up and doc'd in December, out of time :)

Repo at here
