Add support for clipboard processing from images #97

jaresty · 2024-07-15T21:27:42Z

This adds support for image processing. It requires the use of gpt-4o.

jaresty · 2024-07-16T00:24:29Z

I think this approach is workable if you are open to it. Because we send the requests much later than we process the sources, I needed a signal that tells me to pull from the clipboard later, when sending the request to ChatGPT. This was the smallest change I could find that seems workable. An alternative would be to build the request parts earlier, where we do the source processing, but some other parts of the code rely on the text being raw, so I wasn't sure that was a good idea for now.

jaresty · 2024-07-16T00:27:06Z

This change lets you use almost any screenshot or clipboard contents as input to every command. The possibilities are pretty open, and inspiring. You can use model blend to blend screenshots into text, or take a screenshot of a presentation and then use model snip to insert it into VSCode. The grammar is no different than it was before; the only change is that it checks whether the contents are an image to determine what to do.

jaresty · 2024-07-16T00:40:41Z

lib/modelHelpers.py

        ],
        "max_tokens": 2024,
        "temperature": settings.get("user.model_temperature"),
        "n": 1,
-        "stop": None,


This was causing an exception when running with an image prompt, but not with text. I didn't notice any behavioral difference, however, so I removed it.

Good w me if it works with you
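A minimal sketch of one alternative to deleting the key outright: build the payload and drop any entry whose value is None, so an unset `stop` is never sent. The thread does not say what the exception was, so treating the explicit null as the cause is an assumption; `build_payload` and its parameters are hypothetical names, not the PR's code.

```python
from typing import Any, Optional


def build_payload(
    messages: list[dict[str, Any]],
    temperature: float,
    stop: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a chat request body, omitting keys that were left unset."""
    payload = {
        "messages": messages,
        "max_tokens": 2024,
        "temperature": temperature,
        "n": 1,
        "stop": stop,  # hypothetical: only sent when the caller provides one
    }
    # Filter out None values so the request never carries an explicit null.
    return {key: value for key, value in payload.items() if value is not None}
```

With this shape, text and image prompts share one code path and `stop` can still be supplied later if a use for it appears.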

jaresty · 2024-07-16T04:54:58Z

This works really well. I just tried "model snip format html clip" with a prompt to generate html and a screenshot of a design I needed to make and it generated a snippet based on the visual design dynamically. This is pretty amazing stuff 😮

jaresty · 2024-07-16T18:06:48Z

Incidentally, it appears that our current vision implementation is broken because the ChatGPT model is deprecated. However, I didn't want to complicate this discussion by making a change there as well: tldraw/make-real-starter#30

jaresty · 2024-07-18T20:45:30Z

gpt-4o-mini launched today with vision and is cheaper than 3.5 turbo, apparently: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

C-Loftus · 2024-07-19T00:16:21Z

GPT/gpt.py

@@ -244,7 +246,7 @@ def gpt_get_source_text(spoken_text: str) -> str:
        """Get the source text that is will have the prompt applied to it"""
        match spoken_text:
            case "clipboard":
-                return clip.text()
+                return "clip"


Can we accomplish this PR without needing to change this to return a string which then is matched in the other function? Is it possible for us to return the actual source text here?

If this is changed, then gpt_get_source_text no longer returns the source text for the clipboard, so its name no longer matches what it does (e.g. if we wanted to call it elsewhere).

It's difficult to do this without a pretty significant change. The nice thing about this is that it transparently falls back if it's not an image in the clipboard, but we have to change the structure of the message we send to ChatGPT. We need some way to indicate that that should happen. This was the smallest change that I could think of to make this happen.

Let's chat about this tomorrow. This syntax / strategy unlocks a lot of interesting possibilities that I'd love to share with you. A bit hard to explain in text.

I found a workaround that only makes the clipboard image a special case.

C-Loftus · 2024-07-19T00:18:39Z

Generally looks fine w me if we can find a way to clean up that one comment I made. I am fine switching to gpt-4o-mini. Seems like a good choice.

For some reason I have felt gpt 3.5 turbo is not as good as it once was. Maybe it is just me, but the newer versions of the model seem extremely focused on returning conversation-style responses, even with proper prompting.

C-Loftus · 2024-07-19T01:15:36Z

Ok looks good to me. Thanks. This is once again pretty clean. Nice work!

To return the image as a string, I think we could base64-encode the image and pass it to the query function, but if you have other ideas, I don't want to break that. This is working well.

C-Loftus · 2024-07-19T01:16:55Z

Oh and I switched to gpt-4o-mini. Don't see any reason not to use that. Gives users more support and has better performance.

jaresty · 2024-07-19T01:19:55Z

I debated returning the base64-encoded image as well, but you have to add a different content type and pass the image URL when sending the message to ChatGPT, so you'd have to do some kind of match to determine whether the string was a base64-encoded image, which I think is less clean than doing a direct string comparison.
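To illustrate the point about the different content type, here is a rough sketch of how the user message content might be assembled at send time. It assumes the Chat Completions format for vision input (a list of parts, with images passed as an `image_url` part holding a base64 data URL) and a PNG image; `build_user_content`, `IMAGE_MARKER`, and the parameters are hypothetical names, not the PR's code.

```python
import base64
from typing import Any, Optional

# Hypothetical marker value standing in for "the clipboard holds an image".
IMAGE_MARKER = "__IMAGE__"


def build_user_content(
    prompt: str, source: str, image_png: Optional[bytes] = None
) -> list[dict[str, Any]]:
    """Build the content parts for one user message."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    if source == IMAGE_MARKER:
        # Direct string comparison: no need to sniff whether text is base64.
        if image_png is None:
            raise ValueError("source is the image marker but no image was given")
        encoded = base64.b64encode(image_png).decode("ascii")
        content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            }
        )
    else:
        # Plain text sources stay raw and are sent as an ordinary text part.
        content.append({"type": "text", "text": source})
    return content
```

The encoding happens only at the last step, so everything upstream that relies on the source being raw text is unaffected.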

C-Loftus · 2024-07-19T01:33:28Z

Good to merge this now whenever you want. Changed the magic string to be __IMAGE__ so it is essentially not possible to trigger accidentally.
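As a sketch of where the discussion ends up, the clipboard source could return real text in the normal case and the `__IMAGE__` marker only when the clipboard holds an image. The clipboard accessors are passed in as callables because the actual clipboard API is not shown in this thread; `get_clipboard_source`, `read_text`, and `has_image` are hypothetical names.

```python
from typing import Callable, Optional

IMAGE_MARKER = "__IMAGE__"  # unlikely to appear in real clipboard text


def get_clipboard_source(
    read_text: Callable[[], Optional[str]],
    has_image: Callable[[], bool],
) -> str:
    """Return the clipboard text, or the marker if the clipboard is an image."""
    if has_image():
        # The image itself is fetched later, when the request is built.
        return IMAGE_MARKER
    text = read_text()
    if text is None:
        raise ValueError("clipboard holds neither text nor an image")
    return text
```

This keeps the function returning the actual source text for every text case, which addresses the review concern, while images remain the single special case.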

jaresty added 3 commits July 15, 2024 14:27

Add support for clipboard processing from images

c72b938

- This adds support for image processing - requires the use of gpt-4o

Remove stray logging code

26b1773

Fixed support for passing clipboard

1320f7f

jaresty commented Jul 16, 2024

Update model bland clip to use new clip source

2d962da

- This allows it to pull from images as well

C-Loftus reviewed Jul 19, 2024

jaresty and others added 3 commits July 18, 2024 17:58

Respond to pull request feedback

3c54381

- Make image the only clipboard special case

Better error handling. Move to gpt-4o-mini as default

c355aa2

[pre-commit.ci] auto fixes from pre-commit.com hooks

e7eac2d

for more information, see https://pre-commit.ci

C-Loftus and others added 3 commits July 18, 2024 21:32

make it hard to trigger accidentally. Image is a bit too simple

7aaca38

[pre-commit.ci] auto fixes from pre-commit.com hooks

cb65c32

for more information, see https://pre-commit.ci

Delete lib/a11yHelpers.py

b0a9712

jaresty merged commit 29832e8 into main Jul 19, 2024
3 checks passed

jaresty deleted the support-image-clipboard-processing branch July 19, 2024 01:35
