
Issue: Prompts run in the Langsmith playground consume 100,000s of extra tokens if there are images. #993

Open
snlamm opened this issue Sep 11, 2024 · 5 comments

snlamm commented Sep 11, 2024

Issue you'd like to raise.

Issue

It appears that running multimodal Claude 3.5 Sonnet prompts from the playground can consume hundreds of thousands of extra tokens due to how base64 images are handled.

Description

Last week we had many LLM calls to Claude 3.5 Sonnet that, when originally run, consumed a relatively small number of tokens (e.g. ~10k).

However, when they were re-run in the Langsmith playground with no changes whatsoever, they consumed 150,000–400,000 tokens (I've double-checked this directly in the Anthropic console to confirm it's not simply a UI bug).

Upon further inspection, it appears that each image in the playground can consume many tens of thousands of tokens when rerun, rather than the ~800 tokens of the original call.

I think I've tracked the issue down to Langsmith potentially inserting the base64 of the images into the prompt plaintext.

Here's an example of a run, originally from September 4th, that I reran in the playground a few hours ago to reproduce this. You can see on the bottom right that even the single image in the prompt is consuming >20k tokens, as there's barely any other text (ignore the prompt/objective; I had to replace the original one for data-sensitivity reasons). However, when it was originally run directly with Anthropic, it only consumed ~1k tokens:

[Screenshot 2024-09-11 at 11:12:00 AM]

What makes me think the issue is in the handling of the base64 is that when I save the prompt, it's saved with the image's base64 passed in as plaintext 😱:

[Screenshot 2024-09-11 at 11:14:06 AM]

My guess is that this is what's happening under the hood to produce the inflated token usage: the playground is dumping the base64 directly into the prompt text and sending it to the LLM.

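To illustrate the suspected difference, here's a minimal sketch (hypothetical file name and message text; I can't see the actual Langsmith internals) of an image sent as a structured Anthropic content block versus the same base64 string inlined into the message text, which gets tokenized character by character:

```python
import base64

# Hypothetical image file, purely for illustration.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# What the original call sends: a structured content block, which the
# Anthropic Messages API tokenizes as an image (on the order of ~1k tokens).
correct_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this screenshot."},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": image_b64,
            },
        },
    ],
}

# What the playground appears to send: the base64 string dumped into the
# prompt text, which is tokenized as ordinary text (tens of thousands of
# tokens for a typical screenshot).
broken_message = {
    "role": "user",
    "content": "Describe this screenshot.\n" + image_b64,
}
```
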
Note that this still happens in playground runs today; I ran the screenshotted actions above within the last few hours. However, it does not replicate for all runs (I tried today with some new runs), and it's been tricky to figure out which ones are impacted.

Severity

I think this is a severe bug: it's already led to unexpected cost increases on our end, since we'd been using the playground for debugging and experimentation. I assume it would impact other users similarly.

Please let me know if I can provide any other information! Langsmith is an awesome product ❤️

I wanted to provide public traces to help out, but I don't see a way of deleting older commits from my prompt forks (i.e. hiding the original prompts, which include sensitive information).

Suggestion:

No response

@snlamm snlamm changed the title Issue: Running LLM calls in the Langsmith playground consumes 100,000s of extra tokens if there are images. Issue: Prompts run in the Langsmith playground consume 100,000s of extra tokens if there are images. Sep 11, 2024
@hinthornw (Collaborator) commented:

Thanks for flagging. Forwarded.

snlamm commented Sep 12, 2024

Thanks @hinthornw @madams0013. I've managed to replicate the issue in a shareable run a few minutes ago: here is the public link. I can now replicate it consistently.

The original prompt call to Anthropic was ~2k tokens. In the playground, it's so many tokens that Anthropic rejects it.

I think the key is that the playground is misinterpreting curly braces: when there are curly braces in a formatted prompt, the playground converts the image to base64 plaintext before sending it to the LLM provider. This happens even though the braces were escaped correctly when the template was originally formatted, e.g. when handling stringified JSON/vector documents.

In other words, even though the template is formatted/escaped correctly in our service code, and handled perfectly by the langchain code and the LLM, the playground then receives the formatted prompt and erroneously assumes something about the curly braces. I'm not too sure what's going on under the hood at that point 😄. A sketch of my guess at the mechanism is below.

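As a sketch of the double-formatting hypothesis (hypothetical template; I don't know the actual playground code): a template that escapes literal braces as `{{`/`}}` formats correctly once, but if the already-formatted output is ever treated as a template again, the now-bare braces look like placeholders:

```python
from langchain_core.prompts import ChatPromptTemplate

# Literal braces in an f-string template are escaped as {{ and }}.
template = ChatPromptTemplate.from_messages(
    [("user", 'Context: {{"key": "value"}}\nQuestion: {question}')]
)

# First (correct) formatting in our service code resolves the escapes.
messages = template.format_messages(question="What does the context say?")
print(messages[0].content)
# Context: {"key": "value"}
# Question: What does the context say?

# If the playground re-parses this *formatted* output as a template, the
# now-unescaped {"key": "value"} reads as a placeholder, and whatever code
# path handles that could be what re-serializes the image into plaintext.
```
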
Looking forward to this being fixed; it's relatively expensive, and I think it may trip up users without them realizing it, since we only noticed the bug when the provider started rejecting our playground calls due to token limits 😅. Let me know if there's anything else helpful that I can share!

snlamm commented Sep 12, 2024

Update: the problem replicates regardless of LLM provider; for example, here with OpenAI. In this case, because it uses an image URL rather than base64, the token count stays small. However, the playground still passes the data as plaintext (here, the image URL) rather than as multimodal content, so the LLM never receives the actual image in the playground and cannot describe it. A sketch of the difference is below.

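For reference, a minimal sketch of that difference on the OpenAI side (hypothetical URL and text; not the actual run): the image has to arrive as an `image_url` content block, while a bare URL in the text is just a short string the model can't fetch:

```python
url = "https://example.com/photo.png"  # hypothetical

# Correct multimodal message: the model actually receives the image.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": url}},
    ],
}

# What the playground appears to send: the URL inlined as plaintext. The
# token count stays small, but no image ever reaches the model.
plaintext_message = {
    "role": "user",
    "content": "Describe this image.\n" + url,
}
```
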
snlamm commented Sep 17, 2024

@madams0013 checking in: have you had a chance to look into this? We're having to hold off on playground testing for certain features, and we'd love to be able to restart.

agola11 commented Sep 18, 2024

Hi @snlamm, we've had a big backlog of issues to address and plan to get to this in the next couple of weeks. Thanks for your patience!
