
Issue: Prompts run in the Langsmith playground consume 100,000s of extra tokens if there are images. #993

Open
snlamm opened this issue Sep 11, 2024 · 5 comments

snlamm commented Sep 11, 2024

Issue you'd like to raise.

Issue

It appears that running multimodal Claude 3.5 Sonnet prompts from the playground can consume hundreds of thousands of extra tokens due to how base64 images are handled.

Description

Last week we had many LLM calls to Claude 3.5 Sonnet that, when originally run, consumed a relatively small number of tokens (e.g. ~10k).

However, when they were re-run in the Langsmith playground with no changes whatsoever, they consumed 150,000–400,000 tokens (I've double-checked this directly in the Anthropic console to confirm it's not simply a UI bug).

Upon further inspection, it appears that each image in the playground can consume many tens of thousands of tokens when rerun, rather than the ~800 tokens of the original call.

I think I've tracked the issue down to Langsmith potentially inserting the base64 of the images into the prompt plaintext.

Here's an example of a run, originally from September 4th, that I reran in the playground a few hours ago to reproduce this. You can see on the bottom right that even the single image in the prompt is consuming >20k tokens, as there's barely any other text (ignore the prompt/objective; I had to replace the original one for data-sensitivity reasons). However, when it was originally run directly with Anthropic, it only consumed ~1k tokens:

[Screenshot 2024-09-11 at 11:12:00 AM]

What makes me think the issue is in the handling of the base64 is that when I save the prompt, it's saved with the image's base64 passed in as plaintext 😱:

[Screenshot 2024-09-11 at 11:14:06 AM]

My guess is that this is what's happening under the hood to produce the inflated token usage: the playground is dumping the base64 directly into the prompt text and sending it to the LLM.

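To illustrate the suspected difference, here's a minimal sketch (hypothetical file name and message text; I can't see the actual Langsmith internals) of an image sent as a structured Anthropic content block versus the same base64 string inlined into the message text, which gets tokenized character by character:

```python
import base64

# Hypothetical image file, purely for illustration.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# What the original call sends: a structured content block, which the
# Anthropic Messages API tokenizes as an image (on the order of ~1k tokens).
correct_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this screenshot."},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": image_b64,
            },
        },
    ],
}

# What the playground appears to send: the base64 string dumped into the
# prompt text, which is tokenized as ordinary text (tens of thousands of
# tokens for a typical screenshot).
broken_message = {
    "role": "user",
    "content": "Describe this screenshot.\n" + image_b64,
}
```
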
Note that this still happens in playground runs today; I ran the screenshotted actions above within the last few hours. However, it does not replicate for all runs (I tried today with some new runs), and it's been tricky to figure out which ones are impacted.

Severity

I think this is a severe bug: it's already led to unexpected cost increases on our end, since we'd been using the playground for debugging and experimentation. I assume it would impact other users similarly.

Please let me know if I can provide any other information! Langsmith is an awesome product ❤️

I wanted to provide public traces to help out, but I don't see a way of deleting older commits from my prompt forks (i.e. hiding the original prompts, which include sensitive information).

Suggestion:

No response

@snlamm snlamm changed the title Issue: Running LLM calls in the Langsmith playground consumes 100,000s of extra tokens if there are images. Issue: Prompts run in the Langsmith playground consume 100,000s of extra tokens if there are images. Sep 11, 2024
@hinthornw (Collaborator) commented:

Thanks for flagging. Forwarded.

snlamm commented Sep 12, 2024

Thanks @hinthornw @madams0013. I've managed to replicate the issue in a shareable run a few minutes ago: here is the public link. I can now replicate it consistently.

The original prompt call to Anthropic was ~2k tokens. In the playground, it's so many tokens that Anthropic rejects it.

I think the key is that the playground is misinterpreting curly braces: when there are curly braces in a formatted prompt, the playground converts the image to base64 plaintext before sending it to the LLM provider. This happens even though the braces were escaped correctly when the template was originally formatted, e.g. when handling stringified JSON/vector documents.

In other words, even though the template is formatted/escaped correctly in our service code, and handled perfectly by the langchain code and the LLM, the playground then receives the formatted prompt and erroneously assumes something about the curly braces. I'm not too sure what's going on under the hood at that point 😄. A sketch of my guess at the mechanism is below.

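As a sketch of the double-formatting hypothesis (hypothetical template; I don't know the actual playground code): a template that escapes literal braces as `{{`/`}}` formats correctly once, but if the already-formatted output is ever treated as a template again, the now-bare braces look like placeholders:

```python
from langchain_core.prompts import ChatPromptTemplate

# Literal braces in an f-string template are escaped as {{ and }}.
template = ChatPromptTemplate.from_messages(
    [("user", 'Context: {{"key": "value"}}\nQuestion: {question}')]
)

# First (correct) formatting in our service code resolves the escapes.
messages = template.format_messages(question="What does the context say?")
print(messages[0].content)
# Context: {"key": "value"}
# Question: What does the context say?

# If the playground re-parses this *formatted* output as a template, the
# now-unescaped {"key": "value"} reads as a placeholder, and whatever code
# path handles that could be what re-serializes the image into plaintext.
```
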
Looking forward to this being fixed; it's relatively expensive, and I think it may trip up users without them realizing it, since we only noticed the bug when the provider started rejecting our playground calls due to token limits 😅. Let me know if there's anything else helpful that I can share!

snlamm commented Sep 12, 2024

Update: the problem replicates regardless of LLM provider; for example, here with OpenAI. In this case, because it uses an image URL rather than base64, the token count stays small. However, the playground still passes the data as plaintext (here, the image URL) rather than as multimodal content, so the LLM never receives the actual image in the playground and cannot describe it. A sketch of the difference is below.

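For reference, a minimal sketch of that difference on the OpenAI side (hypothetical URL and text; not the actual run): the image has to arrive as an `image_url` content block, while a bare URL in the text is just a short string the model can't fetch:

```python
url = "https://example.com/photo.png"  # hypothetical

# Correct multimodal message: the model actually receives the image.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": url}},
    ],
}

# What the playground appears to send: the URL inlined as plaintext. The
# token count stays small, but no image ever reaches the model.
plaintext_message = {
    "role": "user",
    "content": "Describe this image.\n" + url,
}
```
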
snlamm commented Sep 17, 2024

@madams0013 checking in: have you had a chance to look into this? We're having to hold off on playground testing for certain features, and we'd love to be able to restart.

agola11 commented Sep 18, 2024

Hi @snlamm, we've had a big backlog of issues to address and plan to get to this in the next couple of weeks. Thanks for your patience!
