/ask Priority - Token Optimization #807
The token limit caps GPT's output: whatever we set as the limit is the maximum number of tokens GPT will respond with, but the model's context window also has to fit the input, and since we determine the input we can't really fix that value in advance. The Python package tiktoken is the best tokenization package, and there is a TS wrapper for it; otherwise it'll be a case of using LangChain, creating our own textSplitters, and basing our input token count on that, which will be a rough but close estimate.
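For reference, a minimal token-counting sketch, assuming js-tiktoken as the TS wrapper mentioned above (the package name, API, and budget numbers are illustrative, not pinned project dependencies):

```ts
// Count prompt tokens so we know how much room is left for the completion.
import { encodingForModel } from "js-tiktoken";

export function countTokens(text: string, model: "gpt-3.5-turbo" | "gpt-4" = "gpt-3.5-turbo"): number {
  const enc = encodingForModel(model);
  return enc.encode(text).length; // encode() returns an array of token ids
}

// Usage: estimate the remaining budget for the model's response.
const contextWindow = 4096; // model-dependent
const promptTokens = countTokens("system prompt + linked issue context");
const maxCompletionTokens = Math.max(0, contextWindow - promptTokens);
```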
This issue is a non-starter really my friend, as it was user error this time around, but I'll still take the bounty lmao ;))
I'll wait until we get some real-world use cases functional before we optimize.
A crude workaround: if the response from GPT is an error message stating the token count and how much we are over by, we can make an educated guess as to how many characters to strip from the context in order to meet the token limit. Another option is to use LangChain to interact with OpenAI, which allows for:

```ts
this.llm = new OpenAI({
  openAIApiKey: this.apiKey,
  modelName: "gpt-3.5-turbo-16k",
  maxTokens: -1,
});
```
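A rough sketch of that "strip and retry" workaround, assuming the OpenAI error text exposes both the context limit and the token count used (the regex and the ~4 characters-per-token heuristic below are estimates, not exact):

```ts
// Parse the token counts out of the error message and strip an estimated
// number of characters from the end of the lowest-priority context.
const CHARS_PER_TOKEN = 4; // rough average for English text

export function trimContextFromError(context: string, errorMessage: string): string | null {
  // e.g. "This model's maximum context length is 4097 tokens. However,
  //       your messages resulted in 5230 tokens. ..."
  const match = /maximum context length is (\d+) tokens.*?(\d+) tokens/s.exec(errorMessage);
  if (!match) return null; // unrecognized error format, give up

  const [, limit, used] = match.map(Number);
  const overBy = used - limit;
  if (overBy <= 0) return context;

  // Strip the estimated excess plus a small safety margin.
  const charsToStrip = Math.ceil(overBy * CHARS_PER_TOKEN * 1.1);
  return context.slice(0, Math.max(0, context.length - charsToStrip));
}
```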
@Keyrxng time for compression/prioritization? Not a great first real world attempt lol.
Prioritization order:
We should use a tokenization estimator to know how much we should exclude.
Originally posted by @pavlovcik in #787 (comment)
It should also include a warning that it had to cut out some content, perhaps even including the exact token counts, similar to the information presented in the error message above, so the user can approximate how much was cut off.
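A hypothetical sketch of that flow, trimming linked-issue context to a token budget up front and surfacing a warning about what was excluded (`countTokens` is the js-tiktoken helper sketched earlier; the import path, chunk names, and budget are illustrative):

```ts
import { countTokens } from "./count-tokens"; // hypothetical path to the helper above

// Keep chunks in priority order until the budget is exhausted,
// and track roughly how many tokens had to be dropped.
export function fitToBudget(chunks: string[], budget: number): { kept: string; cutTokens: number } {
  const kept: string[] = [];
  let used = 0;
  let cutTokens = 0;

  for (const chunk of chunks) {
    const tokens = countTokens(chunk);
    if (used + tokens <= budget) {
      kept.push(chunk);
      used += tokens;
    } else {
      cutTokens += tokens;
    }
  }

  return { kept: kept.join("\n"), cutTokens };
}

// Append a warning to the bot's reply when content was excluded.
const { kept, cutTokens } = fitToBudget(["issue body", "linked issue", "older comments"], 3000);
const warning = cutTokens > 0
  ? `> Note: roughly ${cutTokens} tokens of linked context were excluded to fit the model's limit.`
  : "";
const reply = [kept, warning].filter(Boolean).join("\n\n");
```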