It is currently easy to cause an out-of-memory condition by prompting a model with a very long prompt. This is an expected consequence of how certain tokenizers and transformer attention are implemented. Experienced users may intentionally want to use long prompts, but less experienced users may hit this by accident and run into confusing OOM conditions (#31) or extremely slow runtime performance.
It may be helpful to explore a default limit on prompt length to help users avoid these friction points.
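One possible shape for this is a token-count check performed before generation, with a default ceiling that experienced users can raise or disable explicitly. The sketch below is only illustrative, assuming hypothetical names (`DEFAULT_MAX_PROMPT_TOKENS`, `check_prompt_length`, `PromptTooLongError`) that are not part of this project's API:

```python
# Hypothetical prompt-length guard: fail fast with a clear error
# instead of letting an oversized prompt cause an OOM or a very
# slow generation. All names here are illustrative placeholders.

DEFAULT_MAX_PROMPT_TOKENS = 2048  # illustrative default, not a project setting


class PromptTooLongError(ValueError):
    """Raised when a prompt exceeds the configured token limit."""


def check_prompt_length(token_ids, max_tokens=DEFAULT_MAX_PROMPT_TOKENS):
    """Reject prompts whose token count exceeds `max_tokens`.

    Passing `max_tokens=None` disables the check for users who
    intentionally want very long prompts.
    """
    if max_tokens is not None and len(token_ids) > max_tokens:
        raise PromptTooLongError(
            f"Prompt is {len(token_ids)} tokens, exceeding the default limit "
            f"of {max_tokens}. Pass a larger max_tokens (or None) to override "
            f"if this is intentional."
        )
    return token_ids
```

The key design choice is making the limit an explicit, overridable default rather than a hard cap, so the guard protects newcomers without blocking deliberate long-prompt use.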