-
Tried looking into the code for this but it was too complicated to unwrap in 10 minutes. I'm not sure I understand this part of the docs well; it's covered too briefly, I think. Can you explain in more detail how this works? How can you ask the LLM to only generate the values of a JSON structure, for example, and have that be accurate? This library went beyond prompt engineering into prompt sorcery
-
GPT-style LLMs are all auto-regressive and process tokens in two modes: prompt tokens, which the model can consume in parallel in large batches, and generation tokens, which must be produced one at a time because each depends on the one before it.
Guidance can "accelerate" inference because prompt tokens are much cheaper/faster than generation tokens (due to many factors, like GPU batching). Because a guidance program specifies much of the structure of the output, we can convert many of the output tokens into batches of prompt-like tokens that are cheaper. We can also use the structure of the template to dynamically bias the next-token probabilities, making sure the text that comes next aligns with the template and is optimally tokenized (e.g. token healing). Note that we can only do this for models we have control over (currently local models in Transformers), though we are working on exposing a server version as well.
-
Are there any plans for LLMs served over an API, such as OpenAI's, to support this feature?