Changes
- Skip re-evaluating tokens that were already evaluated in the past. This can significantly speed up prompt processing in chat applications that prepend previous messages to the prompt (see the first sketch after this list).
- Deprecate the `LLM.reset()` method. Use the high-level API instead (see the second sketch below).
- Add support for batching and beam search to the 🤗 model (see the third sketch below).
- Remove the universal binary option when building for AVX2 and AVX on macOS.
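
The prompt-reuse speedup works by finding the longest common prefix between the tokens evaluated on the previous call and the new prompt, then evaluating only the new suffix. A minimal sketch of the idea in plain Python; the function and variable names are illustrative, not the library's internals:

```python
def tokens_to_evaluate(cached: list[int], prompt: list[int]) -> list[int]:
    """Return only the suffix of `prompt` that was not evaluated before."""
    # Length of the longest common prefix between the cached tokens
    # and the new prompt.
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    # Everything up to `n` is already evaluated; only the rest is new.
    return prompt[n:]

# Example: a chat app resends the whole conversation each turn.
previous = [1, 5, 9, 4]            # tokens evaluated on the last turn
current = [1, 5, 9, 4, 7, 2, 8]    # previous messages + the new message
assert tokens_to_evaluate(previous, current) == [7, 2, 8]
```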
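For the deprecated `LLM.reset()`, the replacement is to rely on the high-level call, which manages evaluation state per request. A hedged usage sketch, assuming a ctransformers-style `AutoModelForCausalLM` loader; the model path is a placeholder:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

# Before: callers reset evaluation state by hand between conversations.
# llm.reset()  # deprecated

# After: just call the model; state handling is the library's concern.
print(llm("AI is going to"))
```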
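Batching and beam search on the 🤗 model follow the standard `transformers` `generate()` interface. A hedged sketch, assuming the integration is enabled with an `hf=True` flag as in ctransformers-style APIs; the flag, model path, and tokenizer choice are assumptions:

```python
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Assumed loader: hf=True wraps the model in a transformers-compatible class.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Batch of prompts, padded to a common length.
batch = tokenizer(["Hello", "AI is going to"], return_tensors="pt", padding=True)

# Beam search over the whole batch via the usual generate() kwargs.
outputs = model.generate(**batch, num_beams=4, max_new_tokens=16)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```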