
inference optimization ⚗

@pszemraj released this 08 Jul 01:10

🦿 this release adds support for some features that can make inference faster:

- support for `torch.compile` & Optimum ONNX[^1] (see the sketches after this list)
- improved the `textsum-dir` command: more options, a streamlined interface, etc.; added the `fire` package to help with that (also sketched below)
  - the saved config JSON files are now better structured to keep track of parameters
- some small adjustments to the `Summarizer` class
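
For reference, a rough sketch of the two speedups named above, written against plain `transformers`/`optimum` rather than the textsum API (the model ID here is a placeholder, not a claim about what textsum loads):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Option 1: torch.compile (PyTorch >= 2.0). The first generate() call is slow
# while the graph compiles; subsequent calls should be faster.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model = torch.compile(model)

inputs = tokenizer("Summarize: the quick brown fox ...", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_length=32)[0], skip_special_tokens=True))

# Option 2: Optimum ONNX. export=True converts the checkpoint for ONNX Runtime;
# the resulting model exposes the same generate() API.
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
```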
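And a loose illustration of what `fire` brings to a CLI like `textsum-dir`, plus a guess at the shape of the structured config JSON; the function name, parameters, and JSON layout below are hypothetical, not the actual `textsum-dir` internals:

```python
import json

import fire

def summarize_dir(input_dir: str, output_dir: str = "out", batch_size: int = 4):
    """Hypothetical entry point: fire maps each parameter to a CLI flag,
    e.g. `python app.py --input_dir docs --batch_size 8`."""
    # one plausible "better structured" config layout (a guess, not the real schema)
    params = {
        "io": {"input_dir": input_dir, "output_dir": output_dir},
        "inference": {"batch_size": batch_size},
    }
    with open("summarization_params.json", "w") as f:
        json.dump(params, f, indent=2)

if __name__ == "__main__":
    fire.Fire(summarize_dir)
```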

Next up: the UI app will finally get an overhaul.

[^1]: please note that "support for" is not equivalent to "I have tested every long-context model with ONNX max quantization and sign off guaranteeing they will all produce accurate results". I've had some good results, but also some strange ones (with Long-T5 specifically). Test beforehand, and file an issue on the Optimum repo as needed 🙏
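
In that spirit, a minimal sanity check before trusting an exported model on real documents: run the same text through the PyTorch checkpoint and its ONNX export and compare the summaries. This is a sketch, not part of textsum, and the model ID is again a placeholder:

```python
# Hypothetical before/after check for an ONNX export.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # placeholder

tok = AutoTokenizer.from_pretrained(model_id)
pt_pipe = pipeline("summarization", model=model_id, tokenizer=tok)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
ort_pipe = pipeline("summarization", model=ort_model, tokenizer=tok)

text = open("sample_document.txt").read()  # a representative input from your corpus
print("pytorch:", pt_pipe(text, max_length=64, truncation=True)[0]["summary_text"])
print("onnx:   ", ort_pipe(text, max_length=64, truncation=True)[0]["summary_text"])
```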