
inference optimization ⚗

@pszemraj released this 08 Jul 01:10

🦿 this release adds support for some features that can make inference faster:

- support for `torch.compile` & Optimum ONNX[^1] (see the sketches after this list)
- improved the `textsum-dir` command: more options, a streamlined interface, etc.; added the `fire` package to help with that (also sketched below)
  - the saved config JSON files are now better structured to keep track of parameters
- some small adjustments to the `Summarizer` class
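
For reference, a rough sketch of the two speedups named above, written against plain `transformers`/`optimum` rather than the textsum API (the model ID here is a placeholder, not a claim about what textsum loads):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Option 1: torch.compile (PyTorch >= 2.0). The first generate() call is slow
# while the graph compiles; subsequent calls should be faster.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model = torch.compile(model)

inputs = tokenizer("Summarize: the quick brown fox ...", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_length=32)[0], skip_special_tokens=True))

# Option 2: Optimum ONNX. export=True converts the checkpoint for ONNX Runtime;
# the resulting model exposes the same generate() API.
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
```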
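And a loose illustration of what `fire` brings to a CLI like `textsum-dir`, plus a guess at the shape of the structured config JSON; the function name, parameters, and JSON layout below are hypothetical, not the actual `textsum-dir` internals:

```python
import json

import fire

def summarize_dir(input_dir: str, output_dir: str = "out", batch_size: int = 4):
    """Hypothetical entry point: fire maps each parameter to a CLI flag,
    e.g. `python app.py --input_dir docs --batch_size 8`."""
    # one plausible "better structured" config layout (a guess, not the real schema)
    params = {
        "io": {"input_dir": input_dir, "output_dir": output_dir},
        "inference": {"batch_size": batch_size},
    }
    with open("summarization_params.json", "w") as f:
        json.dump(params, f, indent=2)

if __name__ == "__main__":
    fire.Fire(summarize_dir)
```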

Next up: the UI app will finally get an overhaul.

[^1]: please note that "support for" is not equivalent to "I have tested every long-context model with ONNX max quantization and sign off guaranteeing they will all produce accurate results". I've had some good results, but also some strange ones (with Long-T5 specifically). Test beforehand, and file an issue on the Optimum repo as needed 🙏
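
In that spirit, a minimal sanity check before trusting an exported model on real documents: run the same text through the PyTorch checkpoint and its ONNX export and compare the summaries. This is a sketch, not part of textsum, and the model ID is again a placeholder:

```python
# Hypothetical before/after check for an ONNX export.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # placeholder

tok = AutoTokenizer.from_pretrained(model_id)
pt_pipe = pipeline("summarization", model=model_id, tokenizer=tok)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
ort_pipe = pipeline("summarization", model=ort_model, tokenizer=tok)

text = open("sample_document.txt").read()  # a representative input from your corpus
print("pytorch:", pt_pipe(text, max_length=64, truncation=True)[0]["summary_text"])
print("onnx:   ", ort_pipe(text, max_length=64, truncation=True)[0]["summary_text"])
```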