- fix: Add AutocastType to public API
- Version of external components used during testing:
- PyTorch 2.6.0a0+df5bbc0
- TensorFlow 2.16.1
- TensorRT 10.6.0.26
- Torch-TensorRT 2.6.0a0
- ONNX Runtime 1.19.2
- Polygraphy: 0.49.13
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Introduced `custom_args` in `TensorConfig` for use by custom runners, which allows setting up dynamic shapes for Torch-TensorRT compilation
- new: `autocast_dtype` added to Torch runner configuration to set the dtype for autocast
- new: New ONNX Runtime version 1.20 for Python >= 3.10
- new: Use `torch.compile` path in heuristic search for max batch size
- change: Removed TensorFlow dependencies from `nav.jax.optimize`
- change: Removed PyTorch dependencies from `nav.profile`
- change: Collect all Python packages in status instead of a filtered list
- change: Use the default throughput cutoff threshold for the max batch size heuristic when `None` is provided in the configuration
- change: Updated default ONNX opset to 20 for Torch >= 2.5
- fix: Exception raised with Python >= 3.11 due to incorrect dataclass initialization
- fix: Removed an option from `ExportOption` that was removed in Torch 2.5
- fix: Improved preprocessing stage in Torch based runners
- fix: Warn when using autocast with bfloat16 in Torch
- fix: Pass runner configuration to runners in nav.profile
- Version of external components used during testing:
- PyTorch 2.6.0a0+df5bbc0
- TensorFlow 2.16.1
- TensorRT 10.6.0.26
- Torch-TensorRT 2.6.0a0
- ONNX Runtime 1.19.2
- Polygraphy: 0.49.13
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: simple and detailed reporting of the optimization process
- new: adjusted TensorFlow SavedModel export for Keras 3.x
- new: inform the user when a wrapped module is not called during optimize
- new: inform the user when a module uses a custom forward function
- new: support for dynamic shapes in Torch ExportedProgram
- new: use ExportedProgram for Torch-TensorRT conversion
- new: support back-off policy during profiling to avoid reporting a local minimum
- new: automatically scale conversion batch size when modules have different batch sizes within a single pipeline
- change: TensorRT conversion max batch size search relies on saturating throughput for base formats
- change: adjusted profiling configuration for throughput cutoff search
- change: include the optimized pipeline in the list of examined variants during `nav.profile`
- change: the performance step is not executed when correctness fails for a format and runtime
- change: verify command is not executed when verify function is not provided
- change: do not create a model copy before executing `torch.compile`
- fix: pipelines sometimes obtain model and tensors on different devices during `nav.profile`
- fix: extract graph from ExportedProgram for running inference
- fix: runner configuration not propagated to pre-processing steps
- Version of external components used during testing:
- PyTorch 2.4.0a0+3bcc3cddb5
- TensorFlow 2.16.1
- TensorRT 10.3.0.26
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy: 0.49.12
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
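The throughput-saturation search with a back-off policy can be illustrated by a minimal sketch. It is not Model Navigator's implementation: `find_max_batch_size`, `measure_throughput`, the doubling schedule, and the default `cutoff` and `backoff_steps` values are all hypothetical. The search keeps doubling the batch size while throughput grows by more than the relative cutoff, and only stops after several consecutive non-improving steps, so a temporary plateau is not reported as the final result.

```python
from typing import Callable


def find_max_batch_size(
    measure_throughput: Callable[[int], float],
    max_batch_size: int = 1024,
    cutoff: float = 0.05,
    backoff_steps: int = 2,
) -> int:
    """Return the largest batch size that still gave a significant throughput gain.

    A step is "stalled" when throughput does not exceed the best result so far
    by more than ``cutoff`` (relative). The search stops after ``backoff_steps``
    consecutive stalled steps instead of at the first one (the back-off).
    """
    best_batch = 1
    best_throughput = measure_throughput(1)
    stalled = 0
    batch = 2
    while batch <= max_batch_size:
        throughput = measure_throughput(batch)
        if throughput > best_throughput * (1.0 + cutoff):
            best_batch, best_throughput = batch, throughput
            stalled = 0  # a real gain resets the back-off counter
        else:
            stalled += 1
            if stalled >= backoff_steps:
                break  # throughput considered saturated
        batch *= 2
    return best_batch


if __name__ == "__main__":
    # Synthetic throughput curve with a temporary plateau at batch size 16.
    curve = {1: 100.0, 2: 190.0, 4: 350.0, 8: 600.0, 16: 610.0, 32: 900.0, 64: 905.0, 128: 906.0}
    print(find_max_batch_size(curve.__getitem__, max_batch_size=128))  # 32
    print(find_max_batch_size(curve.__getitem__, max_batch_size=128, backoff_steps=1))  # 8
```

Without back-off (`backoff_steps=1`) the plateau at batch size 16 ends the search at 8, which is exactly the local result the back-off policy is meant to avoid.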
- new: Python 3.12 support
- new: Improved logging
- new: optimized in-place module can be stored in the Triton model repository
- new: multi-profile support for TensorRT model build and runtime
- new: measure duration of each command executed in optimization pipeline
- new: TensorRT-LLM model store generation for deployment on Triton Inference Server
- change: filter unsupported runners instead of raising an error when running optimize
- change: moved JAX support to the experimental module with limited support
- change: use autocast=True for Torch based runners
- change: use `torch.inference_mode` or `torch.no_grad` context in `nav.profile` measurements
- change: use multiple strategies to select the optimized runtime, defaults to [`MaxThroughputAndMinLatencyStrategy`, `MinLatencyStrategy`]
- change: `trt_profiles` are not set automatically for a module when using `nav.optimize`
- fix: properly revert log level after torch onnx dynamo export
- Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy: 0.49.10
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
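Selecting a runtime with an ordered list of strategies, where a later strategy is consulted only if the earlier one cannot decide, can be sketched as below. The strategy functions, the `ProfilingResult` record, and the 1% throughput tolerance are hypothetical stand-ins loosely modeled on the default strategy names above; they do not reproduce the library's `MaxThroughputAndMinLatencyStrategy` or `MinLatencyStrategy` classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ProfilingResult:
    """Hypothetical per-runtime profiling record."""
    runtime: str
    latency_ms: float
    throughput: Optional[float] = None  # None when throughput was not measured


Strategy = Callable[[Sequence[ProfilingResult]], Optional[str]]


def max_throughput_and_min_latency(results: Sequence[ProfilingResult]) -> Optional[str]:
    """Pick the lowest-latency runtime among those within 1% of the best throughput."""
    measured = [r for r in results if r.throughput is not None]
    if not measured:
        return None  # cannot decide; let the next strategy try
    best = max(r.throughput for r in measured)
    near_best = [r for r in measured if r.throughput >= 0.99 * best]
    return min(near_best, key=lambda r: r.latency_ms).runtime


def min_latency(results: Sequence[ProfilingResult]) -> Optional[str]:
    """Pick the runtime with the lowest latency."""
    if not results:
        return None
    return min(results, key=lambda r: r.latency_ms).runtime


def select_runtime(results: Sequence[ProfilingResult], strategies: List[Strategy]) -> Optional[str]:
    """Return the choice of the first strategy that can decide, or None."""
    for strategy in strategies:
        choice = strategy(results)
        if choice is not None:
            return choice
    return None


if __name__ == "__main__":
    results = [
        ProfilingResult("onnx", latency_ms=5.0, throughput=1000.0),
        ProfilingResult("tensorrt", latency_ms=3.0, throughput=995.0),
        ProfilingResult("torch", latency_ms=8.0, throughput=600.0),
    ]
    print(select_runtime(results, [max_throughput_and_min_latency, min_latency]))  # tensorrt
```

Ordering the list is the design choice: the first strategy expresses the preferred trade-off, and the fallback guarantees a selection when the preferred criterion cannot be evaluated.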
- fix: Check if torch 2 is available before doing dynamo cleanup
- Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy: 0.49.10
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: inplace `nav.Module` accepts a `batching` flag, which overrides the config setting, and `precision`, which allows setting the appropriate configuration for TensorRT
- new: Allow setting the device when loading optimized modules using `nav.load_optimized()`
- new: Add support for custom I/O names and dynamic shapes in Torch ONNX Dynamo path
- new: Added `nav.bundle.save` and `nav.bundle.load` to save and load optimized models from the cache
- change: Improved optimize and profile status in inplace mode
- change: Improved handling of defaults for ONNX Dynamo when executing `nav.package.optimize`
- fix: Maintaining the module's device in `nav.profile()`
- fix: Add support for all precisions for TensorRT in `nav.profile()`
- fix: Forward method not passed to other inplace modules.
- Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.0
- Polygraphy: 0.49.10
- GraphSurgeon: 0.5.2
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: TensorRT Timing Tactics Cache Management - using timing tactics cache files for optimization performance improvements
- new: Added throughput saturation verification in `nav.profile()` (enabled by default)
- new: Allow overriding the Inplace cache dir through the `MODEL_NAVIGATOR_DEFAULT_CACHE_DIR` env variable
- new: inplace `nav.Module` can now receive a function name to be used instead of `__call__` in modules/submodules, which allows customizing modules with non-standard calls
- fix: Torch Dynamo export and Torch Dynamo ONNX export
- fix: measurement stabilization in `nav.profile()`
- fix: inplace inference through Torch
- fix: trt_profiles argument handling in ONNX to TRT conversion
- fix: optimal shape configuration for batch size in Inplace API
- change: Disable TensorRT profile builder
- change: `nav.optimize()` does not override module configuration
- Version of external components used during testing:
- PyTorch 2.3.0a0+6ddf5cf85e
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy: 0.49.4
- GraphSurgeon: 0.4.6
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Inference with TensorRT when model has input with empty shape
- fix: Using stabilized runners when model has no batching
- fix: Invalid dependencies for cuDNN - review known issues
- fix: Make ONNX Graph Surgeon produce artifacts within the protobuf limit (2 GB)
- change: Remove TensorRTCUDAGraph from default runners
- change: updated ONNX package to 1.16
- Version of external components used during testing:
- PyTorch 2.3.0a0+40ec155e58
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy: 0.49.4
- GraphSurgeon: 0.4.6
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Allow selecting the device for TensorRT runner
- new: Add device output buffers to TensorRT runner
- new: nav.profile added for profiling any Python function
- change: API for Inplace optimization (breaking change)
- fix: Passing inputs for Torch to ONNX export
- fix: Parse args to kwargs in torchscript-trace export
- fix: Lower peak memory usage when loading Torch inplace optimized model
- Version of external components used during testing:
- PyTorch 2.3.0a0+ebedce2
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy: 0.49.4
- GraphSurgeon: 0.4.6
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- change: Add input and output specs for Triton model repositories generated from packages
- Version of external components used during testing:
- PyTorch 2.2.0a0+81ea7a48
- TensorFlow 2.14.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy: 0.49.0
- GraphSurgeon: 0.3.27
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Passing inputs for Torch to ONNX export
- fix: Passing input data to OnnxCUDA runner
- Version of external components used during testing:
- PyTorch 2.2.0a0+81ea7a48
- TensorFlow 2.14.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy: 0.49.0
- GraphSurgeon: 0.3.27
- tf2onnx v1.16.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: FP8 precision support for TensorRT
- new: Support for autocast and inference mode configuration for Torch runners
- new: Allow selecting the device for Torch and ONNX runners
- new: Add support for `default_model_filename` in Triton model configuration
- new: Detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
- fix: JAX export and TensorRT conversion fail when custom workspace is used
- fix: Missing max workspace size passed to TensorRT conversion
- fix: Execution of TensorRT optimize raised an error while handling output metadata
- fix: Limited Polygraphy version to work correctly with onnxruntime-gpu package
- Version of external components used during testing:
- PyTorch 2.2.0a0+6a974be
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy: 0.49.0
- GraphSurgeon: 0.3.27
- tf2onnx v1.15.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: decoupled mode configuration in Triton Model Config
- new: support for PyTorch ExportedProgram and ONNX dynamo export
- new: added GraphSurgeon ONNX optimization
- fix: compatibility of generating PyTriton model config through adapter
- fix: installation of packages that are platform dependent
- fix: update package config with model loaded from source
- change: in the TensorRT runner, lazily convert tensors to Torch when `TensorType.TORCH` is the return type
- change: move from Polygraphy CLI to Polygraphy Python API
- change: removed Windows from support list
- Version of external components used during testing:
- PyTorch 2.1.0a0+32f93b1
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy: 0.49.0
- GraphSurgeon: 0.3.27
- tf2onnx v1.15.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Data dependent dynamic control flow support in nav.Module (multiple computation graphs per module)
- new: Added find max batch size utility
- new: Added utilities API documentation
- new: Add Timer class for measuring execution time of models and Inplace modules
- fix: Use a wide range of shapes for TensorRT conversion
- fix: Sorting of samples loaded from workspace
- change: in Inplace, store one sample per module by default and store shape info for all samples
- change: always execute export for all supported formats
- Known issues and limitations:
- nav.Module moves the original torch.nn.Module to the CPU; in case of weight sharing, this might result in unexpected behaviour
- For data dependent dynamic control flow (multiple computation graphs), nav.Module might copy the weights for each separate graph
- Version of external components used during testing:
- PyTorch 2.1.0a0+29c30b1
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.15.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.27
- tf2onnx v1.15.1
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Obtaining inputs names from ONNX file for TensorRT conversion
- change: Raise an exception instead of exiting with a code when a required command fails
- Version of external components used during testing:
- PyTorch 2.1.0a0+b5021ba
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.15.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.27
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: gather ONNX input names based on the model's forward signature
- fix: do not run TensorRT max batch size search when max batch size is None
- fix: use pytree metadata to flatten torch complex outputs
- Version of external components used during testing:
- PyTorch 2.1.0a0+b5021ba
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.15.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.27
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
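Flattening nested ("complex") outputs into a flat list of leaves plus structure metadata, and rebuilding them later, is the idea behind pytree utilities. The plain-Python sketch below handles only dicts, lists, and tuples (namedtuples come back as plain tuples); `flatten` and `unflatten` are hypothetical names, and this is an illustration rather than PyTorch's pytree implementation or Model Navigator's code.

```python
from typing import Any, Iterator, List, Tuple

Spec = Tuple[Any, ...]


def flatten(tree: Any) -> Tuple[List[Any], Spec]:
    """Split a nested dict/list/tuple into its leaves and a structure spec."""
    if isinstance(tree, dict):
        kind, keys, children = "dict", list(tree.keys()), list(tree.values())
    elif isinstance(tree, (list, tuple)):
        kind, keys, children = ("tuple" if isinstance(tree, tuple) else "list"), None, list(tree)
    else:
        return [tree], ("leaf",)  # anything else (e.g. a tensor) is a leaf
    leaves: List[Any] = []
    child_specs = []
    for child in children:
        child_leaves, child_spec = flatten(child)
        leaves.extend(child_leaves)
        child_specs.append(child_spec)
    return leaves, (kind, keys, child_specs)


def unflatten(leaves: List[Any], spec: Spec) -> Any:
    """Rebuild the nested structure described by ``spec`` from ``leaves`` in order."""
    it: Iterator[Any] = iter(leaves)

    def build(node: Spec) -> Any:
        if node[0] == "leaf":
            return next(it)
        kind, keys, child_specs = node
        children = [build(child) for child in child_specs]
        if kind == "dict":
            return dict(zip(keys, children))
        return tuple(children) if kind == "tuple" else children

    return build(spec)
```

Keeping the spec separate from the leaves lets a runner pass only the flat leaves through a backend and restore the original output structure afterwards.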
- new: Inplace Optimize feature - optimize models directly in the Python code
- new: Non-tensor inputs and outputs support
- new: Model warmup support in Triton model configuration
- new: `nav.tensorrt.optimize` API added for testing and measuring performance of TensorRT models
- new: Extended custom configs to pass arguments directly to export and conversion operations like `torch.onnx.export` or `polygraphy convert`
- new: Collect GPU clock during model profiling
- new: Add option to configure minimal trials and stabilization windows for performance verification and profiling
- change: Navigator package version changed to 0.2.3. Custom configurations now use a `trt_profiles` list instead of a single value
- change: Store separate reproduction scripts for runners used during correctness and profiling
- Version of external components used during testing:
- PyTorch 2.1.0a0+b5021ba
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.15.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.27
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
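Minimal trials and stabilization windows for performance measurement can be sketched as follows: measurement windows are repeated until the last few window averages agree within a tolerance. The function and parameter names (`measure_until_stable`, `min_trials`, `max_trials`, `stabilization_windows`, `stability_percentage`) and their defaults are assumptions for illustration and should not be read as the configuration API of this release.

```python
import statistics
from typing import Callable, List, Optional


def measure_until_stable(
    run_window: Callable[[], float],
    min_trials: int = 3,
    max_trials: int = 10,
    stabilization_windows: int = 3,
    stability_percentage: float = 10.0,
) -> Optional[float]:
    """Return the mean latency once the last windows agree, else None.

    ``run_window`` runs one measurement window and returns its average latency.
    The result is stable when each of the last ``stabilization_windows``
    averages deviates from their mean by at most ``stability_percentage`` percent.
    """
    windows: List[float] = []
    for trial in range(1, max_trials + 1):
        windows.append(run_window())
        # Need at least min_trials windows and enough windows to compare.
        if trial < max(min_trials, stabilization_windows):
            continue
        recent = windows[-stabilization_windows:]
        mean = statistics.fmean(recent)
        if all(abs(w - mean) <= mean * stability_percentage / 100.0 for w in recent):
            return mean
    return None  # never stabilized within max_trials windows
```

Returning `None` instead of the last measurement makes an unstable run explicit, so the caller can decide whether to report it or retry.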
- fix: Conditional imports of supported frameworks in export commands
- Version of external components used during testing:
- PyTorch 2.1.0a0+4136153
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Collect information about TensorRT shapes used during conversion
- fix: Invalid link in documentation
- change: Improved documentation rendering
- Version of external components used during testing:
- PyTorch 2.1.0a0+4136153
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Add model from package to Triton model store with custom configs
- Version of external components used during testing:
- PyTorch 2.1.0a0+4136153
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Zero-copy runners for Torch, ONNX and TensorRT - omit H2D and D2H memory copy between runners execution
- new: `nav.package.profile` API method to profile generated models on the provided dataloader
- change: ProfilerConfig replaced with OptimizationProfile:
- new: OptimizationProfile impacts the conversion for TensorRT
- new: `batch_sizes` and `max_batch_size` limit the max profile in TensorRT conversion
- new: Allow providing a separate dataloader for profiling (only the first sample is used)
- new: allow running `nav.package.optimize` on an empty package (status generation only)
- new: use `torch.inference_mode` for the inference runner when PyTorch 2.x is available
- fix: Missing `model` in config when passing a package generated during `nav.{framework}.optimize` directly to the `nav.package.optimize` command
- Other minor fixes and improvements
- Version of external components used during testing:
- PyTorch 2.1.0a0+4136153
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Load samples as sorted to keep valid order
- fix: Execute conversion when model already exists in path
- Other minor fixes and improvements
- Version of external components used during testing:
- PyTorch 2.1.0a0+fe05266f
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Public `nav.utilities` module with `UnpackedDataloader` wrapper
- new: Added support for strict flag in Torch custom config
- new: Extended TensorRT custom config to support builder optimization level and hardware compatibility flags
- fix: Invalid optimal shape calculation for odd values in max batch size
- Version of external components used during testing:
- PyTorch 2.1.0a0+fe05266f
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Custom implementation for ONNX and TensorRT runners
- new: Use CUDA 12 for JAX in unit tests and functional tests
- new: Step-by-step examples
- new: Updated documentation
- new: TensorRTCUDAGraph runner introduced with support for CUDA graphs
- fix: Optimal shape not set correctly during adaptive conversion
- fix: Find max batch size command for JAX
- fix: Save stdout to logfiles in debug mode
- Version of external components used during testing:
- PyTorch 2.1.0a0+fe05266f
- TensorFlow 2.12.0
- TensorRT 8.6.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.47.1
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: filter outputs using output_metadata in ONNX runners
- Version of external components used during testing:
- PyTorch 2.0.0a0+1767026
- TensorFlow 2.11.0
- TensorRT 8.5.3.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Added Contributor License Agreement (CLA)
- fix: Added missing --extra-index-url to installation instruction for pypi
- fix: Updated wheel readme
- fix: Do not run TorchScript export when only ONNX in target formats and ONNX extended export is disabled
- fix: Log full traceback for ModelNavigatorUserInputError
- Version of external components used during testing:
- PyTorch 2.0.0a0+1767026
- TensorFlow 2.11.0
- TensorRT 8.5.3.1
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.3.26
- tf2onnx v1.14.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: Using a relative workspace caused an error during ONNX to TensorRT conversion
- fix: Added external weights in package for ONNX format
- fix: bugfixes for functional tests
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.3
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: Support for PyTriton deployment
- new: Support for Python models with python.optimize API
- new: PyTorch 2 compile CPU and CUDA runners
- new: Collect conversion max batch size in status
- new: PyTorch runners with `compile` support
- change: Improved handling of CUDA and CPU runners
- change: Reduced the time of finding device max batch size by running it once as a separate pipeline
- change: Stored find max batch size result in a separate field in status
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.3
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: when exporting a single-input model to SavedModel, unwrap the one-element list of inputs
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.3
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: in Keras inference use model.predict(tensor) for single input models
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.3
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: loading configuration for trt_profile from package
- fix: missing reproduction scripts and logs inside package
- fix: invalid model path in reproduction script for ONNX to TRT conversion
- fix: collecting metadata from ONNX model in main thread during ONNX to TRT conversion
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.3
- ONNX Runtime 1.13.1
- Polygraphy: 0.44.2
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- fix: when specified use dynamic axes from custom OnnxConfig
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.2.2
- ONNX Runtime 1.13.1
- Polygraphy: 0.43.1
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- new: `optimize` method that replaces `export` and performs max batch size search and improved profiling during the process
- new: Introduced custom configs in `optimize` for better parametrization of export/conversion commands
- new: Support for adding user runners for model correctness and profiling
- new: Search for max possible batch size per format during conversion and profiling
- new: API for creating Triton model store from Navigator Package and user provided models
- change: Improved status structure for Navigator Package
- deprecated: Optimize for Triton Inference Server support
- deprecated: HuggingFace contrib module
- Bug fixes and other improvements
- Version of external components used during testing:
- PyTorch 1.14.0a0+410ce96
- TensorFlow 2.11.0
- TensorRT 8.5.2.2
- ONNX Runtime 1.13.1
- Polygraphy: 0.43.1
- GraphSurgeon: 0.4.6
- tf2onnx v1.13.0
- Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.11
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.10
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.09
- Model Navigator Export API:
- new: cast int64 input data to int32 in runner for Torch-TensorRT
- new: cast 64-bit data samples to 32-bit values for TensorRT
- new: verbose flag for logging export and conversion commands to console
- new: debug flag to enable debug mode for export and conversion commands
- change: logs from commands are streamed to console during command run
- change: package load omits the log files and autogenerated scripts
- Version of external components used during testing:
- Polygraphy: 0.42.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.20.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
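Casting 64-bit samples to 32-bit values before feeding TensorRT can be sketched with NumPy. This is a hypothetical helper, not the runner's code: it assumes inputs arrive as a dict of NumPy arrays, maps `int64`/`uint64`/`float64` to their 32-bit counterparts, and raises if integer values would not fit (floats simply lose precision).

```python
from typing import Dict

import numpy as np

# Hypothetical mapping from 64-bit dtypes to the 32-bit dtypes fed to TensorRT.
_DOWNCAST = {
    np.dtype(np.int64): np.dtype(np.int32),
    np.dtype(np.uint64): np.dtype(np.uint32),
    np.dtype(np.float64): np.dtype(np.float32),
}


def cast_64bit_to_32bit(sample: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Return a new dict in which 64-bit arrays are replaced by 32-bit copies."""
    casted = {}
    for name, array in sample.items():
        target = _DOWNCAST.get(array.dtype)
        if target is None:
            casted[name] = array  # already 32-bit or narrower, keep as-is
            continue
        if target.kind in "iu" and array.size:
            # Integer downcasts silently wrap around, so check the range first.
            info = np.iinfo(target)
            if array.min() < info.min or array.max() > info.max:
                raise ValueError(f"input {name!r} does not fit into {target.name}")
        casted[name] = array.astype(target)
    return casted
```

The explicit range check is the important part: `astype` would otherwise wrap out-of-range integers without any error.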
- Updated NVIDIA containers defaults to 22.08
- Model Navigator Export API:
- new: TRTExec runner uses `use_cuda_graph=True` by default
- new: log a warning instead of raising an error when the dataloader dumps inputs with `nan` or `inf` values
- new: enabled logging for command input parameters
- fix: invalid use of Polygraphy TRT profile when trt_dynamic_axes is passed to export function
- Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.19.0
- tf2onnx: v1.12.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
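Logging a warning rather than raising when dumped inputs contain `nan` or `inf` can be sketched with NumPy as below. The helper name, the logger name, and the dict-of-arrays input format are assumptions; the check skips integer and boolean arrays because they cannot hold non-finite values.

```python
import logging
from typing import Dict, List

import numpy as np

LOGGER = logging.getLogger("dataloader_check")  # hypothetical logger name


def warn_on_non_finite(sample: Dict[str, np.ndarray]) -> List[str]:
    """Log a warning for every input containing nan or inf and return their names."""
    offending = []
    for name, array in sample.items():
        if not np.issubdtype(array.dtype, np.inexact):
            continue  # integer and boolean arrays cannot hold nan or inf
        if not np.isfinite(array).all():
            LOGGER.warning("Input %r contains nan or inf values", name)
            offending.append(name)
    return offending
```

Returning the offending names keeps the check non-fatal while still letting a caller escalate if it chooses to.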
- Updated NVIDIA containers defaults to 22.07
- Model Navigator OTIS:
- deprecated: `TF32` precision for TensorRT from CLI options (will be removed in future versions)
- fix: TensorFlow module was imported when obtaining model signature during conversion
- Model Navigator Export API:
- new: Support for building framework containers with Model Navigator installed
- new: Example for loading Navigator Package for reproducing the results
- new: Create reproducing script for correctness and performance steps
- new: TrtexecRunner for correctness and performance tests with trtexec tool
- new: Use TF32 support by default for models with FP32 precision
- new: Reset conversion parameters to defaults when using `load` for a package
- new: Testing all options for JAX export `enable_xla` and `jit_compile` parameters
- change: Profiling stability improvements
- change: Renamed the `onnx_runtimes` export function parameter to `runtimes`
- deprecated: `TF32` precision for TensorRT from available options (will be removed in future versions)
- fix: Do not save TF-TRT models to the .nav package
- fix: Do not save TF-TRT models from the .nav package
- fix: Correctly load .nav packages when `_input_names` or `_output_names` are specified
- fix: Adjust TF and TF-TRT model signatures to match `input_names`
- fix: Save ONNX opset for CLI configuration inside package
- fix: Reproduction scripts were missing for failing paths
- Version of external components used during testing:
- Polygraphy: 0.38.0
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Model Navigator Export API:
- new: Improved handling inputs and outputs metadata
- new: Navigator Package version updated to 0.1.3
- new: Backward compatibility with previous versions of Navigator Package
- fix: Dynamic shapes for output shapes were read incorrectly
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.06
- Model Navigator OTIS:
- new: Perf Analyzer profiling data uses base64 format for content
- fix: Signature for TensorRT model when it has `uint64` or `int64` inputs and/or outputs defined
- Model Navigator Export API:
- new: Updated navigator package format to 0.1.1
- new: Added Model Navigator version to status file
- new: Add atol and rtol configuration to CLI config for model
- new: Added experimental support for JAX models
- new: In case of export or conversion failures prepare minimal scripts to reproduce errors
- fix: Conversion parameters are not stored in Navigator Package for CLI execution
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.17.0
- tf2onnx: v1.11.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.05
- Model Navigator OTIS:
- fix: Saving paths inside the Triton package status file
- fix: An empty list of GPUs caused the process to run on CPU only
- fix: Reading content from zipped Navigator Package
- fix: When no GPU is available or the target device is set to CPU, `optimize` avoids running unsupported conversions in CLI
- new: Converter accepts passing the target device kind to select CPU or GPU supported conversions
- new: Added support for OpenVINO accelerator for ONNXRuntime
- new: Added option `--config-search-early-exit-enable` for Model Analyzer early exit support in manual profiling mode
- new: Added option `--model-config-name` to the `select` command. It allows picking a particular model configuration for deployment from the set of all configurations generated by Triton Model Analyzer, even if it is not the best-performing one.
- removed: The `--tensorrt-strict-types` option has been removed due to deprecation of the functionality in upstream libraries.
- Model Navigator Export API:
- new: Added dynamic shapes support and trt dynamic shapes support for TensorFlow2 export
- new: Improved per format logging
- new: PyTorch to Torch-TRT precision selection added
- new: Advanced profiling (measurement windows, configurable batch sizes)
- Version of external components used during testing:
- Polygraphy: 0.36.2
- GraphSurgeon: 0.3.19
- Triton Model Analyzer 1.16.0
- tf2onnx: v1.10.1
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Updated NVIDIA containers defaults to 22.04
- Model Navigator Export API
- Support for exporting models from TensorFlow2 and PyTorch source code to supported target formats
- Support for conversion from ONNX to supported target formats
- Support for exporting HuggingFace models
- Conversion, correctness, and performance tests for exported models
- Definition of package structure for storing all exported models and additional metadata
- Model Navigator OTIS:
- change: `run` command has been deprecated and may be removed in a future release
- new: `optimize` command replaces `run` and produces an output `*.triton.nav` package
- new: `select` selects the best-performing configuration from a `*.triton.nav` package and creates a Triton Inference Server model repository
- new: Added support for using the shared memory option for Perf Analyzer
- change: Removed wkhtmltopdf package dependency
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.14.0
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
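The `select` command picks the best-performing configuration from a `*.triton.nav` package. The selection criterion is not spelled out in these notes; the sketch below assumes "best" means highest throughput within an optional latency budget, and also shows picking a configuration by name, in the spirit of the later `--model-config-name` option. `ProfiledConfig`, its fields, `select_config`, and the numbers in the usage lines are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfiledConfig:
    """Hypothetical summary of one profiled model configuration."""
    name: str
    throughput: float      # inferences per second
    p95_latency_ms: float  # 95th percentile latency in milliseconds


def select_config(
    configs: List[ProfiledConfig],
    max_latency_ms: Optional[float] = None,
    name: Optional[str] = None,
) -> ProfiledConfig:
    """Pick a configuration by explicit name, else the highest-throughput
    one that satisfies the optional latency budget."""
    if name is not None:
        for config in configs:
            if config.name == name:
                return config
        raise KeyError(f"no configuration named {name!r}")
    candidates = [
        c for c in configs
        if max_latency_ms is None or c.p95_latency_ms <= max_latency_ms
    ]
    if not candidates:
        raise ValueError("no configuration satisfies the latency budget")
    return max(candidates, key=lambda c: c.throughput)


# Made-up profiling results, for illustration only.
configs = [
    ProfiledConfig("trt_fp16_bs8", throughput=1200.0, p95_latency_ms=9.5),
    ProfiledConfig("onnx_bs16", throughput=900.0, p95_latency_ms=14.0),
    ProfiledConfig("trt_fp16_bs32", throughput=1500.0, p95_latency_ms=31.0),
]
print(select_config(configs).name)                       # trt_fp16_bs32
print(select_config(configs, max_latency_ms=20.0).name)  # trt_fp16_bs8
print(select_config(configs, name="onnx_bs16").name)     # onnx_bs16
```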
- Updated NVIDIA containers defaults to 22.02
- Removed support for Python 3.7
- Triton Model configuration related:
- Support dynamic batching without setting preferred batch size value
- Profiling related:
- Deprecated `--config-search-max-preferred-batch-size` flag as it is no longer supported in Triton Model Analyzer
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
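Triton's model configuration accepts a `dynamic_batching` block with no fields, in which case the server batches with its own defaults and no preferred batch size. A minimal sketch of emitting that block with only the fields actually set; `render_dynamic_batching` is a hypothetical helper, not Model Navigator code, while `preferred_batch_size` and `max_queue_delay_microseconds` are Triton configuration fields.

```python
from typing import List, Optional


def render_dynamic_batching(
    preferred_batch_sizes: Optional[List[int]] = None,
    max_queue_delay_us: Optional[int] = None,
) -> str:
    """Render a Triton `dynamic_batching` block in protobuf text format.

    Fields are emitted only when set, so calling this with no arguments
    yields an empty block: dynamic batching on, no preferred batch size.
    """
    lines = ["dynamic_batching {"]
    if preferred_batch_sizes:
        sizes = ", ".join(str(size) for size in preferred_batch_sizes)
        lines.append(f"  preferred_batch_size: [ {sizes} ]")
    if max_queue_delay_us is not None:
        lines.append(f"  max_queue_delay_microseconds: {max_queue_delay_us}")
    lines.append("}")
    return "\n".join(lines)


print(render_dynamic_batching())
# dynamic_batching {
# }
print(render_dynamic_batching([4, 8], max_queue_delay_us=100))
# dynamic_batching {
#   preferred_batch_size: [ 4, 8 ]
#   max_queue_delay_microseconds: 100
# }
```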
- Updated NVIDIA containers defaults to 22.01
- Removed support for Python 3.6 due to EOL
- Conversion related:
- Added support for Torch-TensorRT conversion
- Fixes and improvements
- Processes inside containers started by Model Navigator now run without root privileges
- Fix for volume mounts while running Triton Inference Server in a container from another container
- Fix for conversion of models without file extension on input and output paths
- Fix for using the `--model-format` argument when input and output files have no extension
- Version of external components used during testing:
- Polygraphy: 0.35.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape information
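The fixes for model paths without a file extension come down to one rule: an explicit format always wins, and the extension is only a fallback. A minimal sketch, assuming a hypothetical suffix-to-format table; Model Navigator's real format names and detection logic may differ, and `resolve_model_format` is illustrative only.

```python
from pathlib import Path
from typing import Optional

# Hypothetical suffix-to-format table, for illustration only.
FORMAT_BY_SUFFIX = {
    ".onnx": "onnx",
    ".pt": "torchscript",
    ".plan": "trt",
    ".savedmodel": "tf-savedmodel",
}


def resolve_model_format(path: str, model_format: Optional[str] = None) -> str:
    """Return `model_format` if given, otherwise infer it from the file suffix."""
    if model_format is not None:
        return model_format  # explicit value wins, with or without an extension
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer model format of {path!r}; pass it explicitly"
        ) from None


print(resolve_model_format("model.onnx"))                     # onnx
print(resolve_model_format("exported/model", "torchscript"))  # torchscript
```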
- Updated NVIDIA containers defaults to 21.12
- Conversion related:
- [Experimental] TF-TRT - fixed default dataset profile generation
- Configuration Model on Triton related
- Fixed name for onnxruntime backend in Triton model deployment configuration
- Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape information
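In a Triton model configuration, ONNX models are served by the backend named `onnxruntime`, presumably the name the backend-naming fix above concerns. A minimal sketch of emitting the header of a `config.pbtxt`; the format keys and `render_config_header` are hypothetical, while the backend names (`onnxruntime`, `tensorrt`, `pytorch`, `tensorflow`) are Triton's.

```python
# Triton backend names; the dictionary keys are hypothetical format labels.
TRITON_BACKEND = {
    "onnx": "onnxruntime",
    "trt": "tensorrt",
    "torchscript": "pytorch",
    "tf-savedmodel": "tensorflow",
}


def render_config_header(model_name: str, model_format: str, max_batch_size: int) -> str:
    """Render the first lines of a Triton config.pbtxt for the given format."""
    backend = TRITON_BACKEND[model_format]
    return "\n".join([
        f'name: "{model_name}"',
        f'backend: "{backend}"',
        f"max_batch_size: {max_batch_size}",
    ])


print(render_config_header("my_model", "onnx", 16))
# name: "my_model"
# backend: "onnxruntime"
# max_batch_size: 16
```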
- Updated NVIDIA containers defaults to 21.10
- Fixed generating profiling data when `dtypes` are not passed
- Conversion related:
- [Experimental] Added support for TF-TRT conversion
- Configuration Model on Triton related
- Added possibility to select batching mode - default, dynamic and disabled options supported
- Install dependencies from pip packages instead of wheels for Polygraphy and Triton Model Analyzer
- fixes and improvements
- Version of external components used during testing:
- Polygraphy: 0.33.1
- GraphSurgeon: 0.3.14
- Triton Model Analyzer 1.8.2
- tf2onnx: v1.9.3
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- no verification of conversion results for conversions: TF -> ONNX, TF->TF-TRT, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- no custom ops support
- Triton Inference Server stays in the background when the profile process is interrupted by the user
- TF-TRT conversion loses output shape information
- Updated NVIDIA containers defaults to 21.09
- Improved naming of arguments specific for TensorRT conversion and acceleration with backward compatibility
- Use pip package for Triton Model Analyzer installation with minimum version 1.8.0
- Fixed `model_repository` path so that it is not relative to the `<navigator_workspace>` dir
- Handle exit codes correctly from CLI commands
- Support for using device IDs for the `--gpus` argument
- Conversion related
- Added precision modes to support multiple precisions during conversion to TensorRT
- Added `--tensorrt-sparse-weights` flag for sparse weight optimization for TensorRT
- Added `--tensorrt-strict-types` flag forcing TensorRT to choose tactics based on the layer precision
- Added `--tensorrt-explicit-precision` flag enabling explicit precision mode
- Fixed NaN values appearing in relative tolerance during conversion to TensorRT
- Configuration Model on Triton related
- Removed default value for `engine_count_per_device`
- Added possibility to define Triton Custom Backend parameters with the `triton_backend_parameters` command
- Added possibility to define max workspace size for the TensorRT backend accelerator using the `tensorrt_max_workspace_size` argument
- Profiling related
- Added `config_search` prefix to all profiling parameters (BREAKING CHANGE)
- Added `config_search_max_preferred_batch_size` parameter
- Added `config_search_backend_parameters` parameter
- fixes and improvements
- Versions of used external components:
- Polygraphy: 0.32.0
- GraphSurgeon: 0.3.13
- tf2onnx: v1.9.2 (support for ONNX opset 14, tf 1.15 and 2.6)
- Triton Model Analyzer 1.8.2
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
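One fix in this release removed NaN values from the relative tolerance computed during conversion to TensorRT. The notes do not say how that tolerance is computed; a minimal sketch, assuming relative error is `|candidate - reference| / |reference|` per element, shows where such NaNs come from (with NumPy arrays `0 / 0` yields NaN, while in pure Python it raises) and one way to guard against them. The `eps` clamp and `max_relative_error` are assumptions, not the Navigator's actual fix.

```python
import math
from typing import Sequence


def max_relative_error(
    reference: Sequence[float],
    candidate: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Return the largest per-element relative error between two outputs.

    Identical elements (including 0 vs 0) contribute zero error, and the
    denominator is clamped to `eps`, so a zero reference never triggers a
    0 / 0 division.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        diff = abs(cand - ref)
        if math.isnan(diff):
            return math.inf  # a NaN in either output can never pass the check
        if diff == 0.0:
            continue
        worst = max(worst, diff / max(abs(ref), eps))
    return worst


error = max_relative_error([0.0, 2.0, -4.0], [0.0, 2.2, -4.0])
print(math.isnan(error), round(error, 6))  # False 0.1
```

Returning infinity for NaN outputs matters because Python's `max` does not propagate NaN reliably, so a NaN could otherwise slip through unnoticed.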
- Updated NVIDIA containers defaults to 21.08
- Versions of used external components:
- Triton Model Analyzer: 1.7.0
- Triton Inference Server Client: 2.13.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- Fixed `triton-model-config` error when `tensorrt_capture_cuda_graph` flag is not passed
- Dump Conversion Comparator inputs and outputs into JSON files
- Added log information on the tolerance parameter values needed to pass conversion verification
- Use `count_windows` mode as default option for Perf Analyzer
- Added possibility to define custom Docker images
- Bugfixes
- Versions of used external components:
- Triton Model Analyzer: 1.6.0
- Triton Inference Server Client: 2.12.0
- Polygraphy: 0.31.1
- GraphSurgeon: 0.3.11
- tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- possible to define a single profile for TensorRT
- TensorRT backend acceleration not supported for ONNX Runtime in Triton Inference Server ver. 21.07
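In `count_windows` mode, Perf Analyzer ends each measurement window after a fixed number of requests instead of a fixed time, and keeps measuring until recent windows agree. A minimal sketch of that agreement test, assuming the rule described in the Perf Analyzer documentation (the last three windows must have a max/min ratio within the stability percentage, 10% by default, for both throughput and latency); the helper names and the sample numbers are hypothetical.

```python
from typing import Sequence


def is_stable(values: Sequence[float], stability_pct: float = 10.0, windows: int = 3) -> bool:
    """True if the last `windows` values have max/min within `stability_pct` percent."""
    recent = list(values)[-windows:]
    if len(recent) < windows or min(recent) <= 0:
        return False  # not enough windows yet, or unusable measurements
    return max(recent) / min(recent) <= 1.0 + stability_pct / 100.0


def measurements_stable(
    throughputs: Sequence[float],
    latencies: Sequence[float],
    stability_pct: float = 10.0,
) -> bool:
    """Both throughput and latency must be stable across recent windows."""
    return is_stable(throughputs, stability_pct) and is_stable(latencies, stability_pct)


# Made-up per-window measurements, for illustration only.
throughputs = [800.0, 950.0, 1000.0, 1020.0, 990.0]  # inferences/s
latencies = [12.0, 10.5, 10.1, 9.9, 10.2]             # ms
print(measurements_stable(throughputs[:3], latencies[:3]))  # False
print(measurements_stable(throughputs, latencies))          # True
```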
- comprehensive refactor of command-line API in order to provide more gradual pipeline steps execution
- Versions of used external components:
- Triton Model Analyzer: 21.05
- tf2onnx: v1.8.5 (support for ONNX opset 13, tf 1.15 and 2.5)
- Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
- Known issues and limitations
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
- affected NVIDIA PyTorch containers: 20.12, 21.02, 21.03
- workaround: use PyTorch containers newer than 21.03
- possible to define a single profile for TensorRT
- documentation update
- Release of main components:
- Model Converter - converts the model to a set of variants optimized for inference or to be later optimized by Triton Inference Server backend.
- Model Repo Builder - sets up a Triton Inference Server Model Repository, including its configuration.
- Model Analyzer - selects the optimal Triton Inference Server configuration based on the model's compute and memory requirements, the available computation infrastructure, and model application constraints.
- Helm Chart Generator - deploys Triton Inference Server and the model with the optimal configuration to the cloud.
- Versions of used external components:
- Triton Model Analyzer: 21.03+616e8a30
- tf2onnx: v1.8.4 (support for ONNX opset 13, tf 1.15 and 2.4)
- Other component versions depend on the used framework and Triton Inference Server containers versions. Refer to its support matrix for a detailed summary.
- Known issues
- missing support for stateful models (ex. time-series one)
- missing support for models without batching support
- no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
- issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
- affected NVIDIA PyTorch containers: 20.12, 21.03
- workaround: use containers other than those listed above
- Triton Inference Server stays in the background when the profile process is interrupted by the user