Fully-working example with dynamic batching #71

Open

mbahri opened this issue Mar 11, 2024 · 5 comments

mbahri commented Mar 11, 2024

Hi

Thanks a lot for providing this backend. I have tried to use it and I have had some trouble getting Triton to load and run my OpenVINO models.

I found that the backend correctly attempts to load models only if the files are named model.bin and model.xml; in other cases it throws an exception. However, the main issue for me now is using dynamic batching.

It would be very helpful if you could provide a fully working example of how to configure dynamic batching, with values for the different parameters that need to be set.

Related question: the backend doesn't support dynamic axes, and one of the parameters mentioned for dynamic batching is about padding batches. Does this mean the backend will pad batches to the max batch size for now?
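
(For reference: a minimal dynamic batching stanza in Triton's config.pbtxt might look like the sketch below. The padding parameter alluded to is assumed to be the backend's ENABLE_BATCH_PADDING; all values shown are illustrative placeholders, not recommendations.)

```
# Hypothetical excerpt from config.pbtxt -- values are illustrative.
max_batch_size: 8

dynamic_batching {
  # How long the scheduler may wait while grouping requests into a batch.
  max_queue_delay_microseconds: 100
  preferred_batch_size: [ 4, 8 ]
}

parameters: {
  # Backend-specific switch for padding incomplete batches
  # (assumption: this is the padding parameter being referred to).
  key: "ENABLE_BATCH_PADDING"
  value: {
    string_value: "YES"
  }
}
```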

@dtrawins (Collaborator)

Once PR #72 is merged it will be possible to use models with dynamic shape. Note that with a dynamic shape on the model input, you don't need to use dynamic batching.
If you want to use an arbitrary batch size or image resolution, you will be able to do so with a model of shape like [-1,-1,-1,3].
If your goal is to improve throughput, you can use multiple instances with parallel execution (check the throughput mode example).
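
(A minimal sketch of what such a fully dynamic config could look like once that lands, assuming placeholder tensor names and an NHWC layout; this is not taken from the repo.)

```
# Hypothetical config.pbtxt for a fully dynamic input.
name: "my_openvino_model"     # placeholder model name
backend: "openvino"
max_batch_size: 0             # 0 disables Triton's batch handling;
                              # request shapes pass through unchanged

input [
  {
    name: "input"             # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1, -1, 3 ]   # dynamic batch, height, width; 3 channels
  }
]
output [
  {
    name: "output"            # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```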

mbahri (Author) commented Mar 12, 2024

Hi, thanks for your reply. I think I might be a bit confused here, but could you explain why I don't need to use dynamic batching?

The way I thought it worked was that with dynamic batching enabled, Triton waits a predefined amount of time to group requests together in a batch, which would mean batch size could be 3, then 1, then 5, etc.

When using dynamic batching with other backends like ONNX, I've needed to set the input dimension to, for example, [3, 224, 224] - and have the model accept [-1, 3, 224, 224].

Does it work differently with OpenVINO?

I've used parallel model execution in combination with dynamic batching with ONNX before and needed to tune the number of threads each model instance could use to avoid overloading the CPU. Is it done differently with OpenVINO?
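
(For context, the ONNX-style convention described above, expressed as a config.pbtxt sketch; the tensor name and values are placeholders.)

```
# Standard Triton convention: with a non-zero max_batch_size, dims omit
# the batch dimension, and the underlying framework model itself must
# accept [-1, 3, 224, 224].
max_batch_size: 8

input [
  {
    name: "input"            # assumed tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]    # batch dimension handled by Triton
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 100   # illustrative wait time
}
```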

@dtrawins (Collaborator)

@mbahri You could use dynamic batching, but it will not be of top efficiency: it will still use batch padding. You can expect better throughput by using parallel execution with a multi-instance configuration and setting the NUM_STREAMS parameter. That way you will not observe CPU overloading; NUM_STREAMS handles thread management in parallel execution.
To sum up, with the PR I mentioned you will be able to deploy models with shape [-1, 3, 224, 224] or [-1, 3, -1, -1]. If you want to improve throughput for parallel execution from many clients, I recommend using several instances together with NUM_STREAMS (the two values should match).
Batch padding will probably be dropped later, but until then a similar throughput gain is expected from parallel execution.
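
(A sketch of the recommended multi-instance setup; the counts are illustrative, and the point is that the instance count and NUM_STREAMS match.)

```
# Hypothetical excerpt: several CPU instances with a matching number of
# OpenVINO execution streams.
instance_group [
  {
    count: 4          # illustrative number of model instances
    kind: KIND_CPU
  }
]

parameters: {
  key: "NUM_STREAMS"
  value: {
    string_value: "4"   # should match the instance count above
  }
}
```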

mbahri (Author) commented Mar 13, 2024

Thanks @dtrawins. So to confirm: with parallel model execution and NUM_STREAMS set, would I just use a batch size of 1 for each model instance?

Its-astonishing commented Dec 11, 2024

Hi @dtrawins, is dynamic batching supported by the OV backend? It seems that the dynamic batch scheduler always puts infer requests through one by one. Inside the ModelInstanceState::ProcessRequests function, the request_count argument always equals 1, and the input is simply padded up to max_batch_size no matter how many concurrent requests were sent.

UPD:
I enabled verbose logging and observed that the dynamic batching feature actually works.
