Fully-working example with dynamic batching #71

Open

mbahri opened this issue Mar 11, 2024 · 5 comments

mbahri commented Mar 11, 2024

Hi

Thanks a lot for providing this backend. I have tried to use it and I have had some trouble getting Triton to load and run my OpenVINO models.

I found that the backend correctly attempts to load models only if the files are named model.bin and model.xml; in other cases it throws an exception. However, the main issue for me now is using dynamic batching.

It would be very helpful if you could provide a fully working example of how to configure dynamic batching, with values for the different parameters that need to be set.

Related question: the backend doesn't support dynamic axes, and one of the parameters mentioned for dynamic batching is about padding batches. Does this mean the backend will pad batches to the max batch size for now?
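
(For reference: a minimal dynamic batching stanza in Triton's config.pbtxt might look like the sketch below. The padding parameter alluded to is assumed to be the backend's ENABLE_BATCH_PADDING; all values shown are illustrative placeholders, not recommendations.)

```
# Hypothetical excerpt from config.pbtxt -- values are illustrative.
max_batch_size: 8

dynamic_batching {
  # How long the scheduler may wait while grouping requests into a batch.
  max_queue_delay_microseconds: 100
  preferred_batch_size: [ 4, 8 ]
}

parameters: {
  # Backend-specific switch for padding incomplete batches
  # (assumption: this is the padding parameter being referred to).
  key: "ENABLE_BATCH_PADDING"
  value: {
    string_value: "YES"
  }
}
```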

@dtrawins (Collaborator)

Once PR #72 is merged it will be possible to use models with dynamic shape. Note that with a dynamic shape on the model input, you don't need to use dynamic batching.
If you want to use an arbitrary batch size or image resolution, you will be able to do so with a model of shape like [-1,-1,-1,3].
If your goal is to improve throughput, you can use multiple instances with parallel execution (check the throughput mode example).
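
(A minimal sketch of what such a fully dynamic config could look like once that lands, assuming placeholder tensor names and an NHWC layout; this is not taken from the repo.)

```
# Hypothetical config.pbtxt for a fully dynamic input.
name: "my_openvino_model"     # placeholder model name
backend: "openvino"
max_batch_size: 0             # 0 disables Triton's batch handling;
                              # request shapes pass through unchanged

input [
  {
    name: "input"             # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1, -1, 3 ]   # dynamic batch, height, width; 3 channels
  }
]
output [
  {
    name: "output"            # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```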

mbahri (Author) commented Mar 12, 2024

Hi, thanks for your reply. I think I might be a bit confused here, but could you explain why I don't need to use dynamic batching?

The way I thought it worked was that with dynamic batching enabled, Triton waits a predefined amount of time to group requests together in a batch, which would mean batch size could be 3, then 1, then 5, etc.

When using dynamic batching with other backends like ONNX, I've needed to set the input dimension to, for example, [3, 224, 224] - and have the model accept [-1, 3, 224, 224].

Does it work differently with OpenVINO?

I've used parallel model execution in combination with dynamic batching with ONNX before and needed to tune the number of threads each model instance could use to avoid overloading the CPU. Is it done differently with OpenVINO?
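
(For context, the ONNX-style convention described above, expressed as a config.pbtxt sketch; the tensor name and values are placeholders.)

```
# Standard Triton convention: with a non-zero max_batch_size, dims omit
# the batch dimension, and the underlying framework model itself must
# accept [-1, 3, 224, 224].
max_batch_size: 8

input [
  {
    name: "input"            # assumed tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]    # batch dimension handled by Triton
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 100   # illustrative wait time
}
```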

@dtrawins (Collaborator)

@mbahri You could use dynamic batching, but it will not be of top efficiency: it will still use batch padding. You can expect better throughput by using parallel execution with a multi-instance configuration and setting the NUM_STREAMS parameter. That way you will not observe CPU overloading; NUM_STREAMS handles thread management in parallel execution.
To sum up, with the PR I mentioned you will be able to deploy models with shape [-1, 3, 224, 224] or [-1, 3, -1, -1]. If you want to improve throughput for parallel execution from many clients, I recommend using several instances together with NUM_STREAMS (the two values should match).
Batch padding will probably be dropped later, but until then a similar throughput gain is expected from parallel execution.
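
(A sketch of the recommended multi-instance setup; the counts are illustrative, and the point is that the instance count and NUM_STREAMS match.)

```
# Hypothetical excerpt: several CPU instances with a matching number of
# OpenVINO execution streams.
instance_group [
  {
    count: 4          # illustrative number of model instances
    kind: KIND_CPU
  }
]

parameters: {
  key: "NUM_STREAMS"
  value: {
    string_value: "4"   # should match the instance count above
  }
}
```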

mbahri (Author) commented Mar 13, 2024

Thanks @dtrawins. So to confirm: with parallel model execution and NUM_STREAMS set, would I just use a batch size of 1 for each model instance?

Its-astonishing commented Dec 11, 2024

Hi @dtrawins, is dynamic batching supported by the OV backend? It seems that the dynamic batch scheduler always puts infer requests through one by one. Inside the ModelInstanceState::ProcessRequests function, the request_count argument always equals 1, and the input is simply padded up to max_batch_size no matter how many concurrent requests were sent.

UPD:
I enabled verbose logging and observed that the dynamic batching feature actually works.
