Fully-working example with dynamic batching #71
Once PR #72 is merged, it will be possible to use the models with dynamic shape. Note that with a dynamic shape on the model input, you don't need to use dynamic batching.
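For reference, a minimal sketch of what such a dynamic-shape configuration could look like once that support lands. The model name and tensor names here are hypothetical, and the shapes are illustrative; with `max_batch_size: 0`, the variable batch dimension is expressed directly in the input dims instead of being handled by Triton's batcher:

```
name: "my_ov_model"        # hypothetical model name
backend: "openvino"
max_batch_size: 0          # disable Triton's implicit batch dimension
input [
  {
    name: "input"          # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ -1, 3, 224, 224 ]   # -1: dynamic batch dimension on the model input itself
  }
]
output [
  {
    name: "output"         # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ -1, 1000 ]
  }
]
```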
Hi, thanks for your reply. I think I might be a bit confused here, but could you explain why I don't need to use dynamic batching? The way I thought it worked was that with dynamic batching enabled, Triton waits a predefined amount of time to group requests together into a batch, which would mean the batch size could be 3, then 1, then 5, and so on. When using dynamic batching with other backends like ONNX, I've needed to set the input dimension to, for example, [3, 224, 224], and have the model accept [-1, 3, 224, 224]. Does it work differently with OpenVINO?

I've used parallel model execution in combination with dynamic batching with ONNX before and needed to tune the number of threads each model instance could use to avoid overloading the CPU. Is it done differently with OpenVINO?
@mbahri You could use dynamic batching, but it will not be optimally efficient: it will still use batch padding. You can expect better throughput by using parallel execution with a multi-instance configuration, i.e. by setting the NUM_STREAMS parameter. That way you will not observe CPU overloading; NUM_STREAMS handles thread management in parallel execution.
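A sketch of the multi-instance setup being described, assuming the OpenVINO backend accepts NUM_STREAMS as a string-valued config parameter; the instance count and stream value below are illustrative placeholders, not tuned recommendations:

```
instance_group [
  {
    count: 2            # two model instances executing in parallel
    kind: KIND_CPU
  }
]
parameters: {
  key: "NUM_STREAMS"
  value: {
    string_value: "2"   # let OpenVINO manage threads across parallel infer streams
  }
}
```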
Thanks @dtrawins. So to confirm: with parallel model execution and NUM_STREAMS set, I would just use a batch size of 1 for each model instance?
Hi @dtrawins, is dynamic batching supported by the OV backend? It seems that the dynamic batch scheduler always passes infer requests through one by one inside. UPD:
Hi
Thanks a lot for providing this backend. I have tried to use it, but I have had some trouble getting Triton to load and run my OpenVINO models.

I found that the backend correctly attempts to load models only if the files are named `model.bin` and `model.xml`; in other cases the backend throws an exception. However, the main issue for me now is using dynamic batching. It would be very helpful if you could provide a fully working example of how to configure dynamic batching, with values for the different parameters that need to be set.

Related question: the backend doesn't support dynamic axes, and one of the parameters mentioned for dynamic batching is about padding batches. Does this mean the backend will, for now, pad batches to the max batch size?
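In the absence of an official example, here is a hedged sketch of a conventional Triton dynamic-batching config of the kind described earlier in the thread. The model name, tensor names, shapes, and timing values are all placeholders meant to show which knobs exist, not verified settings for this backend:

```
name: "resnet_ov"               # hypothetical model name
backend: "openvino"
max_batch_size: 8               # batches may be padded up to at most this size
input [
  {
    name: "input"               # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]       # per-request shape; Triton prepends the batch dim
  }
]
output [
  {
    name: "output"              # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]        # batch sizes the scheduler tries to form
  max_queue_delay_microseconds: 100     # how long to wait for requests to group
}
```

With this setup the model file itself would need to accept a variable batch dimension ([-1, 3, 224, 224]), matching the ONNX convention mentioned above.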