Description
When using a Python backend with multiple model instances and running inference with many identical requests, the results are non-deterministic and often far from the expected output.
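For context, "multiple model instances" here means an `instance_group` with a count greater than 1 in the model's `config.pbtxt`, along the lines of the sketch below (the exact count and kind come from the repository's config, so treat these values as illustrative):

```
instance_group [
  {
    count: 4
    kind: KIND_CPU
  }
]
```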
Triton Information
24.09
Are you using the Triton container or did you build it yourself?
Triton container (with additional Python libraries)
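A container extended with additional Python libraries typically looks like the following sketch. The base image tag matches the reported Triton version; the installed package is a placeholder, since the actual Dockerfile and library list live in the linked repository:

```
FROM nvcr.io/nvidia/tritonserver:24.09-py3
# Placeholder: install whatever the Python model actually imports
RUN pip install fairseq
```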
Expected behavior
I expect the outputs from the Python model to be consistently the same for a request with the same input values.
The locust script in the example repository I created prints the output each time it differs from the expected output.
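The determinism check the locust script performs can be sketched as follows. This is a minimal, self-contained illustration using canned responses rather than live inference; the real script compares actual Triton responses against a known-good output:

```python
# Sketch of a determinism check: send the same input repeatedly and
# report every response that differs from the expected output.
def report_mismatches(expected, responses):
    """Return (index, response) pairs for responses that differ from expected."""
    return [(i, r) for i, r in enumerate(responses) if r != expected]

# Canned data standing in for repeated inference results on identical input:
expected = [0.1, 0.9]
responses = [[0.1, 0.9], [0.3, 0.7], [0.1, 0.9]]
print(report_mismatches(expected, responses))  # → [(1, [0.3, 0.7])]
```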
Additional Information
I do believe this is an issue with Triton and not with my models, since the error doesn't reproduce with an instance count of 1.
I tried to avoid using multiple instances by instead using decoupled mode with a ThreadPoolExecutor, which led to the same problem, even when moving every object initialization inside the thread worker to avoid non-thread-safe behavior.
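The workaround described above can be sketched roughly as follows: requests run on a `ThreadPoolExecutor`, and every stateful object is built inside the worker thread (here via `threading.local`) so that no non-thread-safe object is shared. The names are illustrative, not the actual model code from the repository:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()

def _get_model():
    # Each worker thread lazily builds its own model object,
    # so no instance is ever shared across threads.
    if not hasattr(_tls, "model"):
        _tls.model = object()  # stand-in for a real (non-thread-safe) model
    return _tls.model

def handle(request):
    model = _get_model()
    return (threading.get_ident(), id(model))

executor = ThreadPoolExecutor(max_workers=4)
results = list(executor.map(handle, range(32)))

# Every call on the same thread reuses that thread's single model object.
per_thread = {}
for tid, mid in results:
    per_thread.setdefault(tid, set()).add(mid)
assert all(len(ids) == 1 for ids in per_thread.values())
```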
When trying to debug with print statements in the compiled models and the Python model, I noticed that the encoder output sometimes has unexpected values after being transferred to the Python model, but the problem seems to reproduce even when this is not the case.
The issue seems less reproducible when using a dynamic batcher with a queue delay, which leads me to believe it might be related to a race condition in some shared memory between the BLS instances.
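The queue-delay mitigation mentioned above corresponds to a `dynamic_batching` block in the model's `config.pbtxt`, roughly as sketched below (the delay value is illustrative, not taken from the repository):

```
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```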
To Reproduce
Clone the following repository and follow the steps in the
README.md
file:https://github.com/NadavShmayo/fairseq-triton-example