
FastAPI batch inference server demo. Fixes OpenNMT/CTranslate2/issues… #2489

Closed
wants to merge 1 commit

Conversation

dongxiaolong

Based on the discussion in OpenNMT/CTranslate2#1140, I've submitted a simple batch translation demo using FastAPI. The code draws inspiration from both @hobodrifterdavid and ChatGPT. In performance testing I found it to run efficiently. We plan to deploy it in our initial products and hope to open-source it so the community can use and improve it together.
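For reference, here is a minimal sketch of the request-batching pattern the demo is built around, not the exact code in this PR: requests go onto an asyncio queue, and a background worker collects them for a short window so concurrent requests are translated in a single `translate_batch` call. The model path, batch window, and the pre-tokenized input format are placeholder assumptions (a real server would run SentencePiece/BPE tokenization first).

```python
import asyncio

import ctranslate2
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_DIR = "ende_ctranslate2"  # hypothetical path to a converted model
MAX_BATCH_SIZE = 32             # assumption: tune for your hardware
BATCH_WINDOW_SECONDS = 0.01     # how long to wait for more requests

app = FastAPI()
translator = ctranslate2.Translator(MODEL_DIR, device="auto")
queue: asyncio.Queue = asyncio.Queue()


class TranslationRequest(BaseModel):
    tokens: list[str]  # one source sentence, already tokenized


@app.on_event("startup")
async def start_worker():
    asyncio.create_task(batch_worker())


async def batch_worker():
    while True:
        # Block on the first request, then keep collecting for a short
        # window so concurrent requests get translated in a single batch.
        items = [await queue.get()]
        try:
            while len(items) < MAX_BATCH_SIZE:
                items.append(
                    await asyncio.wait_for(queue.get(), BATCH_WINDOW_SECONDS)
                )
        except asyncio.TimeoutError:
            pass
        batch = [tokens for tokens, _ in items]
        # translate_batch releases the GIL, so run it in a worker thread
        # to keep the event loop responsive.
        results = await asyncio.to_thread(translator.translate_batch, batch)
        for (_, future), result in zip(items, results):
            future.set_result(result.hypotheses[0])


@app.post("/translate")
async def translate(req: TranslationRequest):
    future = asyncio.get_running_loop().create_future()
    await queue.put((req.tokens, future))
    return {"tokens": await future}
```

The key design point is that each HTTP handler only enqueues its sentence and awaits a future; all actual inference happens in the worker, so batch size grows automatically with load.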

@vince62s (Member)

Hello @dongxiaolong, thanks for this submission!
As is, I think a better place would be the CTranslate2 repo, under a new sub-folder of examples.
However, I had in mind to revamp the OpenNMT-py server to make it clearer, more modular, and FastAPI-compatible.

In this scenario it would require more work. Please have a look at the following files:
onmt/translate/translation_server.py
onmt/bin/server.py

The server.py file is the frontend using Flask and waitress (which could be replaced by FastAPI / uvicorn), and the other file is the logic that runs inference with either OpenNMT-py or CTranslate2.
The code is quite old but should still work; it needs to be revamped a bit, especially around the new inference_engine functionality.
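For illustration, a rough sketch of what that frontend swap could look like, assuming the existing TranslationServer start()/run() interface is kept. The exact return shape of run() differs between OpenNMT-py versions, so treat this as an outline rather than a drop-in replacement:

```python
import uvicorn
from fastapi import FastAPI

from onmt.translate.translation_server import TranslationServer

# Same default config path that onmt/bin/server.py uses.
CONFIG_FILE = "./available_models/conf.json"

app = FastAPI()
server = TranslationServer()
server.start(CONFIG_FILE)


@app.post("/translate")
def translate(inputs: list[dict]):
    # Inputs keep the existing wire format, e.g. [{"id": 100, "src": "Hello"}]
    out = server.run(inputs)
    # run() returns translations and scores first; later versions append
    # extra values (n_best, times, alignments), so unpack positionally.
    translations, scores = out[0], out[1]
    return [
        {"src": item.get("src"), "tgt": tgt, "pred_score": score}
        for item, tgt, score in zip(inputs, translations, scores)
    ]


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=5000)
```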

@dongxiaolong (Author) commented Oct 13, 2023

I apologize for the delayed response. I remember you mentioned in a previous issue that you wanted to enhance the OpenNMT-py server, which is why I made this submission here. Both CTranslate2 and OpenNMT-py are outstanding projects that support models for a variety of tasks. I also hope they gain more features along the lines of FastChat and vLLM, such as continuous batching, so that users can deploy them more conveniently and bring them into real-world products quickly. I'm eagerly looking forward to the updated server and to collaborating with the community to refine it. Thank you once again for your contributions and prompt response.

@rakesh-krishna

Thanks for this code. The performance improved 3x because of the request batching.
