[Bug] asr/whisper service slower on Gaudi2 than on Xeon #1018
Comments
On my Gaudi, whisper performance should be similar to or faster than Xeon. I suspect there is some settings/env gap that breaks HPU static shape generation on your machine and makes it look this slow.
Thank you for your insight @Spycsh. We've tried multiple combinations of driver versions and Gaudi container versions (1.16.2 and 1.18.0) to no avail. We also tried on Tiber cloud but got an error during the server's docker build:

Could you share more about your settings and/or environment so we can try to reproduce on our end? Thank you!
@daniel-de-leon-user293 I've never used Tiber cloud before. For the build error, I think you can check whether there are resource limitations in your Docker build, or whether the proxy is correct. Can you also build … My HPU setting is …, and my CPU is ….

If you still see a performance issue, I think the best way is to enter the container and run … directly.
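The exact command Spycsh refers to was not captured in the thread. For reference, a minimal latency sanity check one could run inside the Gaudi container might look like the sketch below; the model name `openai/whisper-small` and the dummy input are assumptions, not necessarily the service's actual configuration:

```python
# Hedged sketch: time Whisper generation on HPU inside the Gaudi container.
# Assumptions: transformers and the Habana PyTorch bridge are installed in the
# image; openai/whisper-small stands in for whatever model the service loads.
import time

import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = model.to("hpu").eval()

# Whisper consumes fixed-size log-mel features (80 mel bins x 3000 frames for
# whisper-small), so random features are enough to exercise generation.
features = torch.randn(1, 80, 3000, dtype=torch.float32).to("hpu")

for i in range(3):
    start = time.perf_counter()
    with torch.no_grad():
        ids = model.generate(input_features=features, max_new_tokens=32)
    ids = ids.cpu()  # force a device sync so the wall-clock time is honest
    print(f"run {i}: {time.perf_counter() - start:.2f}s")

# The first HPU run pays graph-compilation cost; if every run stays slow,
# shapes are likely changing per call (the static-shape gap suspected above).
```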
Priority
P2-High
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
Deploy method
Running nodes
Single Node
What's the version?
vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2
NOTE: the original Gaudi Dockerfile uses Gaudi version 1.18.0. We currently get a segfault using that version on our machine.
Description
Following the steps for Gaudi2 from the README, the asr/whisper service runs significantly slower than on Xeon. I wrote a simple benchmark script that clocks the duration of a `requests.post()` call to the service for every example in the LibriSpeech test-clean dataset. The plots below compare Gaudi2 against a Xeon machine: as file size increases, Gaudi2 falls further behind Xeon.
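For context, here is a minimal sketch of the kind of measurement described; the endpoint URL, port, and JSON payload are assumptions modeled on the README's curl example, not the actual `whisper_benchmark.py`:

```python
# Hedged sketch: time requests.post() against the whisper service for every
# file in LibriSpeech test-clean. Endpoint and payload format are assumptions.
import base64
import glob
import json
import time

import requests

SERVICE_URL = "http://localhost:7066/v1/asr"  # hypothetical endpoint
DATA_DIR = "LibriSpeech/test-clean"           # path to the extracted dataset
EXP_NAME = "gaudi2"                           # used for the output file name

results = []
for path in sorted(glob.glob(f"{DATA_DIR}/**/*.flac", recursive=True)):
    with open(path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")
    start = time.perf_counter()
    resp = requests.post(SERVICE_URL, json={"audio": audio_b64}, timeout=300)
    elapsed = time.perf_counter() - start
    results.append({"file": path, "seconds": elapsed, "status": resp.status_code})

with open(f"{EXP_NAME}_0.json", "w") as f:
    json.dump(results, f, indent=2)
```

Timing the full `requests.post()` round trip includes network and (de)serialization overhead, but that overhead is the same against both backends, so the gap between machines reflects inference time.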
Reproduce steps
1. Run the `curl` in the README (2.2.3 results in an inference time of ~3.6 seconds, whereas on Xeon it only takes ~0.5 seconds).
2. Run `whisper_benchmark.py` (updating variables as needed).
3. Results are written to `<EXP_NAME>_0.json`.
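The per-file durations in `<EXP_NAME>_0.json` can then be summarized as below; this assumes the record schema from the sketch above, and the real script's output format may differ:

```python
# Hedged sketch: summarize benchmark output (schema assumed from the sketch above).
import json
import statistics

with open("gaudi2_0.json") as f:  # i.e. <EXP_NAME>_0.json
    results = json.load(f)

durations = [r["seconds"] for r in results]
print(f"n={len(durations)} "
      f"mean={statistics.mean(durations):.2f}s "
      f"p95={sorted(durations)[int(0.95 * len(durations))]:.2f}s")
```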
Raw log
Attachments
Below is the `whisper_benchmark.py` script used to gather the results for the plots.