[Bug] asr/whisper service slower on Gaudi2 than on Xeon #1018

Open
2 of 6 tasks
daniel-de-leon-user293 opened this issue Dec 9, 2024 · 4 comments
Labels
bug Something isn't working exploration

Comments

@daniel-de-leon-user293
Contributor

daniel-de-leon-user293 commented Dec 9, 2024

Priority

P2-High

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2

NOTE: the original Gaudi Dockerfile uses Gaudi version 1.18.0. We currently get a segfault when running that version on our machine.

Description

Following the steps for Gaudi2 from the README, the asr/whisper service runs significantly slower than on Xeon. I wrote a simple benchmark script that times a requests.post() call to the service for every example in the LibriSpeech test-clean dataset. The plots below show how Gaudi2 performed against a Xeon machine.
[Figure: whisper_Gaudi_vs_Xeon]
[Figure: whisper_distribution]

As file size increased, Gaudi performed slower than Xeon, as seen in the plot below:
[Figure: whisper_size_vs_inference]

Reproduce steps

  1. Follow steps in README to run microservice on Gaudi
  2. Run the example curl in the README (2.2.3). This results in an inference time of ~3.6 seconds, whereas on Xeon it takes only ~0.5 seconds
  3. To reproduce results from the plots provided, download the LibriSpeech test-clean dataset
  4. Run whisper_benchmark.py (updating variables as needed)
  5. Compute times are saved to <EXP_NAME>_0.json
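
As a rough aid for comparing runs, the saved JSON from step 5 could be summarized along these lines. This is only a sketch: it assumes the file is the flat {filename: seconds} dictionary that whisper_benchmark.py writes, and the helper names `summarize` and `load_and_summarize` are made up for illustration.

```python
import json
import statistics


def summarize(compute_times):
    """Return basic latency statistics (in seconds) for a {filename: seconds} dict."""
    values = sorted(compute_times.values())
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": values[-1],
    }


def load_and_summarize(path):
    """Load one benchmark result file and summarize it."""
    with open(path) as f:
        return summarize(json.load(f))


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real <EXP_NAME>_0.json file.
    sample = {"a.flac": 1.0, "b.flac": 2.0, "c.flac": 6.0}
    print(summarize(sample))
```

Running `load_and_summarize` on the Gaudi and Xeon result files side by side would give the same kind of comparison as the plots, in numeric form.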

Raw log

Showing only the first 100 lines of output, since the entire benchmark log runs to thousands of lines.

--------------------------------------------------------------------------------------------------------
docker logs whisper-server | head -n 100
[WARNING|utils.py:212] 2024-12-06 19:47:49,213 >> optimum-habana v1.14.1 has been validated for SynapseAI v1.18.0 but habana-frameworks v1.16.2.2 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-12-06 19:47:49,423 >> optimum-habana v1.14.1 has been validated for SynapseAI v1.18.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/home/user/.local/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 36
CPU RAM       : 253594120 KB
------------------------------------------------------------------------------
You have passed language=english, but also have set `forced_decoder_ids` to [[1, None], [2, 50359]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of language=english.
[WARNING|logging.py:328] 2024-12-06 19:48:03,952 >> Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7066 (Press CTRL+C to quit)
Downloading model: openai/whisper-small
[ASR] fetch warmup audio...
[ASR] warmup...
[ASR] fetch warmup audio...
[ASR] warmup...
Whisper generation begin.
generated text in 3.6244754791259766 seconds, and the result is: you
INFO:     172.17.0.1:46926 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 3.7294511795043945 seconds, and the result is: you
INFO:     172.17.0.1:43518 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 6.51738977432251 seconds, and the result is: who is pat gelsinger
INFO:     172.17.0.1:50762 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 44.98237657546997 seconds, and the result is: had eva kressweller not been good looking had jack been still at college had sir kennington oval remained in england had mister bunnet and the barkeeper not succeeded in stopping my carriage on the hill should i have succeeded in arranging for the final departure of my old friend
INFO:     172.17.0.1:56802 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 26.094318628311157 seconds, and the result is: as i spoke i made him a gracious bow and i think i showed him by my mode of address that i did not bear any grudge as to my individual self
INFO:     172.17.0.1:33994 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.836403608322144 seconds, and the result is: i shall be happy to take charge of them said sir fernando
INFO:     172.17.0.1:49466 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 15.310702323913574 seconds, and the result is: on arriving at home at my own residence i found that our salon was filled with a brilliant company
INFO:     172.17.0.1:44142 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 17.145907163619995 seconds, and the result is: it is founded on the acknowledged weakness of those who survive that period of life at which men cease to work
INFO:     172.17.0.1:45452 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 15.513034105300903 seconds, and the result is: there came upon me a sudden shock when i heard these words which exceeded anything which i had yet felt
INFO:     172.17.0.1:46690 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 8.814271211624146 seconds, and the result is: but i mean to have my innings before long
INFO:     172.17.0.1:44714 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 16.24426245689392 seconds, and the result is: then said sir ferdinando there is nothing for it but that we must take you with him
INFO:     172.17.0.1:32892 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.68668270111084 seconds, and the result is: missus neverbend you must indeed be proud of your son
INFO:     172.17.0.1:41096 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.744771242141724 seconds, and the result is: sir kennington oval is a very fine player said my wife
INFO:     172.17.0.1:40634 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 7.108684062957764 seconds, and the result is: it is a duty said i
INFO:     172.17.0.1:60062 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 20.52310800552368 seconds, and the result is: but your power is so superior to any that i can advance as to make us here feel that there is no disgrace in yielding to it
INFO:     172.17.0.1:35084 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.11953616142273 seconds, and the result is: what would become of your gun were i to kidnap you
INFO:     172.17.0.1:60538 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 5.833545923233032 seconds, and the result is: quite satisfied said eva
INFO:     172.17.0.1:59676 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 15.010395765304565 seconds, and the result is: you hear what sir ferdinand 0 brown has said replied captain battleaxe
INFO:     172.17.0.1:46890 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 17.61302423477173 seconds, and the result is: i was to be taken away and carried to inland or elsewhere or drowned upon the voyage it mattered not which
INFO:     172.17.0.1:47822 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 31.87185502052307 seconds, and the result is: when this captain should have taken himself and his vessel back to england i would retire to a small farm which i possessed at the furthest side of the island and there in seclusion would i end my days
INFO:     172.17.0.1:39818 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 5.101831436157227 seconds, and the result is: today i shouted
INFO:     172.17.0.1:37550 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 21.138211250305176 seconds, and the result is: therefore i feel myself quite able as president of this republic to receive you with a courtesy due to the servants of a friendly ally
INFO:     172.17.0.1:44492 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 35.88362431526184 seconds, and the result is: you will carry out with you 100 men of the north northwest birmingham regiment which will probably suffice for your own security as it is thought that if mister neverbend be withdrawn the people will revert easily to their old habits of obedience
INFO:     172.17.0.1:46832 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 6.956258296966553 seconds, and the result is: your power is sufficient i said
INFO:     172.17.0.1:50114 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 11.856374502182007 seconds, and the result is: we sat with the officer some little time after dinner and then went ashore
INFO:     172.17.0.1:50118 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 34.29918432235718 seconds, and the result is: i can afford to smile because i am absolutely powerless before you but i do not the less feel that in a matter of which the progress of the world is concerned i or rather we have been put down by brute force
INFO:     172.17.0.1:49782 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 25.549216747283936 seconds, and the result is: if you will give us your promise to meet captain adelaix here at this time tomorrow we will stretch a and delay the departure of the john bright for 24 hours
INFO:     172.17.0.1:49640 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.137414932250977 seconds, and the result is: no doubt in process of time the ladies will follow
INFO:     172.17.0.1:42192 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 23.77631688117981 seconds, and the result is: i and my wife and son and the 2 cresswellers and 3 or 4 others agreed to dine on board the ship on the next
INFO:     172.17.0.1:42250 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 10.660054683685303 seconds, and the result is: one of us always remains on board while the other is on shore
INFO:     172.17.0.1:39982 - "POST /v1/asr HTTP/1.1" 200 OK
Whisper generation begin.
generated text in 15.79270076751709 seconds, and the result is: you have received us with all that courtesy and hospitality for which your character in england stands so high
INFO:     172.17.0.1:48954 - "POST /v1/asr HTTP/1.1" 200 OK
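
The server-side timings in a log like the one above can be pulled out and related to transcript length with a small script. The following is a sketch under the assumption that every generation line follows the exact "generated text in ... seconds, and the result is: ..." format shown here; the function name `parse_generation_lines` is hypothetical.

```python
import re

# Matches the server's per-request timing line as it appears in the log above.
LINE_RE = re.compile(r"generated text in ([0-9.]+) seconds, and the result is: (.*)")


def parse_generation_lines(log_text):
    """Return a list of (seconds, word_count), one entry per generation line."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            seconds = float(match.group(1))
            words = len(match.group(2).split())
            rows.append((seconds, words))
    return rows


# Two lines copied from the log above, plus unrelated lines that are skipped.
sample = """Whisper generation begin.
generated text in 3.6244754791259766 seconds, and the result is: you
INFO:     172.17.0.1:46926 - "POST /v1/asr HTTP/1.1" 200 OK
generated text in 8.814271211624146 seconds, and the result is: but i mean to have my innings before long
"""

for seconds, words in parse_generation_lines(sample):
    print(f"{words:3d} words  {seconds:6.2f} s  {seconds / words:.2f} s/word")
```

Feeding the full `docker logs whisper-server` output through this would show whether latency tracks the number of generated words, which is what the excerpt suggests.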

Attachments

Below is the whisper_benchmark.py script used to gather results for the plots.

import base64
import json
import os
import requests
from time import time

DATASET_PATH = "/path/to/LibriSpeech/test-clean"
EXP_NAME = "whisper_gaudi_librispeech_benchmark"
ENDPOINT = "http://localhost:9099/v1/audio/transcriptions"

class WhisperBM:
    def __init__(self, endpoint, dataset_name, dataset_path):
        self.endpoint = endpoint
        self.dataset_name = dataset_name
        self.dataset_path = dataset_path
        self.compute_times = dict()
        if self.dataset_name == 'librispeech':
            self.run_bm = self.ls_bm
        elif self.dataset_name == 'single':
            self.run_bm = self.single_bm
    
    def single_bm(self):
        #WIP
        pass
        
    def ls_bm(self):
        for i in os.listdir(self.dataset_path):
            for j in os.listdir(os.path.join(self.dataset_path, i)):
                for k in os.listdir(os.path.join(self.dataset_path, i, j)):

                    if k.endswith('.flac'):
                        fname = os.path.join(self.dataset_path, i, j, k)
                        with open(fname, "rb") as f:
                            test_audio_base64_str = base64.b64encode(f.read()).decode("utf-8")

                        inputs = {"byte_str": test_audio_base64_str}

                        start = time()
                        response = requests.post(url=self.endpoint, data=json.dumps(inputs), proxies={"http": None})
                        end = time() 
                        self.compute_times[k] = end - start
                        print(k, self.compute_times[k])

    def save_results(self, exp_name):
        # Save the dictionary to the first unused <exp_name>_<n>.json
        n = 0
        while os.path.exists(f"{exp_name}_{n}.json"):
            n += 1

        with open(f"{exp_name}_{n}.json", "w") as f:
            json.dump(self.compute_times, f)


def main():
    benchmark = WhisperBM(ENDPOINT, 
                          'librispeech',
                          DATASET_PATH)
    benchmark.run_bm()
    benchmark.save_results(EXP_NAME)


if __name__ == "__main__":
    main()
@daniel-de-leon-user293 daniel-de-leon-user293 added the bug Something isn't working label Dec 9, 2024
@daniel-de-leon-user293
Contributor Author

@MSCetin37 @ashahba

@Spycsh
Member

Spycsh commented Dec 10, 2024

On my Gaudi, the perf for whisper should be similar to or faster than on Xeon. I suspect there is some setting/env gap on your machine that breaks the HPU static shape generation and makes it look super slow.
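
For readers unfamiliar with the static-shape point, the general idea can be sketched as follows: if variable-length inputs are padded up to one of a few fixed sizes, a graph compiler only ever sees a small set of shapes instead of a new one per request. This is an illustration of that idea only, not the code used by the whisper service; the bucket sizes and the helper names `bucket_length` and `pad_to_bucket` are hypothetical.

```python
import bisect

# Hypothetical bucket sizes; a real deployment would choose these for its model.
BUCKETS = [8, 16, 32, 64, 128]


def bucket_length(n, buckets=BUCKETS):
    """Return the smallest bucket size that can hold a sequence of length n."""
    i = bisect.bisect_left(buckets, n)
    if i == len(buckets):
        raise ValueError(f"length {n} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(tokens, pad_id=0, buckets=BUCKETS):
    """Pad a token list with pad_id up to its bucket size."""
    target = bucket_length(len(tokens), buckets)
    return tokens + [pad_id] * (target - len(tokens))


# Six different input lengths collapse to a smaller set of padded shapes.
lengths = [3, 9, 17, 30, 31, 60]
print(sorted({bucket_length(n) for n in lengths}))
```

If something in the environment defeats this kind of shape reuse, each new length may trigger a fresh compilation, which would be consistent with latency growing with transcript length.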

@daniel-de-leon-user293
Contributor Author

Thank you for your insight @Spycsh. We've tried multiple combinations of driver versions and Gaudi container versions (1.16.2 and 1.18.0) to no avail. We also tried on Tiber cloud but got an error during the server's docker build:

u9bc7efa9450f6ca15a0524991f18fcc@idc-training-gaudi-compute-02:~/GenAIComps$ docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile.intel_hpu .
[+] Building 48.4s (7/10)                                                                                                                                                                                                                             docker:default
 => [internal] load build definition from Dockerfile.intel_hpu                                                                                                                                                                                                  0.0s
 => => transferring dockerfile: 1.03kB                                                                                                                                                                                                                          0.0s
 => [internal] load metadata for vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest                                                                                                                                      0.4s
 => [internal] load .dockerignore                                                                                                                                                                                                                               0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                 0.0s
 => [1/6] FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest@sha256:71950884006c2b8da31621d6f97aadb20348efa3352a8d50f1fa4d326fa7d740                                                                               43.7s
 => => resolve vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest@sha256:71950884006c2b8da31621d6f97aadb20348efa3352a8d50f1fa4d326fa7d740                                                                                0.0s
 => => sha256:82b7e1a2b2010aa868eeafc6ee97e8f3e6aa692564e57f5492d1892e0383afee 5.61MB / 5.61MB                                                                                                                                                                  0.3s
 => => sha256:71950884006c2b8da31621d6f97aadb20348efa3352a8d50f1fa4d326fa7d740 3.47kB / 3.47kB                                                                                                                                                                  0.0s
 => => sha256:7ca0479d85cec765378679ea743380eb5ab02753097f613bbe71fe2796ad1058 16.97kB / 16.97kB                                                                                                                                                                0.0s
 => => sha256:6d23d53c04d9e8d586c18ab6624921203a8afe29c25d18e7348b8595e35e672e 326.20MB / 326.20MB                                                                                                                                                              4.9s
 => => sha256:6414378b647780fee8fd903ddb9541d134a1947ce092d08bdeb23a54cb3684ac 29.54MB / 29.54MB                                                                                                                                                                0.5s
 => => sha256:28460986453d687cf230650f09697d7dc511330acbab5ec4500b3a3e0f022d6b 20.78MB / 20.78MB                                                                                                                                                                0.8s
 => => sha256:72b39188fcb7c58b350d86fcab412de955e89526e2ce240b3748bc4cf07a80a9 660B / 660B                                                                                                                                                                      0.4s
 => => extracting sha256:6414378b647780fee8fd903ddb9541d134a1947ce092d08bdeb23a54cb3684ac                                                                                                                                                                       0.9s
 => => sha256:e840f17d225844b9eec42c25b37ec80fedf8e2019a8c5be6f9cc04ee500d5c17 550.24MB / 550.24MB                                                                                                                                                              7.6s
 => => sha256:2f274a438b61ed48a9f49d25f50837cfc609a17b51ce5b86ab04c1c967647056 2.05MB / 2.05MB                                                                                                                                                                  0.9s
 => => sha256:0b2562e8f57189bff4f2a434c8cb95cdb1d1f07fed8163c2a5582f424465f752 9.28kB / 9.28kB                                                                                                                                                                  0.9s
 => => sha256:0e2422e97668f7d079d2f2544f07d04b1ea71641a54c6a5958a00575719dfb71 40.32MB / 40.32MB                                                                                                                                                                1.5s
 => => sha256:1254eee371d72d1785041ecb41f3e8bfe072271126a670bbc30d520959253f3f 630.47MB / 630.47MB                                                                                                                                                             10.5s
 => => sha256:35202f26fc6c37c499c4ca613f912be80cc50c673526a46a040791a81ddbba93 3.65kB / 3.65kB                                                                                                                                                                  1.5s
 => => sha256:45c9a46f3026639835ae54bcca9527c9e10b7d736e91560e01d9d56af8324c31 2.41kB / 2.41kB                                                                                                                                                                  4.9s
 => => extracting sha256:6d23d53c04d9e8d586c18ab6624921203a8afe29c25d18e7348b8595e35e672e                                                                                                                                                                       6.7s
 => => sha256:e922f07b1855839a7aa66b42186cc6948977bbd667f6065658d9203e2a1cf6c9 624B / 624B                                                                                                                                                                      4.9s
 => => sha256:0bb2ea8424789e12a49955a0a968bac5a050019ac9d83029e81aabb623efba4b 357.78MB / 357.78MB                                                                                                                                                             13.0s
 => => sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 32B / 32B                                                                                                                                                                        7.6s
 => => extracting sha256:82b7e1a2b2010aa868eeafc6ee97e8f3e6aa692564e57f5492d1892e0383afee                                                                                                                                                                       0.4s
 => => extracting sha256:72b39188fcb7c58b350d86fcab412de955e89526e2ce240b3748bc4cf07a80a9                                                                                                                                                                       0.0s
 => => extracting sha256:28460986453d687cf230650f09697d7dc511330acbab5ec4500b3a3e0f022d6b                                                                                                                                                                       1.2s
 => => extracting sha256:e840f17d225844b9eec42c25b37ec80fedf8e2019a8c5be6f9cc04ee500d5c17                                                                                                                                                                       9.5s
 => => extracting sha256:2f274a438b61ed48a9f49d25f50837cfc609a17b51ce5b86ab04c1c967647056                                                                                                                                                                       0.0s
 => => extracting sha256:0b2562e8f57189bff4f2a434c8cb95cdb1d1f07fed8163c2a5582f424465f752                                                                                                                                                                       0.0s
 => => extracting sha256:0e2422e97668f7d079d2f2544f07d04b1ea71641a54c6a5958a00575719dfb71                                                                                                                                                                       0.6s
 => => extracting sha256:35202f26fc6c37c499c4ca613f912be80cc50c673526a46a040791a81ddbba93                                                                                                                                                                       0.0s
 => => extracting sha256:1254eee371d72d1785041ecb41f3e8bfe072271126a670bbc30d520959253f3f                                                                                                                                                                       8.8s
 => => extracting sha256:45c9a46f3026639835ae54bcca9527c9e10b7d736e91560e01d9d56af8324c31                                                                                                                                                                       0.0s
 => => extracting sha256:e922f07b1855839a7aa66b42186cc6948977bbd667f6065658d9203e2a1cf6c9                                                                                                                                                                       0.0s
 => => extracting sha256:0bb2ea8424789e12a49955a0a968bac5a050019ac9d83029e81aabb623efba4b                                                                                                                                                                      10.5s
 => => extracting sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1                                                                                                                                                                       0.0s
 => [internal] load build context                                                                                                                                                                                                                               0.2s
 => => transferring context: 22.14MB                                                                                                                                                                                                                            0.2s
 => [2/6] RUN useradd -m -s /bin/bash user &&     mkdir -p /home/user &&     chown -R user /home/user/                                                                                                                                                          4.0s
 => ERROR [3/6] RUN apt-get update     && apt-get install -y ffmpeg                                                                                                                                                                                             0.2s
------
 > [3/6] RUN apt-get update     && apt-get install -y ffmpeg:
------
Dockerfile.intel_hpu:17
--------------------
  16 |     # Install system dependencies
  17 | >>> RUN apt-get update \
  18 | >>>     && apt-get install -y ffmpeg
  19 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update     && apt-get install -y ffmpeg" did not complete successfully: exit code: 137

Could you share more about your settings and/or env so we can try to reproduce on our end?

Thank you!

@Spycsh
Member

Spycsh commented Jan 10, 2025

@daniel-de-leon-user293 I've never used Tiber cloud before. For the build error, I think you can check whether your Docker build is hitting resource limits, or whether the proxy is set correctly. Can you also build the opea/whisper:latest Xeon image on the cloud?
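
On the resource-limit theory: the build log above ends with exit code 137. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9. That is often, though not always, a sign of an out-of-memory kill. A minimal sketch for decoding such codes, assuming a POSIX system (the helper name `describe_exit_code` is made up):

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit status, decoding 128+N as signal N."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```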

My HPU settings are:

+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.19.0-fw-57.1.0.0          |
| Driver Version:                                     1.19.0-2427ed8          |
|-------------------------------+----------------------+----------------------+

And my CPU is an Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

If you still see the performance issue, I think the best way is to enter the container, run python whisper_model.py directly, and check the latency. HPU needs a careful warmup: normally the first few rounds are slow and the following rounds are quick. If all runs are slow, please paste the log here, including the hardware settings, and let's see whether anything is missing.
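
One way to check the warmup behaviour described here is to time several consecutive runs and look at the first few separately from the rest. The sketch below is generic: `time_runs` is a hypothetical helper, and the lambda is a stand-in workload that would be replaced with an actual call into the whisper model or service.

```python
import statistics
import time


def time_runs(fn, rounds=10, warmup=3):
    """Call fn() `rounds` times and split the timings into warmup and steady state."""
    if rounds <= warmup:
        raise ValueError("rounds must be greater than warmup")
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    steady = durations[warmup:]
    return {
        "warmup": durations[:warmup],
        "steady": steady,
        "steady_median": statistics.median(steady),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    result = time_runs(lambda: sum(range(100_000)), rounds=8, warmup=3)
    print("warmup runs:", [f"{d:.4f}" for d in result["warmup"]])
    print("steady median:", f"{result['steady_median']:.4f}")
```

If the steady-state median stays as high as the warmup runs, that would point to something other than missing warmup.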

3 participants