Added whisperX support #125

Open
wants to merge 8 commits into
base: main

DennisTheD

I added support for the whisperX engine.
The engine can be activated by setting the ASR_ENGINE to "whisperx". In order to use the diarization pipeline, a Huggingface access token needs to be supplied, using the "HF_TOKEN" variable. You also need to accept some user agreements (see https://github.com/m-bain/whisperX for further details). If you do not need diarization, the token is not required.
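
As a rough illustration of the configuration described above, the sketch below shows how a service might read the two variables. The helper name `resolve_asr_config` and the fallback engine value are assumptions for illustration, not the code from this PR.

```python
import os
from typing import Mapping, NamedTuple, Optional


class AsrConfig(NamedTuple):
    engine: str
    hf_token: Optional[str]
    diarization_available: bool


def resolve_asr_config(env: Mapping[str, str] = os.environ) -> AsrConfig:
    """Read ASR_ENGINE and HF_TOKEN from the environment (hypothetical helper)."""
    # Assumption: fall back to "openai_whisper" when ASR_ENGINE is unset.
    engine = env.get("ASR_ENGINE", "openai_whisper")
    # Treat an empty HF_TOKEN the same as a missing one.
    hf_token = env.get("HF_TOKEN") or None
    # Diarization needs both the whisperx engine and a token.
    diarization_available = engine == "whisperx" and hf_token is not None
    return AsrConfig(engine, hf_token, diarization_available)
```

With only `ASR_ENGINE=whisperx` set, this reports the engine as selected but diarization as unavailable, which matches the statement that the token is optional for plain transcription.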

@ayancey
Collaborator

ayancey commented Sep 26, 2023

Love it! Will test this when I can.

@ayancey
Collaborator

ayancey commented Oct 4, 2023

I tested it and was able to get it working. Great work! Please take a look at the changes in 1.2 and fix merge conflicts and I will approve it. 👍

@DennisTheD
Author

I updated the code to resolve the merge conflicts. Since the documentation was moved out of the README, I still need to update the documentation accordingly, so this PR is not yet ready to be merged.

@dahifi

dahifi commented Oct 8, 2023

I'm trying to test on my end. I cloned your repo, but the build failed on my MacBook. Will try on another machine shortly.

42.84 Building wheels for collected packages: antlr4-python3-runtime, docopt, julius, psutil, ruamel.yaml.clib
42.84   Building wheel for antlr4-python3-runtime (pyproject.toml): started
43.00   Building wheel for antlr4-python3-runtime (pyproject.toml): finished with status 'done'
43.00   Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=7f5454ecd9008d2b061876f05291a060ecfa370fbff31c2d359a4584ab11d6e4
43.00   Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
43.00   Building wheel for docopt (pyproject.toml): started
43.11   Building wheel for docopt (pyproject.toml): finished with status 'done'
43.11   Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13705 sha256=a6d443258a1b8ab52eb345321238fb4d53214297b9f8733e31cd348eea265945
43.11   Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
43.11   Building wheel for julius (pyproject.toml): started
43.22   Building wheel for julius (pyproject.toml): finished with status 'done'
43.22   Created wheel for julius: filename=julius-0.2.7-py3-none-any.whl size=21868 sha256=61232d4bf4b2d6a642c6f91c03ebf9248d58cd699b221b49e3f3faf03ddee1ce
43.22   Stored in directory: /root/.cache/pip/wheels/b9/b2/05/f883527ffcb7f2ead5438a2c23439aa0c881eaa9a4c80256f4
43.22   Building wheel for psutil (pyproject.toml): started
43.34   Building wheel for psutil (pyproject.toml): finished with status 'error'
43.35   error: subprocess-exited-with-error
43.35   
43.35   × Building wheel for psutil (pyproject.toml) did not run successfully.
43.35   │ exit code: 1
43.35   ╰─> [43 lines of output]
43.35       running bdist_wheel
43.35       running build
43.35       running build_py
43.35       creating build
43.35       creating build/lib.linux-aarch64-cpython-310
43.35       creating build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_pslinux.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_compat.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_pswindows.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_common.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_psposix.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_pssunos.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_psaix.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/__init__.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_psosx.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       copying psutil/_psbsd.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35       creating build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_memleaks.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/runner.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_misc.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_testutils.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_connections.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_posix.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_bsd.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/__main__.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_aix.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_sunos.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_process.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_linux.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_windows.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/__init__.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_contracts.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_osx.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_unicode.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       copying psutil/tests/test_system.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35       running build_ext
43.35       building 'psutil._psutil_linux' extension
43.35       creating build/temp.linux-aarch64-cpython-310
43.35       creating build/temp.linux-aarch64-cpython-310/psutil
43.35       gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_SIZEOF_PID_T=4 -DPSUTIL_VERSION=595 -DPy_LIMITED_API=0x03060000 -DPSUTIL_ETHTOOL_MISSING_TYPES=1 -DPSUTIL_LINUX=1 -I/app/.venv/include -I/usr/local/include/python3.10 -c psutil/_psutil_common.c -o build/temp.linux-aarch64-cpython-310/psutil/_psutil_common.o
43.35       psutil could not be installed from sources because gcc is not installed. Try running:
43.35         sudo apt-get install gcc python3-dev
43.35       error: command 'gcc' failed: No such file or directory
43.35       [end of output]
43.35   
43.35   note: This error originates from a subprocess, and is likely not a problem with pip.
43.35   ERROR: Failed building wheel for psutil
43.35   Building wheel for ruamel.yaml.clib (pyproject.toml): started
43.44   Building wheel for ruamel.yaml.clib (pyproject.toml): finished with status 'error'
43.45   error: subprocess-exited-with-error
43.45   
43.45   × Building wheel for ruamel.yaml.clib (pyproject.toml) did not run successfully.
43.45   │ exit code: 1
43.45   ╰─> [16 lines of output]
43.45       running bdist_wheel
43.45       running build
43.45       running build_py
43.45       creating build
43.45       creating build/lib.linux-aarch64-cpython-310
43.45       creating build/lib.linux-aarch64-cpython-310/ruamel
43.45       creating build/lib.linux-aarch64-cpython-310/ruamel/yaml
43.45       creating build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45       copying ./setup.py -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45       copying ./__init__.py -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45       copying ./LICENSE -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45       running build_ext
43.45       building '_ruamel_yaml' extension
43.45       creating build/temp.linux-aarch64-cpython-310
43.45       gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/app/.venv/include -I/usr/local/include/python3.10 -c _ruamel_yaml.c -o build/temp.linux-aarch64-cpython-310/_ruamel_yaml.o
43.45       error: command 'gcc' failed: No such file or directory
43.45       [end of output]
43.45   
43.45   note: This error originates from a subprocess, and is likely not a problem with pip.
43.45   ERROR: Failed building wheel for ruamel.yaml.clib
43.45 Successfully built antlr4-python3-runtime docopt julius
43.45 Failed to build psutil ruamel.yaml.clib
43.45 ERROR: Could not build wheels for psutil, ruamel.yaml.clib, which is required to install pyproject.toml-based projects
------
Dockerfile:28
--------------------
  26 |     RUN poetry install
  27 |     
  28 | >>> RUN $POETRY_VENV/bin/pip install pandas transformers nltk pyannote.audio
  29 |     RUN git clone --depth 1 https://github.com/m-bain/whisperX.git \
  30 |         && cd whisperX \
--------------------
ERROR: failed to solve: process "/bin/sh -c $POETRY_VENV/bin/pip install pandas transformers nltk pyannote.audio" did not complete successfully: exit code: 1

@ayancey
Collaborator

ayancey commented Oct 8, 2023

error: command 'gcc' failed: No such file or directory

You need gcc from xcode or homebrew. Also, poetry is a huge pain in the ass.


@dahifi dahifi left a comment


Not sure why this failed on my MacBook and you two aren't seeing it...

@@ -7,6 +7,7 @@ RUN export DEBIAN_FRONTEND=noninteractive \
&& apt-get -qq update \
&& apt-get -qq install --no-install-recommends \
ffmpeg \
git \

add gcc and python3-dev packages here

    gcc \
    python3-dev \

Dockerfile.gpu Outdated
@@ -11,6 +11,7 @@ RUN export DEBIAN_FRONTEND=noninteractive \
python${PYTHON_VERSION}-venv \
python3-pip \
ffmpeg \
git \

And here as well...

    gcc \
    python3-dev \

@dahifi

dahifi commented Oct 8, 2023

error: command 'gcc' failed: No such file or directory

You need gcc from xcode or homebrew. Also, poetry is a huge pain in the ass.

This was within the docker container. I haven't tried running it natively. I added notes and changes.

Thanks for this.

@ayancey
Collaborator

ayancey commented Oct 8, 2023


Oh, apologies. I'll look into this for you and try testing again. I don't know how it built for me without gcc.

@DennisTheD
Author


I can replicate your issue when building the Docker image on Apple's M1. This is probably caused by a missing precompiled Python wheel, which forces the ARM architecture to compile from source at build time and therefore requires a compiler. While your suggested fix solves the build issue for me, I still run into problems when trying to transcribe an MP3, which crashes the Docker container:

[2023-10-09 10:44:49 +0000] [31] [INFO] Started server process [31]
[2023-10-09 10:44:49 +0000] [31] [INFO] Waiting for application startup.
[2023-10-09 10:44:49 +0000] [31] [INFO] Application startup complete.
[2023-10-09 10:45:29 +0000] [1] [WARNING] Worker with pid 31 was terminated due to signal 11
[2023-10-09 10:45:29 +0000] [55] [INFO] Booting worker with pid: 55
/app/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/app/.venv/lib/python3.10/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
[2023-10-09 10:45:32 +0000] [55] [INFO] Started server process [55]
[2023-10-09 10:45:32 +0000] [55] [INFO] Waiting for application startup.
[2023-10-09 10:45:32 +0000] [55] [INFO] Application startup complete

This issue persists, even when setting the ASR_ENGINE to openai_whisper, but not when using onerahmet/openai-whisper-asr-webservice:latest as base image.
@dahifi can you replicate this issue on your side, or does the image work when using your suggested fix?
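
The build failure discussed above comes down to pip falling back to a source build when no prebuilt wheel matches the platform, which in turn needs a C compiler. Below is a minimal preflight sketch that reports the situation before a build is attempted. `compiler_preflight` is a hypothetical helper, not part of this PR or of pip.

```python
import platform
import shutil
from typing import Optional


def compiler_preflight(machine: str, gcc_path: Optional[str]) -> str:
    """Describe whether source builds of C extensions can succeed."""
    if gcc_path:
        return f"{machine}: gcc found at {gcc_path}, source builds can compile"
    # Without gcc, any package lacking a prebuilt wheel for this platform
    # (psutil and ruamel.yaml.clib in the build log) fails to install.
    return f"{machine}: gcc not found, packages without a prebuilt wheel will fail"


if __name__ == "__main__":
    # platform.machine() reports e.g. "aarch64" inside an ARM Linux container.
    print(compiler_preflight(platform.machine(), shutil.which("gcc")))
```

Running this inside the image during the build would show early whether the `gcc` and `python3-dev` packages suggested in the review are needed on a given architecture.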

@dahifi

dahifi commented Oct 9, 2023

I have successfully run previous versions of the ASR engine in Docker containers on both the M1 and WSL with CUDA.

Last night, on my WSL box, I ran the DennisTheD:main image and was able to use the Swagger interface to transcribe a test file with the whisperX engine. Diarization tests using txt output rendered the transcript, but without diarization notations. It ran on the CPU rather than CUDA. Attempts at diarization with other file formats caused an exception in the SRT/VTT export; I don't recall which one.

What is it you need me to validate? M1 native or Docker?

@AustinSaintAubin

Tested with Docker with GPU. Standard transcription works without diarization (diarize=false).
However, diarization (diarize=true, min=1, max=3) fails with Response body: Internal Server Error. The logs show NameError: name 'diarize_model' is not defined.


Testing Prep

# Working Dir
WORKING_DIRECTORY="/mnt/user/docker/whisper-asr-webservice"
mkdir -p "${WORKING_DIRECTORY}"
cd "${WORKING_DIRECTORY}"

# Make Folders & Files
mkdir -p  ./cache/{pip,poetry,whisper,faster-whisper}
ls -alt ${WORKING_DIRECTORY}/cache

# Clone Repository 
git clone https://github.com/DennisTheD/whisper-asr-webservice.git whisper-asr-webservice_DennisTheD

# https://github.com/ahmetoner/whisper-asr-webservice/pull/125
# NOTE: The engine can be activated by setting the ASR_ENGINE to "whisperx". In order to use the diarization pipeline, a Huggingface access token needs to be supplied, using the "HF_TOKEN" variable. You also need to accept some user agreements (see https://github.com/m-bain/whisperX for further details). If you do not need diarization, the token is not required.
cd whisper-asr-webservice_DennisTheD/
# git clean -fd
# git reset --hard
git pull
cd ..

Docker Compose File

version: "3.4"

services:

  whisper-asr-webservice-x-gpu:
    build:
      context: ./whisper-asr-webservice_DennisTheD
      dockerfile: Dockerfile.gpu
    # image: onerahmet/openai-whisper-asr-webservice:latest  #v1.0.6 #onerahmet/openai-whisper-asr-webservice:v1.0.6-gpu #v1.1.0-gpu   #latest-gpu
    container_name: whisper-asr-webservice_x_gpu
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - ASR_ENGINE=whisperx
      - ASR_MODEL=large # large-v2 # medium.en
      - HOST_OS="Unraid"
      - HOST_HOSTNAME="UnRAID-02"
      - HOST_CONTAINERNAME="whisper-asr-webservice_x_cpu"
    labels:
      - "net.unraid.docker.managed=dockerman"
      - "net.unraid.docker.description=Whisper ASR Webservice is a general-purpose speech recognition webservice."
      - "net.unraid.docker.webui=http://[IP]:[PORT:9008]/"
      - "net.unraid.docker.icon=https://res.cloudinary.com/apideck/image/upload/v1667440836/marketplaces/ckhg56iu1mkpc0b66vj7fsj3o/listings/14957082_wyd29r.png"
    ports:
      - 9008:9000
    volumes:
      # - ./app:/app/app
      - cache-pip:/root/.cache/pip
      - cache-poetry:/root/.cache/poetry
      - cache-whisper:/root/.cache/whisper # "/mnt/user/docker/whisper-asr-webservice/cache:/root/.cache/whisper"
      - cache-faster-whisper:/root/.cache/faster_whisper

volumes:
  # cache-pip:
  # cache-poetry:
  # cache-whisper:
  # cache-faster-whisper:
  cache-pip:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: ./cache/pip
  cache-poetry:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: ./cache/poetry
  cache-whisper:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: ./cache/whisper
  cache-faster-whisper:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: ./cache/faster-whisper

Docker Build & Run

docker-compose pull
DOCKER_BUILDKIT=1 docker-compose build --no-cache
docker-compose down --volumes
docker-compose up --detach --remove-orphans --force-recreate
docker-compose logs --follow

Error Message:

[2023-10-09 19:08:11 +0000] [28] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/app/.venv/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/app/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/app/webservice.py", line 89, in asr
    result = transcribe(
  File "/app/app/mbain_whisperx/core.py", line 62, in transcribe
    diarize_segments = diarize_model(audio, min_speakers, max_speakers)
NameError: name 'diarize_model' is not defined

@ayancey
Collaborator

ayancey commented Oct 9, 2023

@AustinSaintAubin Did you provide the HF_TOKEN? That's required for diarization.

@ayancey
Collaborator

ayancey commented Oct 9, 2023

testing:

  • I was able to build the image on an M1 Mac once I made @dahifi's changes, but I couldn't get it to run, maybe because of a CPU or RAM limitation. Running on M1 with GPU accel doesn't seem like something we can do at this time, see this discussion.
  • I tested on my Windows PC with Docker Desktop. GPU accel and WhisperX working nicely. I tried providing the HF token and tested diarization, but my WSL and whole computer crashed for some reason. Won't be able to try that again until after working hours.

additional thoughts:

  • Would be nice if we could get rid of the diarize param entirely when the HF token isn't provided. Right now it fails with a 500 if you don't provide the token. (as seen above)
  • ARM docker images would be awesome, but shouldn't block us from merging this PR
  • Should include gcc and python3-dev as suggested by @dahifi so ARM users can at least use it with CPU.
  • Do we have a standardized format for JSON output depending on which backend is used?
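
One way to avoid the 500 described in the first of these bullets is to validate the request before the diarization model is touched. The sketch below is an assumption about how such a guard could look, not the PR's code. `DiarizationUnavailable` and `check_diarize_request` are hypothetical names, and a web handler would map the exception to a 4xx response.

```python
from typing import Optional


class DiarizationUnavailable(Exception):
    """Raised when diarization is requested but cannot be served."""


def check_diarize_request(diarize: bool, hf_token: Optional[str]) -> bool:
    """Return True only when diarization is requested and a token exists."""
    if not diarize:
        return False
    if not hf_token:
        # Fail with an explicit message instead of a NameError deep in the
        # transcription code.
        raise DiarizationUnavailable(
            "diarize=true requires the HF_TOKEN environment variable"
        )
    return True
```

The design choice here is to reject the request with a clear message rather than silently ignoring `diarize=true`, so a caller can tell a misconfigured server apart from a transcript that simply has one speaker.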

@dahifi

dahifi commented Oct 10, 2023

Running on M1 with GPU accel doesn't seem like something we can do at this time

I haven't seen anything in the whisper community that can run M1 on anything other than CPU.

  • Should include gcc and python3-dev as suggested by @dahifi so ARM users can at least use it with CPU.

Again, the default engine runs fine with CUDA using the current Docker image on my Win10 machine, although now I'm starting to question whether I pulled that in WSL or not. I've also been able to run https://github.com/MahmoudAshraf97/whisper-diarization in WSL with GPU support, but I remember having some issues getting it going because of dependencies.

So I guess what I'm asking is whether this is a whisperx issue or something with my setup.

@ayancey
Collaborator

ayancey commented Oct 10, 2023


I am able to get WhisperX on CUDA working with WSL. Can you post which GPU drivers you have? It could be a driver and CUDA incompatibility. I am running 537.13 on a GeForce RTX 2080 Ti.
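
For anyone comparing setups, the driver version can be read from `nvidia-smi`. This is a small sketch that assumes `nvidia-smi` is on the PATH of the machine or container. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_csv(output: str) -> List[Tuple[str, str]]:
    """Parse 'name, driver_version' CSV lines into (name, driver) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, driver = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


def query_gpus() -> List[Tuple[str, str]]:
    """Ask nvidia-smi for each GPU's name and driver version."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Posting the output of `query_gpus()` from both the host and the container makes it easier to spot a driver mismatch between the two.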

@Deathproof76

(thanks for notifying me @ayancey)

I built the docker gpu image. And I had some problems related to the HF_TOKEN, where it likely wouldn't get recognized from the docker-compose.yml. Or maybe there was a delay with the accepted user conditions. The container exited with:

whisperx-asr  | [2023-10-10 15:41:16 +0000] [28] [ERROR] Exception in worker process
whisperx-asr  | Traceback (most recent call last):
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
whisperx-asr  |     worker.init_process()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
whisperx-asr  |     super(UvicornWorker, self).init_process()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
whisperx-asr  |     self.load_wsgi()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
whisperx-asr  |     self.wsgi = self.app.wsgi()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
whisperx-asr  |     self.callable = self.load()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
whisperx-asr  |     return self.load_wsgiapp()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
whisperx-asr  |     return util.import_app(self.app_uri)
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
whisperx-asr  |     mod = importlib.import_module(module)
whisperx-asr  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
whisperx-asr  |     return _bootstrap._gcd_import(name[level:], package, level)
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
whisperx-asr  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
whisperx-asr  |   File "/app/app/webservice.py", line 18, in <module>
whisperx-asr  |     from .mbain_whisperx.core import transcribe, language_detection
whisperx-asr  |   File "/app/app/mbain_whisperx/core.py", line 18, in <module>
whisperx-asr  |     diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
whisperx-asr  |   File "/app/whisperX/whisperx/diarize.py", line 19, in __init__
whisperx-asr  |     self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
whisperx-asr  | AttributeError: 'NoneType' object has no attribute 'to'
whisperx-asr  | [2023-10-10 15:41:16 +0000] [28] [INFO] Worker exiting (pid: 28)
whisperx-asr  | 
whisperx-asr  | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisperx-asr  | It might be because the pipeline is private or gated so make
whisperx-asr  | sure to authenticate. Visit https://hf.co/settings/tokens to
whisperx-asr  | create your access token and retry with:
whisperx-asr  | 
whisperx-asr  |    >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisperx-asr  |    ...                          use_auth_token=YOUR_AUTH_TOKEN)
whisperx-asr  | 
whisperx-asr  | If this still does not work, it might be because the pipeline is gated:
whisperx-asr  | visit https://hf.co/pyannote/speaker-diarization-3.0 to accept the user conditions.
whisperx-asr  | [2023-10-10 15:41:17 +0000] [27] [INFO] Shutting down: Master
whisperx-asr  | [2023-10-10 15:41:17 +0000] [27] [INFO] Reason: Worker failed to boot.
   

the env of my docker-compose.yml:

environment:
      - ASR_MODEL=large-v2
      - HF_TOKEN="hf_jbseggsegssomethingJsgqBgeeeeeV"
      - ASR_ENGINE=whisperx

So I pasted it into ./app/mbain_whisperx/core.py and that got it to work. Will need to retest with the env again.
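
One possible cause, offered as an assumption and not confirmed anywhere in this thread: with the list form of `environment:` in a Compose file, quotes written around a value can end up as part of the value itself, so the service would see a token that starts and ends with a literal `"`. A defensive sketch for reading the token, where `clean_token` is a hypothetical helper:

```python
from typing import Optional


def clean_token(raw: Optional[str]) -> Optional[str]:
    """Strip whitespace and one pair of matching surrounding quotes."""
    if raw is None:
        return None
    token = raw.strip()
    # Remove a single pair of identical quote characters wrapping the value.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        token = token[1:-1]
    # An empty result is treated as "no token provided".
    return token or None
```

It could be used as `hf_token = clean_token(os.environ.get("HF_TOKEN"))` after importing `os`. Writing the variable without quotes in the Compose file would sidestep the question entirely.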

Is batched inferencing being used so far? The large model used almost 10GB of VRAM on my 3060 and it wasn't more performant/faster than the normal faster-whisper implementation. Will test more and update when I have the time 👍

@ayancey
Collaborator

ayancey commented Oct 10, 2023

(thanks for notifying me @ayancey)

I built the docker gpu image. And I had some problems related to the HF_TOKEN, where it likely wouldn't get recognized from the docker-compose.yml. Or maybe there was a delay with the accepted user conditions. The container exited with:

whisperx-asr  | [2023-10-10 15:41:16 +0000] [28] [ERROR] Exception in worker process
whisperx-asr  | Traceback (most recent call last):
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
whisperx-asr  |     worker.init_process()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
whisperx-asr  |     super(UvicornWorker, self).init_process()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
whisperx-asr  |     self.load_wsgi()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
whisperx-asr  |     self.wsgi = self.app.wsgi()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
whisperx-asr  |     self.callable = self.load()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
whisperx-asr  |     return self.load_wsgiapp()
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
whisperx-asr  |     return util.import_app(self.app_uri)
whisperx-asr  |   File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
whisperx-asr  |     mod = importlib.import_module(module)
whisperx-asr  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
whisperx-asr  |     return _bootstrap._gcd_import(name[level:], package, level)
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
whisperx-asr  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
whisperx-asr  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
whisperx-asr  |   File "/app/app/webservice.py", line 18, in <module>
whisperx-asr  |     from .mbain_whisperx.core import transcribe, language_detection
whisperx-asr  |   File "/app/app/mbain_whisperx/core.py", line 18, in <module>
whisperx-asr  |     diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
whisperx-asr  |   File "/app/whisperX/whisperx/diarize.py", line 19, in __init__
whisperx-asr  |     self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
whisperx-asr  | AttributeError: 'NoneType' object has no attribute 'to'
whisperx-asr  | [2023-10-10 15:41:16 +0000] [28] [INFO] Worker exiting (pid: 28)
whisperx-asr  | 
whisperx-asr  | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisperx-asr  | It might be because the pipeline is private or gated so make
whisperx-asr  | sure to authenticate. Visit https://hf.co/settings/tokens to
whisperx-asr  | create your access token and retry with:
whisperx-asr  | 
whisperx-asr  |    >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisperx-asr  |    ...                          use_auth_token=YOUR_AUTH_TOKEN)
whisperx-asr  | 
whisperx-asr  | If this still does not work, it might be because the pipeline is gated:
whisperx-asr  | visit https://hf.co/pyannote/speaker-diarization-3.0 to accept the user conditions.
whisperx-asr  | [2023-10-10 15:41:17 +0000] [27] [INFO] Shutting down: Master
whisperx-asr  | [2023-10-10 15:41:17 +0000] [27] [INFO] Reason: Worker failed to boot.
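
The traceback shows the diarization pipeline being built at import time in core.py, so a failed download takes the whole worker down even when diarization is not requested. A minimal sketch of deferring that load until first use follows; the helper name `get_diarize_model` and the injected `factory` callable are hypothetical and not part of this PR:

```python
from typing import Any, Callable, Optional

# Cached pipeline instance; stays None until diarization is first requested.
_diarize_model: Optional[Any] = None


def get_diarize_model(factory: Callable[[], Any]) -> Any:
    """Build the diarization pipeline on first use and cache it."""
    global _diarize_model
    if _diarize_model is None:
        try:
            _diarize_model = factory()
        except AttributeError as exc:
            # Same failure as in the traceback: the pretrained pipeline came
            # back as None, so calling a method on it raised AttributeError.
            raise RuntimeError(
                "Diarization pipeline could not be loaded; check HF_TOKEN and "
                "the gated-model agreements on Hugging Face."
            ) from exc
    return _diarize_model
```

With this shape, the call from the traceback would be passed in as `lambda: whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)`, and plain transcription requests would never trigger it.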
   

the env of my docker-compose.yml:

environment:
      - ASR_MODEL=large-v2
      - HF_TOKEN="hf_jbseggsegssomethingJsgqBgeeeeeV"
      - ASR_ENGINE=whisperx

So I pasted the token directly into ./app/mbain_whisperx/core.py, and that got it to work. I will need to retest with the environment variable.
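
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in the list form of `environment:`, some Compose versions pass the double quotes through as part of the value, so the application sees the token with the quotes included. A minimal sketch of a defensive read; the helper name `read_hf_token` is hypothetical and not part of this PR:

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return HF_TOKEN with surrounding whitespace and quotes removed, or None."""
    raw = env.get("HF_TOKEN", "").strip()
    # Drop one pair of matching quotes that a compose file may have passed through.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        raw = raw[1:-1].strip()
    # An empty value is treated the same as a missing one.
    return raw or None
```

Writing the compose entry without quotes (`- HF_TOKEN=hf_...`) sidesteps the question entirely.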

Is batched inference being used so far? The large model used almost 10 GB of VRAM on my 3060, and it wasn't any faster than the normal faster-whisper implementation. I will test more and update when I have the time 👍

To be honest, I don't know. I'll do some benchmarks comparing the speed of all three backends. I'm most excited for the increased accuracy of timestamps and diarization.

@dahifi

dahifi commented Oct 11, 2023

@Deathproof76 I'm not sure if it's the same, but the Whisper reqs noted 3 HF gated models that I needed to clear. There was one additional. See the note in your error: 'pyannote/speaker-diarization-3.0'

@dahifi

dahifi commented Oct 11, 2023

@ayancey I'm using 537.58.

I originally ran the readme's cmd where you clone the image from docker hub. That's the one that uses CUDA. I'm going to go back to square one and see if I can pull it from source and have it run the same. Right now I have this PR as a separate remote and I'm not comparing apples to apples.

@AustinSaintAubin

@AustinSaintAubin Did you provide the HF_TOKEN? That's required for diarization.

I have tested again with the environment variable set, and checked that the dependent pipeline pyannote/speaker-diarization-3.0 is not gated for me, but I am still not able to download the 'pyannote/speaker-diarization-3.0' pipeline. I am not sure if HF_TOKEN is being passed or handled correctly.

    environment:
      - HF_TOKEN="hf_thehuggingfacetokenformyaccount"

https://huggingface.co/pyannote/speaker-diarization-3.0
Gated model: You have been granted access to this model

...
whisper-asr-webservice_x_gpu  | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisper-asr-webservice_x_gpu  | It might be because the pipeline is private or gated so make
whisper-asr-webservice_x_gpu  | sure to authenticate. Visit https://hf.co/settings/tokens to
whisper-asr-webservice_x_gpu  | create your access token and retry with:
whisper-asr-webservice_x_gpu  | 
whisper-asr-webservice_x_gpu  |    >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisper-asr-webservice_x_gpu  |    ...                          use_auth_token=YOUR_AUTH_TOKEN)
...

@dahifi

dahifi commented Oct 12, 2023

Next step for me will probably be looking at the whisperx repo directly and see if I can get that to work anywhere first.

@ayancey
Collaborator

ayancey commented Oct 12, 2023

@AustinSaintAubin Did you provide the HF_TOKEN? That's required for diarization.

I have tested again with the environment variable set, and checked that the dependent pipeline pyannote/speaker-diarization-3.0 is not gated for me, but I am still not able to download the 'pyannote/speaker-diarization-3.0' pipeline. I am not sure if HF_TOKEN is being passed or handled correctly.

    environment:
      - HF_TOKEN="hf_thehuggingfacetokenformyaccount"

https://huggingface.co/pyannote/speaker-diarization-3.0 Gated model: You have been granted access to this model

...
whisper-asr-webservice_x_gpu  | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisper-asr-webservice_x_gpu  | It might be because the pipeline is private or gated so make
whisper-asr-webservice_x_gpu  | sure to authenticate. Visit https://hf.co/settings/tokens to
whisper-asr-webservice_x_gpu  | create your access token and retry with:
whisper-asr-webservice_x_gpu  | 
whisper-asr-webservice_x_gpu  |    >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisper-asr-webservice_x_gpu  |    ...                          use_auth_token=YOUR_AUTH_TOKEN)
...

Try making a new token. This took a couple of tries for me to get working on the original WhisperX repo. I don't think it's related to this PR.

@AustinSaintAubin

AustinSaintAubin commented Oct 14, 2023

@AustinSaintAubin You need to get access to both models: https://huggingface.co/pyannote/speaker-diarization https://huggingface.co/pyannote/segmentation

It looks like you got the access for one, but not the other.

https://huggingface.co/pyannote/speaker-diarization
Gated model: You have been granted access to this model

https://huggingface.co/pyannote/segmentation
Gated model: You have been granted access to this model

Sorry, I had not mentioned it: I had already visited all three repos and accepted the EULAs.

Fixed Docker caching issues
Updated Readme
@DennisTheD
Author

I think pyannote released a newer segmentation model, v3 (https://huggingface.co/pyannote/segmentation-3.0). After accepting the EULA it should work fine (at least it does for me with Windows + WSL).

@AustinSaintAubin

I think pyannote released a newer segmentation model, v3 (https://huggingface.co/pyannote/segmentation-3.0). After accepting the EULA it should work fine (at least it does for me with Windows + WSL).

That was it, at least for the CPU version (the GPU version still has the same issues as before). I accepted the EULA for segmentation-3.0 and it now works as expected.


if torch.cuda.is_available():
    device = "cuda"
    model = whisper.load_model(model_name).cuda()


Why not use whisperx model for transcription?

Author


Thank you for your feedback! I fixed this issue, so whisperx should also be used for transcription now.

@EvilFreelancer
Contributor

Hi! I would really like to see the WhisperX support in the project, is it possible to somehow speed up the code review procedure? Maybe you need some help?

@ahmetoner
Owner

Dear @m-bain,

I have concerns regarding the WhisperX license. To avoid potential conflicts, would it be sufficient to address the license requirements in a manner similar to other tools, as outlined in the following features sections?

@dahifi

dahifi commented Nov 17, 2023

I have concerns regarding the WhisperX license. To avoid potential conflicts, would it be sufficient to address the license requirements in a manner similar to other tools, as outlined in the following features sections?

Whisperx has a pretty fair license: https://github.com/m-bain/whisperX/blob/main/LICENSE

@dahifi

dahifi commented Nov 17, 2023

So I can confirm that I was able to update the docker file by adding

      - ASR_ENGINE=whisperx
      - HF_TOKEN=

And I can confirm that it's offloading to my GPU. That said, I can't confirm the output yet: TXT files are no good, and I get KeyError: 'max_line_width' when selecting VTT. I'm trying another test with a smaller file, but basically transcription works and diarization does not.
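
A `KeyError: 'max_line_width'` of this kind usually means the subtitle writer was handed an options dict that lacks a key it reads unconditionally. A minimal sketch of filling in defaults before calling the writer follows. Only `max_line_width` comes from the error message; the other two keys and the helper name `writer_options` are assumptions and should be checked against the whisperX version in use:

```python
from typing import Any, Dict, Optional

# max_line_width comes from the KeyError above; the other two keys are assumptions.
WRITER_DEFAULTS: Dict[str, Any] = {
    "max_line_width": None,
    "max_line_count": None,
    "highlight_words": False,
}


def writer_options(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a complete options dict, filling missing keys with defaults."""
    options = dict(WRITER_DEFAULTS)
    options.update(overrides or {})
    return options
```

The result would then be passed wherever the service currently hands options to the writer, so that a request which sets none of these values still produces a complete dict.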

@AustinSaintAubin

So I can confirm that I was able to update the docker file by adding

      - ASR_ENGINE=whisperx
      - HF_TOKEN=

And I can confirm that it's offloading to my GPU. That said, I can't confirm the output yet: TXT files are no good, and I get KeyError: 'max_line_width' when selecting VTT. I'm trying another test with a smaller file, but basically transcription works and diarization does not.

I am having the same or similar issue, where diarization does not seem to work.

@m-bain

m-bain commented Nov 18, 2023

@ahmetoner that's fine, yeah. Sorry, some of the formats are not fully tested with diarization.

@dahifi

dahifi commented Nov 18, 2023

@ahmetoner that's fine, yeah. Sorry, some of the formats are not fully tested with diarization.

So which format actually works?

@DennisTheD
Author

So which format actually works?
The JSON format works fine for me, including diarization. Other formats work as well, but do not include diarization information. The diarization gets computed, but is not written to the output file when calling "write_result()". This could be fixed by implementing a custom result writer (WriteVTT / WriteTSV / WriteSRT / WriteTXT).
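
A minimal sketch of what such a custom writer could look like for VTT. It assumes each segment is a dict with `start`, `end`, and `text` keys plus an optional `speaker` key, as in the JSON output; the function names are hypothetical and this is not the whisperX writer itself:

```python
from typing import Any, Dict, Iterable


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render segments as WebVTT, prefixing each cue with its speaker if present."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        # The "speaker" key is assumed to be absent when diarization was not run.
        speaker = seg.get("speaker")
        prefix = f"[{speaker}]: " if speaker is not None else ""
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(prefix + seg["text"].strip())
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

Segments without a speaker are written unchanged, so the same function would serve both diarized and plain transcriptions.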

@dahifi

dahifi commented Nov 22, 2023

So which format actually works?
The JSON format works fine for me, including diarization. Other formats work as well, but do not include diarization information. The diarization gets computed, but is not written to the output file when calling "write_result()". This could be fixed by implementing a custom result writer (WriteVTT / WriteTSV / WriteSRT / WriteTXT).

I think that should be a requirement before this gets merged. (How are you using the JSON?) I use the ASR through Obsidian transcription plugin and it dumps the text directly, so I'll see what I can do with that and maybe figure a way to fix the others as well.

@andibakti

Looks like the result writers for whisperX (WriteVTT / WriteSRT) already do that through the SubtitlesWriter class, so the output should technically include the speakers already. I haven't run it myself to confirm yet.

# add [$SPEAKER_ID]: to each subtitle if speaker is available
prefix = ""
if speaker is not None:
    prefix = f"[{speaker}]: "

See here: https://github.com/m-bain/whisperX/blob/4553e0d4edae3f9f49211de3a2e2bf0a9b265fe6/whisperx/utils.py#L291C17-L294C46

@bradfordben

FYI, The commit 71a5281 (m-bain/whisperX@71a5281) to whisperX has broken this PR.

If you want to get it working then edit Dockerfile or Dockerfile.gpu and change the clone of whisperX to the following to use whisperX before the breaking change

RUN git clone https://github.com/m-bain/whisperX.git \
    && cd whisperX \
    && git checkout d97cdb7bcf302fb3e1651321a5935f90594e994c \
    && $POETRY_VENV/bin/pip install --no-dependencies -e .

@dahifi

dahifi commented Jan 27, 2024

FYI, The commit 71a5281 (m-bain/whisperX@71a5281) to whisperX has broken this PR.

If you want to get it working then edit Dockerfile or Dockerfile.gpu and change the clone of whisperX to the following to use whisperX before the breaking change

RUN git clone https://github.com/m-bain/whisperX.git \
    && cd whisperX \
    && git checkout d97cdb7bcf302fb3e1651321a5935f90594e994c \
    && $POETRY_VENV/bin/pip install --no-dependencies -e .

I accidentally built and deployed the buggy image file to docker, so if you need to patch after deploy you'll need to make sure you use git fetch --unshallow on the whisperx repo and rebuild.

Does anyone know what the underlying issue is?

@dahifi

dahifi commented Jan 27, 2024

I created a PR to update the dockerfiles in this PR. DennisTheD#2

@ahmetoner
Owner

Hello @DennisTheD,
Could you please resolve the conflicts? Once that's done, I'll proceed with merging it into version 2.0 of our repository. Thank you!

@hlevring

@DennisTheD any chance you can help to resolve the conflicts so we can get this merged? WhisperX support for this project would be awesome.

@dahifi

dahifi commented Mar 22, 2024

@DennisTheD any chance you can help to resolve the conflicts so we can get this merged? WhisperX support for this project would be awesome.

If I had access to the project I'd resolve it myself. If Dennis doesn't show up to finish it then we'll have to fork the fork and send a PR back upstream.

Honestly though, @hlevring, you can run this fork just fine on its own. I've had it deployed in my lab for months and even had it running in the cloud at one point.

@DennisTheD
Author

Hey ;) Sorry for the late reply. I currently don't have access to my dev machine and cannot make or test the required changes.
@dahifi: I invited you to get access to my fork. It's the first time I invited someone to a repo. Feel free to get in contact if I missed something.
