[WIP] Add initial GPU support #4

Open
wants to merge 18 commits into
base: master

edurenye

This is a work in progress.
I think it is working for whisper, but I'm not sure how to check it.
For piper it gives me the error "unrecognized arguments: --cuda", even though I followed the instructions at https://github.com/rhasspy/piper. At the end they say it should work by just installing onnxruntime-gpu and running piper with the --cuda argument.

What am I missing?

I guess this will conflict with those who just want to use the CPU. How can we handle that? By making different images, e.g. piper and piper-gpu?

@edurenye
Author

edurenye commented Aug 24, 2023

Closes #3

@DBaker85

DBaker85 commented Sep 12, 2023

Just wanted to leave my 2 cents here:
I tried your whisper changes locally and they work perfectly with my 1080 Ti and Docker.
VRAM is assigned and the container works as well. Home Assistant also recognised and used it perfectly.
Nice one!

(Did not try Piper)

@edurenye
Author

Piper does not work because of this: rhasspy/rhasspy3#49

@wdunn001

wdunn001 commented Oct 5, 2023

Whisper is still targeting Ubuntu 20.04. Is there a reason for that?

@wdunn001

wdunn001 commented Oct 5, 2023

This may need to be its own image, since the majority of users would not want the CUDA version.

@wdunn001

wdunn001 commented Oct 5, 2023

Could this be split into two tickets, one for whisper and one for piper, if piper is experiencing issues? The whisper portion is in reality the more useful of the two and benefits more from this feature.

@edurenye
Author

edurenye commented Oct 6, 2023

@wdunn001 The documentation at https://github.com/guillaumekln/faster-whisper/ says it requires cuDNN 8 for CUDA 11, and for those versions of CUDA and cuDNN the highest Ubuntu version available is 20.04. I had to look for it because, sadly, it was not working with the image I set for the other containers.
And updating to CUDA 12 is not planned in the very short term. See an explanation here: SYSTRAN/faster-whisper#47 (comment).

@edurenye
Author

edurenye commented Oct 6, 2023

Sorry, editing because I misunderstood your comment.
Yes, it makes sense to make it two different images. I can add that.

But I guess that, for better maintainability, the solution we add for one should be the same as for the others, so I think it is better to have the conversation in a single issue and PR.
If you need to use it right now, you can just add the changes to your local Dockerfile and build it.
Or, if you need to use CUDA 12, you could try the workarounds described here: SYSTRAN/faster-whisper#153 (comment)

@edurenye
Author

edurenye commented Oct 6, 2023

And I'll try to add porcupine1 too

@wdunn001

wdunn001 commented Oct 6, 2023

Awesome! I am happy to help if you need anything. Would we want to add the docker arguments for the CUDA image to the documentation here?

@edurenye
Author

edurenye commented Oct 6, 2023

I added the changes.
I have not tested the new porcupine1 container, since that software does not support my language yet.

And yes, of course we should document this. I was also wondering whether we should add a docker-compose.yml file.
It made sense to me since I use Home Assistant and need the 3 services. But now that porcupine1 has been added I am not sure anymore, since as far as I know porcupine1 and openwakeword do the same thing, which is quite confusing to me.

@edurenye
Author

edurenye commented Oct 6, 2023

But right now the README.md file only documents how to use the images by pulling them, not how to build them, so that will depend on the tags the maintainer might want to use. Should we add building instructions to the README.md file?

@wdunn001

wdunn001 commented Oct 6, 2023

I think so, for sure. We can create a contributors section. I'll work on it: I will be building it for the first time this weekend, so I'll try to document the process.

@edurenye
Author

edurenye commented Oct 6, 2023

I will give you the docker-compose files and a starting point.

@edurenye
Author

edurenye commented Oct 6, 2023

I just added it. Tell me how it works for you. You can create your own docker-compose.x.yml file for your use case.

I have not added porcupine1 to the docker compose because it uses the same port as openwakeword, so for that particular case it could be added in the custom extend file.
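
For readers who want to run porcupine1 next to openwakeword, a custom extend file along these lines might work. This is only a sketch: the file name is hypothetical, the image name and the in-container port 10400 are assumptions based on the upstream wyoming-porcupine1 README, and host port 10401 is an arbitrary free port chosen to avoid the clash described above.

```yaml
# docker-compose.porcupine.yml (hypothetical file name)
services:
  wyoming-porcupine1:
    # Assumption: image name as used in the upstream wyoming-porcupine1 README.
    image: rhasspy/wyoming-porcupine1
    restart: unless-stopped
    ports:
      # Host port 10401 is arbitrary; 10400 inside the container is assumed.
      # Publishing on a different host port avoids the conflict with openwakeword.
      - "10401:10400"
```

If openwakeword is not running at all, the host port could stay at 10400 instead.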

@wdunn001

wdunn001 commented Oct 8, 2023

ok so I am getting an error deploying this via compose or run

usage: main.py [-h] --model {tiny,tiny-int8,base,base-int8,small,small-int8,medium,medium-int8} --uri URI --data-dir DATA_DIR [--download-dir DOWNLOAD_DIR] [--device DEVICE] [--language LANGUAGE] [--compute-type COMPUTE_TYPE] [--beam-size BEAM_SIZE] [--debug]
main.py: error: the following arguments are required: --model, --uri, --data-dir
/run.sh: line 3: --uri: command not found
/run.sh: line 4: --data-dir: command not found
/run.sh: line 5: --download-dir: command not found

It needs additional params in contrast with the other build.

These appear to be supplied by the run.sh file, and I see it's called in the Dockerfile.

I added commands to the GPU compose file identical to those in the non-GPU version, and they work fine, so I made a PR. It's only the ones in run.sh that seem not to work.

I am on Ubuntu 22.04 with the latest Docker, if that matters.

@edurenye
Author

edurenye commented Oct 9, 2023

This is weird. According to the documentation, the only things not extended should be volumes_from and depends_on. We can follow this discussion in the PR that you created: edurenye#1

@AnkushMalaker

I needed to add --device cuda to actually load the whisper model onto my GPU. I second that we could split this into different branches to handle GPU for whisper, piper and wakeword. I made a branch for that, not sure if I should raise this as a PR.

  • removed --cuda for piper as that isn't working upstream yet.
  • changed the default data directories to /var/data to be consistent with some other docker compose files I saw.

New to contributing, happy to hear thoughts.

https://github.com/AnkushMalaker/wyoming-addons/tree/gpu

@edurenye
Author

I rebased onto the latest changes from master and fixed the typos in the README file.

I don't think we need to create another branch. In the meantime you can just have an extend file where you use the GPU options for whisper and openwakeword and the non-GPU ones for piper.

Regarding /var/data, I am generally against storing user data in a system folder. And passing the whole folder to the Docker container might expose a lot of data from other applications that is not needed.
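
For reference, the extend-file approach suggested above (GPU for whisper and openwakeword, CPU for piper) could be sketched roughly as follows. The file name docker-compose.mixed.yml is hypothetical, and the service names are assumptions inferred from the container names in the logs and the docker-compose.ca.yml example elsewhere in this thread; they may not match the compose files in the branch exactly.

```yaml
# docker-compose.mixed.yml (hypothetical file name)
services:
  wyoming-whisper:
    extends:
      file: docker-compose.gpu.yml
      service: wyoming-whisper
    # --device cuda is what actually loads the model onto the GPU.
    command: [ "--model", "medium-int8", "--language", "en", "--device", "cuda" ]

  wyoming-openwakeword:
    extends:
      file: docker-compose.gpu.yml   # assumes the GPU file defines this service
      service: wyoming-openwakeword

  wyoming-piper:
    extends:
      file: docker-compose.base.yml  # CPU variant, since --cuda is not working for piper yet
      service: wyoming-piper
```

Such a file would then be started with something like `docker compose -f docker-compose.mixed.yml up -d`.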

@wdunn001

@edurenye agreed using cpu for piper seems to be more than sufficient. I am still experiencing issues with openwakeword but it may just be my environment. I'll pull down the changes here and try again. I'll push any fixes I find to the PR on your branch.

@@ -0,0 +1,35 @@
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

Perhaps we remove this file in the interim to get rid of dead code?

Author

I do not see it as dead code. When this issue gets fixed, it should just work right away.

ok sounds good

@@ -0,0 +1,32 @@
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

@wdunn001 wdunn001 Oct 18, 2023

Remove this to get rid of dead code?

Author

I do not see it as dead code either. People who want to use it can do so by extending the docker compose file, or by running it directly with docker run as documented here: https://github.com/rhasspy/wyoming-porcupine1/blob/master/README.md, adding the CUDA options.

sounds good

.gitignore Outdated
@@ -0,0 +1,12 @@
# OpenWakeWord

Perhaps we reference managed volumes instead to prevent this?

i.e.
volumes:
  openwakeword-data:
  whisper-data:
  piper-data:

This is what I did in my version.
We could also add a -gpu suffix to volumes connected to GPU-enabled instances in the GPU compose file, so that we can keep data separate between instance types.
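
A minimal sketch of what that suggestion might look like in the GPU compose file. It is only a fragment: the service names and the -gpu volume names are illustrative assumptions, and /data is assumed as the mount point because it is the data path that appears in the whisper compose snippet and piper logs elsewhere in this thread.

```yaml
# Fragment of a GPU compose file (sketch): named volumes instead of bind mounts.
services:
  wyoming-whisper:
    volumes:
      - whisper-data-gpu:/data   # Docker-managed volume, nothing to add to .gitignore

  wyoming-piper:
    volumes:
      - piper-data-gpu:/data

# Top-level declaration; Docker creates these volumes on first start.
volumes:
  whisper-data-gpu:
  piper-data-gpu:
```

The trade-off is that files in a managed volume are less convenient to reach from the host than files in a bind-mounted directory.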

Author

Do you mean non-bind mounts? But then adding custom models (thinking mainly about OpenWakeWord here) is hard; with bind mounts you can just move the model to that directory. Also, I don't think there will be a case where you want to move from GPU to non-GPU while changing models, but I am probably wrong there.

Author

I think I agree with you here. Probably the best way is to not bind them by default; you can then bind them by extending the docker compose file and pointing to wherever you have the custom model.

Or maybe we could look at passing it as a parameter. I haven't looked into it; I'm actually still fighting to generate the custom model.

docker compose down
```

### Run with GPU

Should we reference documentation on how to set up Docker for GPU? (I can of course add it in a separate PR.)

Author

Yes, good idea!

@Maxcodesthings

Maxcodesthings commented Oct 25, 2023

I have tried applying the contents of this PR to my local instance. I do not see the faster-whisper implementation use GPU over CPU.

I have conflated the dockerfiles as such and focused on only using GPU for whisper container:

  whisper:
    container_name: whisper
    build:
      context: /opt/wyoming-addons/whisper/
      dockerfile: GPU.Dockerfile
    # image: rhasspy/wyoming-whisper:latest
    restart: unless-stopped
    ports:
      - 10300:10300
    volumes:
      - /opt/homeassistant/whisper:/data
    command: 
      - --model
      - medium-int8
      - --language
      - en
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

I can tell my GPU is passed through because it appears in nvidia-smi on the container

However, when I watch the GPU while my speech is being processed, the usage does not increase, and when I watch the CPU, the usage clearly spikes, since it's the CPU processing my speech.

How have you all tested that this implementation of faster-whisper is working? I would like to do the same on my machine

Edit:

Found the issue!

You are missing --device in your compose

command: 
      - --model
      - small
      - --language
      - en
      - --device
      - cuda

@edurenye
Author

Good finding! It was not documented, but that parameter exists in https://github.com/rhasspy/wyoming-faster-whisper/blob/master/wyoming_faster_whisper/__main__.py

@mreilaender

Can you resolve the conflicts? I would love to see the improvements from using the GPU directly :)

@mreilaender

Doesn't work with piper since wyoming-piper doesn't declare the --cuda argument. I created a PR

@edurenye
Author

This was a big facepalm. Since I was using the '-d' option and not really using piper or any of the others until now, when I received the M5 ATOM ECHO, I didn't see it fail.

So I fixed the issue with the entrypoint not accepting env variables directly.

Then I realized CUDA was not working. I updated the Python files because they no longer matched version 1.5.0.
It was still not working, so I tried @synesthesiam's recommendation from rhasspy/wyoming#9 (comment). That did not work because piper-phonemize would not install, but I found a workaround here: rhasspy/piper-phonemize#10 (comment).

CUDA was finally working, but then the non-GPU version was not, so I ended up splitting the two files again. Now both versions work, finally!

So, please try again @spitfire 🤞

@spitfire

Doesn't crash anymore, but doesn't work either.

I've replaced the default voice in base config, but when asked to do TTS it fails like this:

wyoming-openwakeword-1  | INFO:root:Ready
wyoming-piper-1         | INFO:wyoming_piper.download:Downloaded /data/pl_PL-darkman-medium.onnx (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/pl/pl_PL/darkman/medium/pl_PL-darkman-medium.onnx)
wyoming-piper-1         | INFO:wyoming_piper.download:Downloaded /data/pl_PL-darkman-medium.onnx.json (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/pl/pl_PL/darkman/medium/pl_PL-darkman-medium.onnx.json)
wyoming-piper-1         | INFO:__main__:Ready
wyoming-whisper-1       | INFO:__main__:Ready
wyoming-whisper-1       | INFO:faster_whisper:Processing audio with duration 00:11.720
wyoming-whisper-1       | INFO:wyoming_faster_whisper.handler: Dziękuję.
wyoming-piper-1         | ERROR:asyncio:Task exception was never retrieved
wyoming-piper-1         | future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.8/dist-packages/wyoming/server.py:31> exception=FileNotFoundError(2, 'No such file or directory')>
wyoming-piper-1         | Traceback (most recent call last):
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming/server.py", line 41, in run
wyoming-piper-1         |     if not (await self.handle_event(event)):
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 53, in handle_event
wyoming-piper-1         |     raise err
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 48, in handle_event
wyoming-piper-1         |     return await self._handle_event(event)
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 108, in _handle_event
wyoming-piper-1         |     wav_file: wave.Wave_read = wave.open(output_path, "rb")
wyoming-piper-1         |   File "/usr/lib/python3.8/wave.py", line 510, in open
wyoming-piper-1         |     return Wave_read(f)
wyoming-piper-1         |   File "/usr/lib/python3.8/wave.py", line 160, in __init__
wyoming-piper-1         |     f = builtins.open(f, 'rb')
wyoming-piper-1         | FileNotFoundError: [Errno 2] No such file or directory: ''
wyoming-whisper-1       | INFO:faster_whisper:Processing audio with duration 00:02.330
wyoming-whisper-1       | INFO:wyoming_faster_whisper.handler: Wyłącz lampki.
wyoming-piper-1         | ERROR:asyncio:Task exception was never retrieved
wyoming-piper-1         | future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.8/dist-packages/wyoming/server.py:31> exception=FileNotFoundError(2, 'No such file or directory')>
wyoming-piper-1         | Traceback (most recent call last):
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming/server.py", line 41, in run
wyoming-piper-1         |     if not (await self.handle_event(event)):
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 53, in handle_event
wyoming-piper-1         |     raise err
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 48, in handle_event
wyoming-piper-1         |     return await self._handle_event(event)
wyoming-piper-1         |   File "/usr/local/lib/python3.8/dist-packages/wyoming_piper/handler.py", line 108, in _handle_event
wyoming-piper-1         |     wav_file: wave.Wave_read = wave.open(output_path, "rb")
wyoming-piper-1         |   File "/usr/lib/python3.8/wave.py", line 510, in open
wyoming-piper-1         |     return Wave_read(f)
wyoming-piper-1         |   File "/usr/lib/python3.8/wave.py", line 160, in __init__
wyoming-piper-1         |     f = builtins.open(f, 'rb')
wyoming-piper-1         | FileNotFoundError: [Errno 2] No such file or directory: ''
wyoming-whisper-1       | INFO:faster_whisper:Processing audio with duration 00:02.260
wyoming-whisper-1       | INFO:wyoming_faster_whisper.handler: Wyłącz lampki.

@edurenye
Author

Probably that is why my M5 ATOM ECHO follows the order but gets stuck before answering 😞

Well, I do not understand this error, I'll need help with it. This error seems to come from wyoming_piper, could you give me a hint @synesthesiam please?

@spitfire

spitfire commented Mar 14, 2024

> Probably that is why my M5 ATOM ECHO follows the order but gets stuck before answering 😞
>
> Well, I do not understand this error, I'll need help with it. This error seems to come from wyoming_piper, could you give me a hint @synesthesiam please?

My echos had problems with responding even if that was not the case. You could set up the regular piper add-on on HA, change the pipeline to use it and see if that resolves the issue.

I've modified my docker-compose.base.yml to use command: [ "--voice", "pl_PL-darkman-medium" ] if that helps. It's the same voice I've been using with the official add-on, and as you can see it is being downloaded.

@edurenye
Author

What I would recommend is creating a docker-compose.pl.yml. There you can select which services to use from which file, so you could use whisper from docker-compose.gpu.yml and piper from docker-compose.base.yml, for example, and you can also override the command there.

The next thing I'll try for my Echo is the base service. If that does not work I'll run it in HA, but I don't want to add this kind of thing to HA since it would make the system slow or overheat; it's an RPi4.

Also, I'll try to upgrade the Ubuntu image, see if that fixes the issue.

@edurenye
Author

That was the issue for my ECHO. Upgrading did not fix it, but it does not add more problems, so I think once we fix this issue we can upgrade.

In the end I used the base service and I have my ECHO working, but the responses sound awful; they would probably sound better with a better model on the GPU.

Also, the ECHO is not the best device. Sometimes the speaker makes sounds as if a wire were not fully connected, like electrical sparks. Furthermore, after a while it stops listening, and I need to unplug it and plug it back in.

I think I need a better speaker, even if it's a bit more expensive.

I would like to pass the responses to a better Bluetooth speaker directly from Home Assistant, but that will be hard and require time, right now Home Assistant doesn't even allow me to use Bluetooth speakers for playing music...

@nikito

nikito commented Mar 19, 2024

Just wanted to chime in and say I got this running in k8s, and everything works fine (including openwake with a custom wake word :) ), except for Piper, I get the same error as the person above where it says no such file or directory: ''. I did some debug and it seems that there's something going on with stdin/stdout that is supposed to take the input text, generate the wave file, then pass it along. Not sure where that is breaking down, but that appears to be the root issue. :)

@timkolloch

I checked out your branch @edurenye and ran docker-compose.gpu.yml. It started perfectly with the tiny model, which is preconfigured. I wanted to run the large-v3 model, so I changed it in both docker compose files (base and gpu), but the container never started and stayed at "Attaching to wyoming-whisper-1". Is there anything I did wrong?

@edurenye
Author

@nikito I've been quite busy lately. I'll try to take another look at Piper this summer, but GPU support first needs to be added in the other tickets that are linked in this one.

@tiko2302 I'm not sure about the error, but you should not change the base file or the gpu file. Instead, override the model in a new file that extends from both files. For example, I have this file, docker-compose.ca.yml, for the Catalan language:

services:
  wyoming-piper:
    extends:
      file: docker-compose.base.yml
      service: wyoming-piper
    command: [ "--voice", "ca_ES-upc_ona-medium" ]

  wyoming-whisper:
    extends:
      file: docker-compose.gpu.yml
      service: wyoming-whisper
    command: [ "--model", "medium-int8", "--language", "ca", "--device", "cuda" ]

  wyoming-snowboy:
    extends:
      file: docker-compose.gpu.yml
      service: wyoming-snowboy
    volumes:
      - /home/eduard/snowboy-data:/custom-data
    command: [ "--custom-model-dir", "/custom-data" ]

As you can see I use the base image for Piper, and GPU for the others.

@nikito

nikito commented Sep 18, 2024

Checking in, any updates on this PR? :)

@edurenye
Author

Sorry @nikito, I haven't seen any progress on piper. I'm fighting with local LLMs now, but whisper is still working for me. I moved from external Snowboy to on-device micro_wake_word on Atom Echos, but it should still be working fine. This repo has not had any updates either.

Labels
enhancement New feature or request