Timeout or lockup after "rhasspy3.vad:segment: speaking ended" with longer reply #55

Closed
phormix opened this issue Nov 13, 2023 · 2 comments · May be fixed by #30

Comments

@phormix

phormix commented Nov 13, 2023

For reference, I'm using the following pipeline (though the halting point seems to be during or just after ASR processing):

mic:
  name: arecord
vad:
  name: silero
asr:
  name: faster-whisper.client
wake:
  name: porcupine1
handle:
  name: home_assistant
tts:
  name: piper.client
snd:
  name: aplay

In this pipeline, if I use tiny-int8 with whisper (running as a server), it quickly returns a response and the pipeline continues after VAD, but recognition is relatively poor.

If I use any other model, the whisper portion takes slightly longer and the whole pipeline sticks at "speaking ended", as shown in the debug output:

DEBUG:rhasspy3.core:Loading config from /home/rhasspy/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Loading config from /home/rhasspy/rhasspy3/config/configuration.yaml
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -r 16000 -D plughw:CARD=Device -c 1 -f S16_LE -t raw -']
DEBUG:rhasspy3.program:client_unix_socket.py ['var/run/faster-whisper.socket']
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', 'jarvis_raspberry-pi.ppn']
DEBUG:rhasspy3.wake:detect: processing audio
DEBUG:rhasspy3.wake:detect: Detection(name='jarvis_raspberry-pi', timestamp=8540921121325)
DEBUG:rhasspy3.program:vad_adapter_raw.py ['--rate', '16000', '--width', '2', '--channels', '1', '--samples-per-chunk', '512', 'script/speech_prob "share/silero_vad.onnx"']
DEBUG:rhasspy3.vad:segment: processing audio
DEBUG:rhasspy3.vad:segment: speaking started
DEBUG:rhasspy3.vad:segment: speaking ended

As best I can tell, the slight additional delay in the response from the whisper server is causing something to lock up after VAD, so nothing is passed on to the next part of the pipeline. I suspect there is some sort of timeout on the connection to the whisper server, which causes the client to never get the response and thus never move on. It doesn't seem to matter which VAD I use (both silero and webrtcvad get stuck the same way), so the problem is probably in the processing between VAD and STT.
From what I can see in "top", I'm not hitting a memory limit that causes a crash: there is a bit of a CPU spike during STT, but I still have memory free.
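
One way to tell a true hang from a merely slow reply is to put an explicit upper bound on the wait for the ASR step, so a stall surfaces as an error instead of silence. The following is only a minimal sketch using Python's standard asyncio; `call_with_timeout` and `fake_asr` are hypothetical names made up for illustration and are not part of rhasspy3 or the faster-whisper client.

```python
import asyncio


async def call_with_timeout(coro_fn, timeout_s):
    """Await coro_fn() but give up after timeout_s seconds.

    Returns (result, None) on success or (None, error_message) on timeout.
    """
    try:
        result = await asyncio.wait_for(coro_fn(), timeout=timeout_s)
        return result, None
    except asyncio.TimeoutError:
        return None, f"no response within {timeout_s}s"


async def fake_asr(delay_s):
    # Hypothetical stand-in for a speech-to-text server that takes
    # delay_s seconds to answer; not a real rhasspy3 call.
    await asyncio.sleep(delay_s)
    return "turn on the light"


def demo():
    # A fast reply finishes well inside the bound.
    fast = asyncio.run(call_with_timeout(lambda: fake_asr(0.01), 0.5))
    # A slow reply exceeds the bound and is reported as a timeout.
    slow = asyncio.run(call_with_timeout(lambda: fake_asr(0.5), 0.05))
    return fast, slow
```

If the bounded call reports a timeout with larger models but not with tiny-int8, that would point at the wait on the server reply rather than at VAD itself.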

Hardware: Raspberry Pi 4, 2GB

@Shulyaka

Looks like a duplicate of #29.
Please try the fix in #30.

@phormix
Author

phormix commented Nov 14, 2023

Applying the update seems to have fixed this, thanks!

@phormix phormix closed this as completed Nov 14, 2023