From c78e79412763f64d324ea4ebac464f92239149a6 Mon Sep 17 00:00:00 2001
From: su
Date: Wed, 3 Apr 2024 15:11:18 -0400
Subject: [PATCH] update readme

---
 README.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index fec289e..dcadb04 100644
--- a/README.md
+++ b/README.md
@@ -34,19 +34,19 @@
 * > faster-whisper is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.
   >
   > This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
-* [x] [m-bain/whisperX](https://github.com/m-bain/whisperX)
-  * >fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
-    >- ⚡ī¸ Batched inference for 70x realtime transcription using whisper large-v2
-    >- đŸĒļ [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5
-    >- đŸŽ¯ Accurate word-level timestamps using wav2vec2 alignment
-    >- đŸ‘¯â€â™‚ī¸ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels)
-    >- đŸ—Ŗī¸ VAD preprocessing, reduces hallucination & batching with no WER degradation.
-* [x] [jianfch/stable-ts](https://github.com/jianfch/stable-ts)
-  * >**Stabilizing Timestamps for Whisper**: This library modifies [Whisper](https://github.com/openai/whisper) to produce more reliable timestamps and extends its functionality.
-* [x] [Hugging Face Transformers](https://huggingface.co/tasks/automatic-speech-recognition)
-  * > Hugging Face implementation of Whisper. Any speech recognition pretrained model from the Hugging Face hub can be used as well.
-* [x] [API/openai/whisper](https://platform.openai.com/docs/guides/speech-to-text)
-  * > OpenAI Whisper via their API
+  * [x] [m-bain/whisperX](https://github.com/m-bain/whisperX)
+    * >fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
+      > - ⚡ī¸ Batched inference for 70x realtime transcription using whisper large-v2
+      > - đŸĒļ [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires <8GB gpu memory for large-v2 with beam_size=5
+      > - đŸŽ¯ Accurate word-level timestamps using wav2vec2 alignment
+      > - đŸ‘¯â€â™‚ī¸ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels)
+      > - đŸ—Ŗī¸ VAD preprocessing, reduces hallucination & batching with no WER degradation.
+  * [x] [jianfch/stable-ts](https://github.com/jianfch/stable-ts)
+    * >**Stabilizing Timestamps for Whisper**: This library modifies [Whisper](https://github.com/openai/whisper) to produce more reliable timestamps and extends its functionality.
+  * [x] [Hugging Face Transformers](https://huggingface.co/tasks/automatic-speech-recognition)
+    * > Hugging Face implementation of Whisper. Any speech recognition pretrained model from the Hugging Face hub can be used as well.
+  * [x] [API/openai/whisper](https://platform.openai.com/docs/guides/speech-to-text)
+    * > OpenAI Whisper via their API
 
 * Web UI
   * Fully offline, no third party services
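For context on the backends being re-indented above, a minimal sketch of how the faster-whisper engine described in the quoted blurb is typically invoked. The model size, audio file name, and `compute_type` are illustrative assumptions, not settings taken from this repository or this patch:

```python
# Minimal faster-whisper sketch; "large-v2", "audio.mp3", and the
# compute_type are illustrative assumptions, not values from this repo.
from faster_whisper import WhisperModel

# compute_type="int8" enables the 8-bit quantization mentioned in the blurb;
# on a CUDA GPU, device="cuda" with compute_type="float16" is common instead.
model = WhisperModel("large-v2", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator of segments plus detection info.
segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```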