Whisper-subtitle

Whisper-subtitle is a Python-based tool that extracts audio from video files, optionally denoises it, and generates subtitle files in SRT format using OpenAI's Whisper model. It relies on PyTorch, Torchaudio, Transformers, and Denoiser for transcription and subtitle generation.

Features

Audio Extraction: Extracts audio from video files using FFmpeg.
Noise Reduction: Optionally reduces background noise in the audio using a pretrained denoiser model.
Transcription: Uses OpenAI's Whisper model for speech-to-text transcription.
Subtitle Generation: Creates .srt subtitle files with timestamped transcripts.
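
The extraction and subtitle-generation steps above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the function names (build_ffmpeg_command, format_timestamp, segments_to_srt) and the (start, end, text) segment shape are assumptions made for illustration. The 16 kHz mono settings are chosen because that is the audio format Whisper works with.

```python
from typing import Iterable, List, Tuple

# (start seconds, end seconds, text); a hypothetical shape for transcript segments
Segment = Tuple[float, float, str]


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Return an ffmpeg argument list that extracts 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        audio_path,
    ]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)
```

The command list can be passed to subprocess.run(..., check=True) to perform the extraction, and the string returned by segments_to_srt can be written to a .srt file with UTF-8 encoding.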

Prerequisites

Python 3.8 or higher
FFmpeg installed and added to your system's PATH
NVIDIA GPU with CUDA support or AMD GPU with ROCm support (optional, for faster processing)
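
A quick way to verify these prerequisites before running the tool is sketched below. The helper name check_prerequisites is hypothetical and not part of the repository. The GPU check assumes PyTorch is the framework in use and is skipped when PyTorch is not installed; ROCm builds of PyTorch report availability through the same torch.cuda interface as CUDA builds.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each prerequisite appears to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "gpu_available": False,  # optional; stays False if PyTorch is missing
    }
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            # True for CUDA builds and for ROCm builds with a usable GPU.
            report["gpu_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated as "no GPU available".
            pass
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing GPU is not an error here, since GPU acceleration is optional.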

Notes

At least 16 GB of VRAM is recommended for high-quality subtitles.
Whisper does not always produce precise timestamps, so the generated subtitles should be reviewed before use.
For some languages, the large-v2 model outperforms the large-v3 model. This is likely related to the pseudo-labeled data used to train large-v3. For example, tests have shown that large-v2 performs better for Japanese.
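
The two Whisper caveats above can be handled with small helpers. The sketch below is illustrative only: pick_model, find_suspect_segments, and the max_duration threshold are hypothetical and not part of the repository. The only language preference taken from the notes is Japanese, and the model identifiers are the Hugging Face Hub names of the two checkpoints.

```python
from typing import List, Sequence, Tuple

# (start seconds, end seconds, text); a hypothetical shape for transcript segments
Segment = Tuple[float, float, str]

# Language codes for which large-v2 is preferred.
# Only Japanese comes from the notes above; extend after your own testing.
PREFER_V2 = {"ja"}


def pick_model(language: str) -> str:
    """Choose a Whisper checkpoint name based on the spoken language code."""
    if language.lower() in PREFER_V2:
        return "openai/whisper-large-v2"
    return "openai/whisper-large-v3"


def find_suspect_segments(
    segments: Sequence[Segment], max_duration: float = 10.0
) -> List[Tuple[int, str]]:
    """Flag cues whose timestamps deserve a manual review.

    Returns (cue number, reason) pairs; cue numbers start at 1 as in SRT.
    """
    issues = []
    previous_end = 0.0
    for number, (start, end, _text) in enumerate(segments, start=1):
        if end <= start:
            issues.append((number, "non-positive duration"))
        elif end - start > max_duration:
            issues.append((number, "unusually long"))
        if start < previous_end:
            issues.append((number, "overlaps previous segment"))
        previous_end = max(previous_end, end)
    return issues
```

These checks only catch structurally suspicious cues. They cannot tell whether a cue is aligned with the speech, so a manual review is still needed.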

Cyp9715/Whisper-subtitle

Folders and files

Latest commit

History

Repository files navigation

Whisper-subtitle

Features

Prerequisites

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages