Keyword Based ASR

Repository implementing a keyword-based ASR system

Code style: black

Setup

  • create a virtual environment and install the requirements from requirements.txt
  • NOTE: the NeMo toolkit is not supported on Windows, so WSL or a UNIX-based OS is required; see the NeMo docs or GitHub repository
  • the minimal necessary data is already in the repo; to reproduce the training process or test other keywords, download the full datasets from this link and put them in the Data directory (check the Data README for the structure). If the link doesn't work, please contact me via email: [email protected]
  • modify the config file to match your setup (every path whose name ends with the suffix DATA_DIR must be changed)
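
After editing the config, it can help to confirm that every DATA_DIR path points to a real directory before running the notebooks. The following is a minimal sketch, not part of the repository: it assumes the settings are available as a plain dictionary, and the function name `missing_data_dirs` and the example keys and paths are hypothetical.

```python
from pathlib import Path


def missing_data_dirs(config: dict[str, object]) -> list[str]:
    """Return the keys ending in DATA_DIR whose value is not an existing directory."""
    return [
        key
        for key, value in config.items()
        if key.endswith("DATA_DIR") and not Path(str(value)).expanduser().is_dir()
    ]


if __name__ == "__main__":
    # Hypothetical settings; replace them with the values from your own config file.
    example_config = {
        "SPEAKER_DATA_DIR": "Data/speakers",
        "KEYWORD_DATA_DIR": "Data/keywords",
        "SAMPLE_RATE": 16000,
    }
    for key in missing_data_dirs(example_config):
        print(f"{key} does not point to an existing directory")
```

Keys that do not end in DATA_DIR are ignored, so non-path settings can stay in the same dictionary.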

Repository structure

  • Data - directory with data
  • Utils - directory containing utility scripts and config files
  • Models - directory containing trained models' weights
  • notebooks in root directory
    • main
      • demo.ipynb - notebook demonstrating dual-model keyword-based speaker recognition system
      • speaker-recognition.ipynb - notebook with speaker recognition model training and evaluation
      • keyword-recognition.ipynb - notebook with keyword spotting model demonstration and evaluation
    • supplementary
      • get-data.ipynb - check the metadata of audio files
      • visualize-spectrograms.ipynb - visualize spectrograms of audio files
      • play-sound.ipynb - sanity-check audio files
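
For a quick metadata check outside the notebooks, a short standard-library sketch such as the one below may be enough. It is not the code from get-data.ipynb: it assumes the audio files are uncompressed PCM WAV, and the helper name `wav_metadata` is hypothetical.

```python
import wave


def wav_metadata(path: str) -> dict[str, float]:
    """Read basic header fields from a PCM WAV file (assumed format)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,  # length in seconds
        }
```

Calling `wav_metadata` on each file in the Data directory and comparing the sample rates is a simple way to spot files that need resampling before training.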