This repository contains the code for the method described in the INTERSPEECH 2019 paper "An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities".
- Python (tested with v2.7.5 and v3.5.7).
- Kaldi ASR toolkit (documentation: http://kaldi-asr.org/), with acoustic models trained on LibriSpeech using nnet2 (Dan's recipe); tested with nnet2 and nnet3.
Run prop_gop_eqn.py as shown below to compute the score with the proposed GoP formulation, passing the posterior_infile.ark and alignment_infile.txt generated for a given learner's utterance:
python prop_gop_eqn.py posterior_infile.ark alignment_infile.txt gop_outfile.txt
- alignment_infile.txt is the output of the forced alignment of the learner's speech (.wav file), obtained using align.sh.
- posterior_infile.ark contains the frame-level posterior probabilities of the learner's speech (.wav file), obtained using nnet_am_compute.cc.
- gop_outfile.txt contains the score for each phoneme.
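To inspect the frame-level posteriors before scoring, a small reader along the following lines may help. This is only a sketch, not part of prop_gop_eqn.py: it assumes Python 3 and that the posterior archive is in Kaldi's text matrix format (an utterance ID followed by `[`, one row of numbers per frame, and a closing `]`). Whether posterior_infile.ark is stored as text or binary is not stated here; a binary archive would first need converting to text, for example with Kaldi's copy-matrix. The function name `read_text_ark` is hypothetical.

```python
import io


def read_text_ark(stream):
    """Yield (utterance_id, rows) pairs from a Kaldi text-format matrix archive.

    Assumption: each matrix starts with "utt_id  [" and ends with "]".
    """
    utt_id, rows = None, []
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if utt_id is None:
            # Header line: utterance ID, then "[", possibly followed by numbers.
            utt_id, _, rest = line.partition("[")
            utt_id = utt_id.strip()
            line = rest.strip()
            if not line:
                continue
        closing = line.endswith("]")
        values = line.rstrip("]").split()
        if values:
            # One row per frame, one column per output class of the network.
            rows.append([float(v) for v in values])
        if closing:
            yield utt_id, rows
            utt_id, rows = None, []


# Tiny in-memory example standing in for a real archive.
sample = "utt1  [\n  0.1 0.9\n  0.8 0.2 ]\nutt2  [\n  0.5 0.5 ]\n"
posteriors = dict(read_text_ark(io.StringIO(sample)))
```

For a real file, pass an open file handle instead of the `io.StringIO` object; each value in `posteriors` is then a list of per-frame rows for one utterance.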
NOTE:
- The Python script above requires a lookup table for the acoustic model, as discussed in the paper. Generate it with:
./gen_lookup_table.sh
- After downloading Goodness-of-Pronunciation-master.zip, place it in /home/user/kaldi/egs/Native_Acoustic_Model/s5/ and unzip it there (Extract Here). This creates the path /home/user/kaldi/egs/Native_Acoustic_Model/s5/Goodness-of-Pronunciation-master/. The native acoustic model must be trained with nnet2, with all paths in the exp folder functional.
- Once extracted, the directory has the following file structure:
├── kaldi_folder
│ ├── native_acoustic_model
│ │ ├── s5
│ │ │ ├── Goodness-of-Pronunciation-master
│ │ │ │ ├── extract_from_alignments.sh
│ │ │ │ ├── gen_lookup_table.sh
│ │ │ │ ├── modify_post.sh
│ │ │ │ ├── gop_outfile.txt
│ │ │ │ ├── prop_gop_eqn.py
│ │ │ │ ├── reqd_files
│ │ │ │ │ ├── alignment_infile.txt
│ │ │ │ │ ├── posterior.txt
│ │ │ │ │ ├── posterior_infile.ark
│ │ │ │ │ ├── show_transitions.txt
│ │ │ │ │ ├── lookup_table.txt
│ │ │ │ │ ├── tmp_t_ids.txt
│ │ │ │ │ ├── tmp_phones.txt
│ │ │ │ │ ├── tmp_segments.txt
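A quick way to confirm that the layout above is in place is to check for a few of the listed files before running the scripts. The sketch below assumes Python 3; the helper name `missing_files` and the choice of which files to check are illustrative assumptions, since the tree does not say which files are inputs and which are generated.

```python
import tempfile
from pathlib import Path

# File names copied from the tree above. Which of them must exist before a run
# (rather than being produced by the scripts) is an assumption.
LISTED_FILES = [
    "alignment_infile.txt",
    "posterior_infile.ark",
    "lookup_table.txt",
]


def missing_files(reqd_dir, names=LISTED_FILES):
    """Return the entries of `names` that are not regular files in `reqd_dir`."""
    base = Path(reqd_dir)
    return [name for name in names if not (base / name).is_file()]


# Demonstration on a temporary directory that holds only the lookup table.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "lookup_table.txt").write_text("")
    still_missing = missing_files(tmp)
```

In practice, the call would be `missing_files("reqd_files")` from inside Goodness-of-Pronunciation-master; an empty list means all checked files are present.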
If you find our work useful, please cite:
@inproceedings{Sudhakara2019,
author={Sweekar Sudhakara and Manoj Kumar Ramanathi and Chiranjeevi Yarra and Prasanta Kumar Ghosh},
title={{An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities}},
year=2019,
booktitle={Proc. Interspeech 2019},
pages={954--958},
doi={10.21437/Interspeech.2019-2363},
url={http://dx.doi.org/10.21437/Interspeech.2019-2363}
}