A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning

Note

  • Code for the paper 'A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning', presented at Interspeech 2023
  • Link to the paper: https://www.isca-speech.org/archive/interspeech_2023/ryu23_interspeech.html
  • The paper and the code were extended into my Master's thesis (access will be available from October!), where MTL with RMSE+CTC and GoP features brought better results on the APA task
  • Regarding the license, please refer to LICENSE.md
  • If you run into any problems, feel free to email me ([email protected]) or post an issue.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ryu23_interspeech,
  author={Hyungshin Ryu and Sunhee Kim and Minhwa Chung},
  title={{A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={959--963},
  doi={10.21437/Interspeech.2023-337}
}

Code

  • Python 3.8 (3.8.0, 3.8.16) was used for training and testing
  • For all steps, don't forget to change the paths to your own directories!
  • For MDD evaluation, Kaldi must be installed beforehand to use the SCTK toolkit (https://kaldi-asr.org/doc/install.html)
  1. Prepare the data (a vocabulary-building sketch follows the commands)
# first, make a CSV list of the 3 datasets (TIMIT, L2-ARCTIC, Speechocean762)
cd data
python create_datasets.py
# then, convert the data into the Hugging Face datasets format and create the vocabulary set
python preprocess_datasets.py
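
For reference, below is a minimal sketch of the kind of CTC vocabulary construction that preprocess_datasets.py performs; the CSV file name and the "phones" column are illustrative assumptions, not the repository's exact schema.

import json

import pandas as pd

# Hypothetical CSV produced by create_datasets.py; a space-separated
# "phones" column is an assumed schema for illustration.
df = pd.read_csv("datasets.csv")

# Collect every phone symbol that appears in the labels.
phones = set()
for label in df["phones"]:
    phones.update(str(label).split())

# Map each phone to an integer id and append the CTC control tokens.
vocab = {p: i for i, p in enumerate(sorted(phones))}
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

with open("vocab.json", "w") as f:
    json.dump(vocab, f)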
  2. Do auxiliary fine-tuning of the SSL model on the phone recognition task to leverage extra knowledge transfer (a CTC fine-tuning sketch follows the commands)
cd auxiliary-phone-recognition
# fine-tune the baseline Wav2Vec2-large-robust for phone recognition on the TIMIT train split (or other datasets!)
python trainer_train.py --exp_prefix TIM_robust
# or fine-tune HuBERT-large for phone recognition on the TIMIT train split
python trainer_train.py --model_name_or_path facebook/hubert-large-ll60k --exp_prefix TIM_hubert
# test the fine-tuned phone recognition model on the TIMIT test split (or other datasets!)
python trainer_test.py TIM_robust_lr0.0001_warm0.1_type-linear >> trainer_test.log
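
As a rough illustration of what trainer_train.py sets up, the sketch below attaches a CTC head to the SSL backbone for phone recognition. The learning rate, warmup ratio, and scheduler type mirror the experiment name above; the batch size, paths, and vocabulary handling are assumptions.

import json

from transformers import TrainingArguments, Wav2Vec2ForCTC

with open("vocab.json") as f:
    vocab = json.load(f)  # phone -> id map from the preprocessing step

# Load the pre-trained SSL encoder and attach a freshly initialized
# CTC head sized to the phone vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-robust",
    ctc_loss_reduction="mean",
    pad_token_id=vocab["[PAD]"],
    vocab_size=len(vocab),
)
model.freeze_feature_encoder()  # keep the convolutional front end frozen

# Settings implied by the name TIM_robust_lr0.0001_warm0.1_type-linear;
# the output directory and batch size are assumed values.
args = TrainingArguments(
    output_dir="trainer/TIM_robust_lr0.0001_warm0.1_type-linear",
    learning_rate=1e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    per_device_train_batch_size=8,
)
# A transformers.Trainer would then combine model, args, a CTC data
# collator, and the preprocessed TIMIT splits.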
  3. Train a joint model of APA and MDD with multi-task learning (a sketch of the weighted joint loss follows the commands)
cd multi-task-learning
# jointly train APA and MDD, with a model fine-tuned for phone recognition (L1)
python train.py --exp_prefix Joint-CAPT-L1 --model_name_or_path /PATH_TO_YOUR_MODEL/trainer/TIM_robust_lr0.0001_warm0.1_type-linear
# or jointly train APA and MDD, with a raw SSL model (SSL)
python train.py --exp_prefix Joint-CAPT-SSL --model_name_or_path facebook/wav2vec2-large-robust --no_phone_recognition --enable_cls_epochs 50
# or train only on APA, with a raw SSL model (SSL)
python train.py --exp_prefix APA-SSL --model_name_or_path facebook/wav2vec2-large-robust --no_phone_recognition --cls_weight 1.0 --ctc_weight 0.0
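
The --ctc_weight and --cls_weight flags suggest that the joint objective is a weighted sum of the two task losses: CTC over phone labels for MDD and a score loss for APA. Below is a minimal sketch of such an objective; the classification head for APA is an assumption for illustration (a regression loss such as RMSE, as explored in the thesis, is an alternative).

import torch
import torch.nn.functional as F


def joint_loss(
    ctc_log_probs: torch.Tensor,   # (time, batch, n_phones), log-softmaxed
    phone_targets: torch.Tensor,   # (batch, max_target_len) padded phone ids
    input_lengths: torch.Tensor,   # (batch,) frame counts per utterance
    target_lengths: torch.Tensor,  # (batch,) label counts per utterance
    score_logits: torch.Tensor,    # (batch, n_score_classes) APA head output
    score_targets: torch.Tensor,   # (batch,) gold scores as class ids
    ctc_weight: float = 0.5,
    cls_weight: float = 0.5,
) -> torch.Tensor:
    # MDD branch: CTC loss against the annotated phone sequence.
    mdd_loss = F.ctc_loss(ctc_log_probs, phone_targets, input_lengths, target_lengths)
    # APA branch: cross-entropy against the human pronunciation score.
    apa_loss = F.cross_entropy(score_logits, score_targets)
    # Setting cls_weight=1.0 and ctc_weight=0.0 recovers APA-only training,
    # matching the APA-SSL command above.
    return ctc_weight * mdd_loss + cls_weight * apa_loss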
  4. Test the model and run the correlation analysis (a correlation sketch follows the commands)
cd test
sh test.sh [YOUR MTL/STL MODEL NAME]
python correlation.py
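
For intuition, below is a minimal sketch of the kind of agreement check correlation.py performs, i.e., the Pearson correlation between predicted and human scores; the file and column names are assumptions.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical test output; file and column names are assumptions.
results = pd.read_csv("test_results.csv")
pcc, p_value = pearsonr(results["predicted_score"], results["gold_score"])
print(f"PCC = {pcc:.3f} (p = {p_value:.2e})")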
