Adopt aligner from "Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors" #3826

Open
dmitry-mli opened this issue Aug 21, 2024 · 4 comments


dmitry-mli commented Aug 21, 2024

🚀 The feature

Consider adopting the aligner from Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors (@huangruizhe) into the existing set of aligners, given that it reduces word boundary error relative to the existing Wav2Vec2 CTC aligner by up to 60% at P50 on English.

Motivation, pitch

Today, torchaudio offers forced alignment through a simple, extensible interface. The recently published aligner from Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors (github), reduces word boundary error (WBE; lower is better) compared to Wav2Vec2. We (@dmitry-mli @jamesr66a @websterbei) evaluated the model and saw WBE on our English samples decrease by up to 60% at P50, 45% at P70, and 15% at P95 compared to Wav2Vec2 CTC alignment.

image
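
For anyone who wants to run a similar comparison, here is a minimal sketch of how WBE percentiles and the relative reduction between two aligners could be computed. It assumes WBE is the absolute difference, in seconds, between predicted and reference word start and end times; the exact definition used for the numbers above may differ. The function names `boundary_errors`, `percentile`, and `relative_reduction` are hypothetical helpers and are not part of torchaudio.

```python
from typing import List, Sequence, Tuple

# A word is represented by its (start_sec, end_sec) boundaries.
Word = Tuple[float, float]


def boundary_errors(ref: Sequence[Word], hyp: Sequence[Word]) -> List[float]:
    """Absolute start/end boundary errors, assuming ref and hyp are word-aligned."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must contain the same number of words")
    errors: List[float] = []
    for (ref_start, ref_end), (hyp_start, hyp_end) in zip(ref, hyp):
        errors.append(abs(hyp_start - ref_start))
        errors.append(abs(hyp_end - ref_end))
    return errors


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between ranks, p in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which candidate is lower than baseline."""
    return 100.0 * (baseline - candidate) / baseline
```

With per-boundary errors collected for both aligners over the same utterances, `relative_reduction(percentile(baseline_errors, 50), percentile(candidate_errors, 50))` would give a P50 figure of the kind quoted above, and likewise for P70 and P95.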

Alternatives

This request relates to one specific research paper.

Additional context

Thanks for your consideration. @huangruizhe @jamesr66a @websterbei

@dmitry-mli changed the title from "On-board aligner from "Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors"" to "Adopt aligner from "Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors"" on Aug 21, 2024
@huangruizhe
Contributor

Thanks for your interest in our work and for sharing the nice results! As we have been switching between projects, things have been greatly delayed. As for the plan, I will be more available in late September and October, and I will work on it then!

@dmitry-mli
Author

Looking forward to it, thank you!

@christincha

Following up here: is there an updated plan for incorporating the aligner from Huang et al., Less Peaky and More Accurate CTC Forced Alignment by Label Priors into the current PyTorch audio aligners?

@huangruizhe
Contributor

Hi @christincha, I am still working on it. Before it becomes official, if you would like to run any experiments, you can check this out: https://colab.research.google.com/drive/1xciHB1Twi7VFutACrv94-Ejff1-VrjzL?usp=sharing
It includes different implementations of the proposed CTC loss, the training recipe, visualization tools, and a pretrained model.
