Integrate whisper model with eval interface #2
base: cli/eval/interface
Conversation
Taking a step back, here is the current state of the fairseq2 design:
So, if we stick to the above design, a simpler extension to enable Whisper eval would be:
```diff
 log = get_log_writer(__name__)


 def _add_wav2vev2_asr_eval_cli(group: CliGroup) -> None:
-    from fairseq2.recipes.eval.asr import load_wav2vec2_asr_evaluator
+    from fairseq2.recipes.eval.asr import ASREvaluator
```
I don't know if changing the function `load_wav2vec2_asr_evaluator` to `ASREvaluator` is the best way. I'm not picky between having a function or a callable class, but the problem is that Whisper is an end-to-end model, while wav2vec2 is, as the name suggests, only an encoder that generates a vector. For wav2vec2 we need a text tokenizer and a decoder, while for Whisper they are not required. So `ASREvaluator` is still not abstract enough (at least in your current proposal, with `self.tokenizer` and `self.decoder`).

Basically, I think we need just 2 functions to generate the `HFEvaluator` from its config (see my comments above).
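For illustration, that two-function shape could look roughly like this (only `AsrEvalConfig`, `HFEvaluator`, and `load_wav2vec2_asr_evaluator` appear in this PR; the import locations, the `load_whisper_asr_evaluator` name, and the bodies are hypothetical):

```python
from fairseq2.recipes.eval.asr import AsrEvalConfig, HFEvaluator  # assumed import locations


def load_wav2vec2_asr_evaluator(config: AsrEvalConfig) -> HFEvaluator:
    # wav2vec2 is encoder-only, so the evaluator additionally needs a text
    # tokenizer and a decoder, both constructed from the config.
    ...


def load_whisper_asr_evaluator(config: AsrEvalConfig) -> HFEvaluator:
    # Whisper is end-to-end, so no separate tokenizer/decoder is wired in.
    ...
```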
This makes a lot of sense. I was just afraid of having too many functions just to support different models and their requirements, and that's why I created the class.
```diff
-        load_wav2vec2_asr_evaluator,
-        preset_configs=hf_presets,
+        ASREvaluator(),
+        preset_configs=wav2vec2_presets,
```
We don't need to have 2 registries for wav2vec2 and whisper.
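A single shared registry could serve both, sketched here under the assumption that fairseq2's generic `ConfigRegistry` pattern applies (import paths assumed):

```python
from fairseq2.config_registry import ConfigRegistry  # assumed import path
from fairseq2.recipes.eval.asr import AsrEvalConfig  # assumed import location

# One registry for every ASR eval preset, instead of parallel
# hf_presets / wav2vec2_presets / whisper_presets registries.
asr_eval_presets = ConfigRegistry[AsrEvalConfig]()
```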
```diff
@@ -26,6 +30,21 @@ def _add_wav2vev2_asr_eval_cli(group: CliGroup) -> None:
     )


+def _add_whisper_asr_eval_cli(group: CliGroup) -> None:
```
This is highly redundant, I think. We could parameterize the evaluator setup function; the presets can be customized at runtime.
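Something along these lines, purely as a sketch (`_add_asr_eval_cli` and its parameters are hypothetical names, not part of this PR; import locations assumed):

```python
from typing import Callable

from fairseq2.recipes.cli import CliGroup  # assumed import location
from fairseq2.recipes.eval.asr import AsrEvalConfig, HFEvaluator  # assumed


def _add_asr_eval_cli(
    group: CliGroup,
    name: str,  # e.g. "wav2vec2" or "whisper"
    evaluator_factory: Callable[[AsrEvalConfig], HFEvaluator],
) -> None:
    # One registration path shared by every ASR model: the per-model
    # evaluator factory (and presets) come in as arguments instead of
    # duplicated _add_<model>_asr_eval_cli functions.
    ...
```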
```python
model = load_wav2vec2_asr_model(
    config.model_name, device=init_device, dtype=config.dtype
```

```python
@whisper_presets.decorator("librispeech_asr")
def _whisper_librispeech_asr_config() -> AsrEvalConfig:
```
One default preset is enough, I think. If the user wants to evaluate Whisper on librispeech_asr or another dataset, they should specify it directly at runtime.
Otherwise we will have MxN presets for M models and N datasets :).
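For instance, a single default preset could look like this (`hf_presets`, `AsrEvalConfig`, `model_name`, and the `.decorator()` pattern appear in this PR; the import location, the `dataset_name` field, and the default values are assumptions for illustration):

```python
from fairseq2.recipes.eval.asr import AsrEvalConfig, hf_presets  # assumed import location


@hf_presets.decorator("default")
def _default_asr_eval_config() -> AsrEvalConfig:
    # A single preset with sensible defaults; dataset and model are meant
    # to be overridden at invocation time rather than multiplying presets
    # per model/dataset combination.
    return AsrEvalConfig(
        dataset_name="librispeech_asr",  # hypothetical field name and default
        model_name="wav2vec2_asr_base_10h",  # hypothetical default
    )
```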
@antoine-tran thanks for the information. This makes things a lot clearer. The motivation behind the class was to encapsulate the helper functions that were created in the process. I will apply the suggestions and refactor the code back into functions.
What does this PR do? Please describe:
Adds integration for the Whisper model with the eval interface.
Does your PR introduce any breaking changes? If yes, please list them:
None
Check list: