Integrate whisper model with eval interface #2

Draft · wants to merge 3 commits into base: cli/eval/interface

Conversation

Ahmedsaed (Owner):

What does this PR do? Please describe:
Adds integration of the whisper model with the eval interface.

Does your PR introduce any breaking changes? If yes, please list them:
None

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@Ahmedsaed changed the title from "Eval/whisper" to "Integrate whisper model with eval interface" on Aug 1, 2024
antoine-tran commented on Aug 2, 2024:

Taking a step back, here is the current state of the fairseq2 design:

  • In fairseq2, we have an "Evaluator" which encapsulates all routines needed to evaluate a model: the model itself, the dataset, the metrics, I/O (where to record metrics), seeds, etc. The evaluator basically iterates over the dataset reader and, for each batch, spins off one or multiple "EvalUnit"s to run different evaluations and report different metrics.

  • Ideally, we would just have to define an "HFEvalUnit" that evaluates a model using HuggingFace's evaluate metrics. This way, we have just one final Evaluator (think of it as a pipeline) and several units: one for external models like whisper, one for fairseq2 models like wav2vec2, and so on.

  • However, currently the design of fairseq2 is tied to torcheval and declares that every metric is a form of the MetricBag. I tried to resolve this with a wrapper, but we have not reached consensus yet.

  • Therefore, for the time being, we have 2 overlapping versions of the Evaluator, Evaluator and HFEvaluator: one for torcheval metrics and one for HuggingFace metrics.

  • The concept of presets simply means a registry with the pre-defined configuration for dataset, model, etc. The values in a preset can be overridden at runtime (i.e. via the CLI) with the --config KEY VALUE argument (see the sketch after this list).

  • Right now we have a preset registry "hf_presets", which means: a registry of different eval units, all evaluated over the HF data reader. "hf_presets" has one decorator, "librispeech_asr", which means: an eval unit for ASR evaluation of a model (default model is fairseq2 wav2vec2_asr) on a HF dataset reader (default dataset is librispeech_asr). It is important to note that this preset can be customized beyond librispeech and to other models too.
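
To make the preset/override mechanism concrete, here is a rough standalone sketch. It is not the actual fairseq2 API; PresetRegistry and the config fields are illustrative stand-ins for how a preset supplies defaults that --config can override at runtime.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass
class AsrEvalConfig:
    # Illustrative default fields; the real config in the PR has more.
    model_name: str = "wav2vec2_asr_base_10h"
    dataset_name: str = "librispeech_asr"


class PresetRegistry:
    def __init__(self) -> None:
        self._presets: Dict[str, Callable[[], AsrEvalConfig]] = {}

    def decorator(self, name: str):
        # Register a function that builds the default config for this preset.
        def register(fn: Callable[[], AsrEvalConfig]) -> Callable[[], AsrEvalConfig]:
            self._presets[name] = fn
            return fn
        return register

    def resolve(self, name: str, **overrides) -> AsrEvalConfig:
        # Runtime overrides (--config KEY VALUE) win over the preset defaults.
        return replace(self._presets[name](), **overrides)


hf_presets = PresetRegistry()


@hf_presets.decorator("librispeech_asr")
def _librispeech_asr_config() -> AsrEvalConfig:
    return AsrEvalConfig()


# e.g. "fairseq2 eval asr --config model_name=whisper/base" would resolve to:
config = hf_presets.resolve("librispeech_asr", model_name="whisper/base")
```

The point is that a preset only supplies defaults; anything the user passes via --config replaces the corresponding field before the evaluator is built.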

So, if we stick to the above design, a simpler extension to enable Whisper eval would be:

  1. Re-use the HFEvaluator and the hf_presets registry (we don't want 2 registries with 1 preset each, but one central registry storing all the presets).

  2. Make an entry function "load_asr_evaluator(config)" that fans out to 2 functions, one for fairseq2 models such as wav2vec2 and one for the whisper model, based on config.model_name (sketched after this list). These 2 functions are similar, except for the way the model is loaded (either with fairseq2.models.load_model() or whisper.load_model()).

  3. Change the CLI command to be able to run both, i.e.:

    • For wav2vec2: fairseq2 eval asr --config model_name=wav2vec2_asr_base_10h dataset_name=librispeech_asr
    • For Whisper: fairseq2 eval asr --config model_name=whisper/base dataset_name=librispeech_asr
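
A minimal sketch of step 2, continuing the stand-in AsrEvalConfig from the sketch above. The "whisper" prefix check and load_whisper_asr_evaluator are assumptions for illustration; only load_wav2vec2_asr_evaluator exists in the PR (in fairseq2.recipes.eval.asr), and both stubs below stand in for code that would build an HFEvaluator.

```python
def load_wav2vec2_asr_evaluator(config: AsrEvalConfig):
    ...  # build an HFEvaluator around a fairseq2 wav2vec2 ASR model (plus tokenizer/decoder)


def load_whisper_asr_evaluator(config: AsrEvalConfig):
    ...  # build an HFEvaluator around an end-to-end whisper model (whisper.load_model)


def load_asr_evaluator(config: AsrEvalConfig):
    # Single entry point: fan out based on the configured model name.
    if config.model_name.startswith("whisper"):
        return load_whisper_asr_evaluator(config)
    return load_wav2vec2_asr_evaluator(config)
```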


log = get_log_writer(__name__)


def _add_wav2vev2_asr_eval_cli(group: CliGroup) -> None:
from fairseq2.recipes.eval.asr import load_wav2vec2_asr_evaluator
from fairseq2.recipes.eval.asr import ASREvaluator
antoine-tran commented on Aug 2, 2024:

I don't know if changing the function "load_wav2vec2_asr_evaluator" to ASREvaluator is the best way. I'm not picky between having a function or a callable class, but the problem is that Whisper is an end-to-end model, while wav2vec2 is - as the name suggests - only an encoder that generates a vector. For wav2vec2 we need a text tokenizer and decoder, while for Whisper that is not required. So the "ASREvaluator" is still not abstract enough (at least in your current proposal, with self.tokenizer and self.decoder).

Basically, I think we just need 2 functions that generate the HFEvaluator accordingly from its config (see my comments above).
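
To illustrate the asymmetry in hedged pseudocode (every name below is a placeholder, not code from the PR): the wav2vec2 path still has to decode encoder output into text, whereas an end-to-end whisper model transcribes audio directly.

```python
def wav2vec2_batch_to_text(model, tokenizer, decoder, batch):
    # wav2vec2 emits frame-level logits; a decoder and tokenizer are needed to get text.
    logits = model(batch)
    hypotheses = decoder(logits)
    return [tokenizer.decode(h) for h in hypotheses]


def whisper_batch_to_text(model, batch):
    # whisper's transcribe() already returns text end to end (openai-whisper API).
    return [model.transcribe(audio)["text"] for audio in batch]
```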

Ahmedsaed (Owner, Author) replied:

This makes a lot of sense. I was just afraid of having too many functions just to support different models and their requirements, and that's why I created the class.

load_wav2vec2_asr_evaluator,
preset_configs=hf_presets,
ASREvaluator(),
preset_configs=wav2vec2_presets,


we don't need to have 2 registries for wav2vec2 and whisper

@@ -26,6 +30,21 @@ def _add_wav2vev2_asr_eval_cli(group: CliGroup) -> None:
)


def _add_whisper_asr_eval_cli(group: CliGroup) -> None:


This is highly redundant, I think. We can try to parameterize the evaluator setup function; the presets can be customized at runtime.
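
One possible shape for that parameterization, mirroring the existing setup functions in the diff. register_eval_command and its arguments are hypothetical placeholders for whatever the real CliGroup registration call looks like, and load_asr_evaluator is the dispatching entry point from the earlier sketch.

```python
def _add_asr_eval_cli(group: CliGroup) -> None:
    # A single CLI entry covers both wav2vec2 and whisper; the model is chosen at
    # runtime via --config model_name=..., so no separate _add_whisper_asr_eval_cli
    # is needed.
    register_eval_command(              # hypothetical registration helper
        group,
        name="asr",
        loader=load_asr_evaluator,      # dispatching entry point
        preset_configs=hf_presets,      # single shared preset registry
    )
```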

model = load_wav2vec2_asr_model(
config.model_name, device=init_device, dtype=config.dtype
@whisper_presets.decorator("librispeech_asr")
def _whisper_librispeech_asr_config() -> AsrEvalConfig:
antoine-tran commented on Aug 2, 2024:

One default preset is enough, I think. If the user wants to evaluate Whisper with librispeech_asr or other datasets, they should specify it directly at runtime.

Otherwise we will have MxN presets for M models and N datasets :).

Ahmedsaed (Owner, Author) commented:

@antoine-tran thanks for the information. This makes things a lot clearer. The motivation behind the class was to encapsulate the helper functions that were created in the process. I will apply the suggestions and refactor the code back into functions.
