Add hf metric wrapper #599
base: main
Conversation
I think we might need to reconsider the generic metrics API in fairseq2 before merging this PR. Right now it pretty much relies on torcheval, which apparently takes a different approach to syncing metrics. I have a fundamental question though: in your use case, how do you combine fairseq2 metrics and HF evaluate? It looks like HF has some similar APIs.
I think binding fairseq2 to torcheval is a good decision. torcheval is a thin, low-level library meant to help build further libraries on top. HF evaluate, on the other hand, targets ready-to-use use cases, so it makes sense for it to hide the synchronization logic inside.
Good questions. Until now I have always followed (1), keeping a separate metrics API in evaluation tasks. It served my use cases (pure evaluation) well, and now I am thinking of using the evaluation within the training loop. Having the same MetricBag API helps me "register" the states and continue later.
@cbalioglu Maybe we can indeed wait a bit for the use case to become clearer and gather more requirements before merging this PR.
What does this PR do? Please describe:
This PR adds a wrapper around HuggingFace's `evaluate.Metric` to make it compatible with the fairseq2.metrics APIs. This enables evaluating fairseq2 models on many downstream tasks available in HuggingFace, using standard evaluation metrics such as `comet`, `bleu(t)`, `bertscore`, etc.

Small changes in fairseq2.metrics:
`evaluate.Metric` has internal support for multi-node evaluation, where the incremental metric-updating code runs on separate nodes (writing results to temporary PyArrow tables); afterwards, the final `compute()` is done on rank 0 only, pulling results from the tables. To adapt this to `fairseq2.metrics.bag.sync_and_compute_metrics()`, we introduce an `auto_sync` attribute in the MetricBag, which is turned on if the bag contains HF metrics. In other words, we leave the synchronization to the underlying metrics rather than to the bag itself.

One caveat is that we cannot mix HF and non-HF metrics within one bag; they must be put in separate bags.
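To make the idea concrete, here is a minimal sketch of such a wrapper. The names `HFMetricWrapper` and `FakeExactMatch` are illustrative, not the actual fairseq2 or evaluate API; the `auto_sync` flag mirrors the attribute described above, and the fake metric stands in for a real `evaluate.Metric` (which exposes the same `add_batch()`/`compute()` interface) so the sketch runs without the `evaluate` package installed.

```python
# Hypothetical sketch: adapt an evaluate.Metric-style object
# (add_batch/compute) to a torcheval-style metric (update/compute).
# Not the real fairseq2 implementation.

class HFMetricWrapper:
    """Wraps a HuggingFace ``evaluate.Metric``-like object.

    The wrapped metric handles its own cross-rank synchronization
    (via temporary Arrow tables), so a bag holding such metrics
    skips its own sync step -- the ``auto_sync`` idea above.
    """

    auto_sync = True  # signal to the bag: this metric syncs itself

    def __init__(self, hf_metric):
        self._metric = hf_metric

    def update(self, predictions, references):
        # evaluate.Metric accumulates examples incrementally.
        self._metric.add_batch(predictions=predictions, references=references)

    def compute(self):
        # In multi-node runs, evaluate performs this on rank 0 only,
        # pulling the accumulated rows from its Arrow tables.
        return self._metric.compute()


class FakeExactMatch:
    """Stand-in with the same interface as evaluate.Metric."""

    def __init__(self):
        self._hits, self._total = 0, 0

    def add_batch(self, predictions, references):
        self._hits += sum(p == r for p, r in zip(predictions, references))
        self._total += len(predictions)

    def compute(self):
        return {"exact_match": self._hits / self._total}


metric = HFMetricWrapper(FakeExactMatch())
metric.update(["a", "b"], ["a", "c"])
metric.update(["d"], ["d"])
result = metric.compute()
print(result)
```

Because `auto_sync` is a class attribute, a bag could check it when deciding whether to run its own `sync_and_compute_metrics()` reduction or defer to the metric.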
Fixes #{issue number}
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: