Add hf metric wrapper #599

Open · wants to merge 7 commits into main
Conversation

@antoine-tran (Contributor) commented on Jun 18, 2024

What does this PR do? Please describe:

This PR adds a wrapper around HuggingFace's evaluate.Metric to make it compatible with the fairseq2.metrics APIs. This enables evaluating fairseq2 models on many downstream tasks available in HuggingFace, using standard evaluation metrics such as comet, bleu(t), bertscore, etc.
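
For illustration, a minimal sketch of what such a wrapper could look like, assuming a hypothetical HFMetric class that subclasses torcheval's Metric and delegates to evaluate (this is not the actual code in this PR):

```python
# Hypothetical sketch, not the code in this PR: a torcheval-compatible
# wrapper that delegates update/compute to a HuggingFace evaluate metric.
from typing import Any, Optional

import evaluate
import torch
from torcheval.metrics import Metric


class HFMetric(Metric[Any]):
    def __init__(
        self, metric_name: str, device: Optional[torch.device] = None, **load_kwargs: Any
    ) -> None:
        super().__init__(device=device)
        # evaluate.load() sets up its own multi-process handling via
        # process_id/num_process, so the wrapper does no collective ops itself.
        self._metric = evaluate.load(metric_name, **load_kwargs)

    def update(self, predictions: Any, references: Any) -> "HFMetric":
        # Incremental updates are buffered by evaluate in temporary Arrow tables.
        self._metric.add_batch(predictions=predictions, references=references)
        return self

    def compute(self) -> Any:
        # In a multi-process setup, evaluate returns the final result on
        # rank 0 and None on the other ranks.
        return self._metric.compute()

    def merge_state(self, metrics: Any) -> "HFMetric":
        # Cross-rank merging is handled inside evaluate, so nothing to do here.
        return self
```

With such a wrapper, something like HFMetric("bleu") could be registered in a MetricBag like any torcheval metric.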

Small changes in fairseq2.metrics:

evaluate.Metric has internal support for multi-node evaluation: the incremental metric-updating code runs on separate nodes (writing intermediate results to temporary PyArrow tables), and the final compute() is performed on rank 0 only, pulling the results from those tables. To adapt this to fairseq2.metrics.bag.sync_and_compute_metrics(), we introduce an auto_sync attribute on MetricBag, which is turned on when the bag holds HF metrics. In other words, we leave the synchronization to the underlying metrics rather than to the bag itself.

One caveat is that we cannot mix HF and non-HF metrics within one bag; they have to be put in separate bags.
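
To make the auto_sync behaviour concrete, here is a conceptual sketch of how sync_and_compute_metrics() could branch on it; bag.metrics, bag.gang, and as_process_group() are assumed names and not the actual fairseq2 internals:

```python
# Conceptual sketch only -- the real fairseq2 internals differ. The point is
# that with auto_sync the bag trusts each HF metric to synchronize itself,
# instead of performing a collective sync through torcheval.
from typing import Any, Dict, Optional

from torcheval.metrics.toolkit import sync_and_compute_collection


def sync_and_compute_metrics(bag) -> Optional[Dict[str, Any]]:
    if getattr(bag, "auto_sync", False):
        # HF evaluate metrics pull their partial results from temporary Arrow
        # tables; compute() returns values on rank 0 and None on other ranks.
        values = {name: m.compute() for name, m in bag.metrics.items()}
        return values if bag.gang.rank == 0 else None

    # Default path: collective synchronization of torcheval metrics.
    # `bag.metrics` and `bag.gang` are assumed attribute names here.
    return sync_and_compute_collection(bag.metrics, bag.gang.as_process_group())
```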

Fixes #{issue number}

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jun 18, 2024
@cbalioglu (Contributor) commented:
I think we might need to reconsider the generic metrics API in fairseq2 before merging this PR. Right now it pretty much relies on torcheval, which apparently takes a different approach to syncing metrics. I have a fundamental question though: in your use case, how do you combine fairseq2 metrics and HF evaluate? It looks like HF has similar APIs, such as combine, that act like MetricBag. Which approach makes more sense in your opinion: (1) using the two APIs in parallel in your code, or (2) abstracting HF away under fairseq2.metrics (i.e. this PR)? If the latter, we might consider defining a Metric interface in fairseq2 (instead of using torcheval's Metric interface) and having derived interfaces for torcheval and HF that fit the corresponding libraries better.

@antoine-tran (Contributor, Author) commented:
> I think we might need to reconsider the generic metrics API in fairseq2 before merging this PR. Right now it pretty much relies on torcheval, which apparently takes a different approach to syncing metrics.

I think the decision to bind fairseq2 to torcheval is a good one. It is a thin, low-level library meant for building further libraries on top. HF, on the other hand, targets ready-to-use scenarios, so it makes sense for it to hide the synchronization logic internally.

> I have a fundamental question though: in your use case, how do you combine fairseq2 metrics and HF evaluate? It looks like HF has similar APIs, such as combine, that act like MetricBag. Which approach makes more sense in your opinion: (1) using the two APIs in parallel in your code, or (2) abstracting HF away under fairseq2.metrics (i.e. this PR)? If the latter, we might consider defining a Metric interface in fairseq2 (instead of using torcheval's Metric interface) and having derived interfaces for torcheval and HF that fit the corresponding libraries better.

Good questions. Until now I have always followed (1), keeping a separate metrics API for evaluation tasks. That has served my use cases (pure evaluation) well, but now I am thinking of running evaluation within the training loop. Having the same MetricBag API lets me "register" the metric states and resume them later.
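
For illustration, a hypothetical sketch of that training-loop usage; HFMetric, register_metric, and the surrounding helpers are assumed names rather than code from this PR:

```python
# Hypothetical usage sketch; names and helpers are assumptions.
from fairseq2.gang import setup_default_gang
from fairseq2.metrics import MetricBag

gang = setup_default_gang()

bag = MetricBag(gang)
bag.register_metric("bleu", HFMetric("bleu"))  # HFMetric: the wrapper sketched above

for batch in data_loader:          # user-provided data loader
    hyps, refs = generate(batch)   # user-provided generation step
    bag.bleu.update(predictions=hyps, references=refs)

# The bag's state_dict() lets the metric states be checkpointed with the
# trainer and restored later, so evaluation can continue across runs.
state = bag.state_dict()
```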

@antoine-tran (Contributor, Author) commented:
@cbalioglu Maybe we can indeed wait a bit for the use case to become clearer and gather more requirements before merging this PR.
