
Add validation set to EvalAI #30

Open
dchichkov opened this issue Aug 14, 2024 · 3 comments

Comments

@dchichkov

Would it be possible to add the MMMU validation set to EvalAI?

It'd be great to be able to compare the numbers we calculate ourselves on the validation set with the ones produced by EvalAI.

@xiangyue9607
Contributor

Thank you! That is a good suggestion. We will consider it and post an update here later!

@dchichkov
Author

Thanks! The issue is that we see a consistent gap between validation and test set results, even though the models were not tuned on the validation set. Multiple teams have resorted to reporting validation rather than test results in their papers, I'm guessing because they don't trust the test results (which they can't reproduce or validate). It'd be good to triage and rectify that, at least by making the validation numbers reproducible against the EvalAI measurement.

MMMU is a great benchmark that measures overall LLM/VLM performance. But these test/validation discrepancies (and the misunderstanding that it's not just the visual part that matters) cast it in a bad light.

I'd also suggest considering releasing the test set, perhaps under a separate non-commercial license with token/password protection to avoid accidental contamination. The benefits of the test set being widely usable, and of cleaning up and resolving this test/validation gap, could outweigh the benefits of keeping the test set in a more controlled environment.

@xiangyue9607
Contributor

Thank you for your feedback. The discrepancy between the validation and test sets arises from the slight differences in their distributions. In the validation set, each subject has an equal number of samples, whereas in the test set, the number of samples per subject varies.
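
To illustrate the distribution point with a minimal sketch (the subject names, accuracies, and sample counts below are hypothetical, and this is not the MMMU or EvalAI scoring code): even if per-subject accuracy were identical on both splits, the overall micro-averaged score shifts when subjects contribute different numbers of samples.

```python
# Minimal sketch: identical per-subject accuracies, different sample counts.
# Subject names, accuracies, and counts are hypothetical; this is not the
# official MMMU/EvalAI scoring code.

def overall_accuracy(per_subject):
    """Micro-average: total correct answers / total samples."""
    correct = sum(acc * n for acc, n in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return correct / total

# Validation-like split: equal samples per subject.
validation_like = {"Art": (0.60, 30), "Math": (0.40, 30), "Physics": (0.50, 30)}
# Test-like split: same per-subject accuracies, unequal samples per subject.
test_like = {"Art": (0.60, 120), "Math": (0.40, 400), "Physics": (0.50, 80)}

print(overall_accuracy(validation_like))  # 0.500
print(overall_accuracy(test_like))        # ~0.453
```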

We are also considering releasing a portion of the test set while retaining a small part to prevent contamination or overfitting. We appreciate your valuable comments and encourage you to stay tuned for further updates!
