Clarification regarding how the accuracy.txt file is generated #1861

arjunsuresh · 2024-09-27T15:05:04Z

The submission generation rules for inference say that the accuracy.txt file should be generated from the accuracy scripts. My interpretation is that one should run the reference accuracy scripts standalone, using the logs from the accuracy run, to obtain this accuracy.txt file, rather than dumping the accuracy.txt file from within the implementation code. Is this the correct interpretation?

accuracy.txt # stdout of reference accuracy scripts


arjunsuresh · 2024-09-27T15:06:12Z

@psyhtest @ashwin @attafosu Can you please confirm?

attafosu · 2024-10-02T19:01:15Z

@arjunsuresh Yes, that's correct.

psyhtest · 2024-10-02T23:31:13Z

I can think of a situation where an implementer refactors/integrates a reference script into their own script. For example, the reference script may hardcode /usr/bin/python3, while they may want to use /usr/local/bin/python3.8. In this case, we can probably request that no material changes be made during such refactoring/integration, but not that the reference script must always be run standalone?

arjunsuresh · 2024-10-03T00:09:28Z

Thank you @attafosu @psyhtest

@psyhtest yes, running the reference accuracy script standalone is fine I believe. But this is not that straightforward, as it often requires the original dataset, so we do have some submissions where accuracy.txt is generated from the benchmark run itself without calling the reference script. We didn't see any accuracy issues when running the standalone script for those submissions, but I believe this should not be allowed.

psyhtest · 2024-10-07T14:22:25Z

@arjunsuresh

But you admit that in some cases it may not be straightforward:

> yes, running the reference accuracy script standalone is fine I believe.
> But this is not that straightforward

So why would we disallow it in such cases?

arjunsuresh · 2024-10-07T14:39:35Z

@psyhtest I'm not saying we should disallow running the reference accuracy script in a custom way, for example from within another Python file. But I don't think it is right to allow generating the accuracy.txt file by mimicking the actions of the reference script, because that makes it hard for other people to verify.

We face this issue specifically when automating DLRMv2 submissions: generating the accuracy.txt file requires the day23 Criteo dataset file, which cannot be downloaded non-interactively. If we were allowed to generate the accuracy.txt file from within the benchmark implementation, we possibly would not need this file at all.
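
As a rough illustration of running the reference script "within another Python file" without changing what it computes, a wrapper could invoke the script as a subprocess and store its stdout verbatim as accuracy.txt. This is only a sketch under assumptions: the function name `write_accuracy_txt` and its parameters are hypothetical and not part of the MLPerf reference code, and the interpreter is taken from `sys.executable` so no path such as /usr/bin/python3 has to be hardcoded.

```python
import subprocess
import sys
from pathlib import Path


def write_accuracy_txt(script, script_args, out_dir):
    """Run an accuracy script unmodified and save its stdout as accuracy.txt.

    `script` and `script_args` are placeholders for a reference accuracy
    script and whatever arguments it expects (hypothetical here).
    """
    result = subprocess.run(
        # Use the current interpreter instead of a hardcoded python path.
        [sys.executable, str(script), *script_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    out_path = Path(out_dir) / "accuracy.txt"
    # Store stdout exactly as the script printed it, with no post-processing.
    out_path.write_text(result.stdout)
    return out_path
```

Because the script itself is left untouched and only its stdout is captured, someone else can reproduce the file by running the same script with the same arguments.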
