Benchmarking Question List #1 #1860

Open
Agalakdak opened this issue Sep 27, 2024 · 3 comments

Comments

@Agalakdak

Hello everyone.
I have been using MLPerf benchmarks for some time, and I have a small list of questions about them. I am asking them here because I have not found the answers in other sources.

  1. I have several video cards in my system. Can I explicitly set the number of video cards used for a test?

  2. This follows from the question above: do all tests use all available GPUs?

  3. Many tests have different profiles, such as "edge" and "datacenter". What is the difference between them?

  4. Since the space on my SSD is limited, how can I tell the benchmarks to use a different directory to store the cache?

  5. The tests (in the profiles I used) do not always use 100% of the video memory. Are there scenarios in which all of the video memory will be used, or is that not necessary?

  6. Perhaps there are more fine-grained benchmark settings; is there a user guide?

@arjunsuresh
Contributor

Hi @Agalakdak Some of your questions are "benchmark implementation" dependent. We currently have Nvidia, Intel, and reference implementations for most or all of the benchmarks, and other vendor implementations are available for some of them.

  1. "no" for most of the reference implementations except some like for llama2. "yes" for Nvidia implementation though it uses all the GPUs by default.
  2. For Nvidia implementation - "yes". For reference implementation, it uses 1 GPU by default and in some benchmark implementations it supports multiple GPUs.
  3. Those are 2 different submission categories. The required scenarios to be run differs for them and "Offline" scenario is the only common one for both.
  4. export CM_REPOS=<NEW_PATH> can be used to do this or we can create softlink for any folder inside $HOME/CM/repos/local/cache path.
  5. Many small inference models do not need large amount of GPU memory. Parameter size given here is usually a good guide for the required GPU memory.
  6. Unfortunately not much currently - as most implementations by default only support the systems on which the MLPerf results were submitted. We are trying to extend this - but it is a WIP and the implementations and the results changes every 6 months.
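For items 1 and 4 above, a minimal shell sketch follows. Note that CUDA_VISIBLE_DEVICES is a standard CUDA mechanism rather than an MLPerf-specific option, and /mnt/bigdisk is an assumed mount point for a larger drive:

```bash
# Limit which GPUs a run can see (standard CUDA environment variable;
# any CUDA application, including the Nvidia implementation, only sees
# the devices listed here).
export CUDA_VISIBLE_DEVICES=0,1

# Point the CM cache at a different location (assumed path /mnt/bigdisk/CM).
export CM_REPOS=/mnt/bigdisk/CM

# Alternatively, move one large cache entry to the bigger drive and leave
# a softlink behind (SOME_CACHE_DIR is a placeholder name).
mv "$HOME/CM/repos/local/cache/SOME_CACHE_DIR" /mnt/bigdisk/
ln -s "/mnt/bigdisk/SOME_CACHE_DIR" "$HOME/CM/repos/local/cache/SOME_CACHE_DIR"
```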

@Agalakdak
Author

@arjunsuresh , thank you! And two last questions for now.

  1. How should benchmarking results be interpreted? Are they an abstract metric, or can the data be read as "model A can process x requests per second"?
  2. How can I donate $10?

@arjunsuresh
Contributor

  1. For the Offline scenario, samples per second is the usual metric. "Requests per second" or "queries per second" may not be correct, since a single request or query can contain multiple samples. For LLMs the metric is often tokens per second. (A sketch after this list shows where to read the Offline throughput from a finished run.)
  2. I don't think MLCommons is taking donations, but I might be wrong. You can contact the right people here.
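As a quick illustration of point 1, the LoadGen summary written at the end of a run reports the Offline throughput directly. A minimal sketch, assuming the default summary file name and a placeholder results path:

```bash
# Print the Offline throughput from a finished run; mlperf_log_summary.txt
# is the default LoadGen output file, and the path below is a placeholder.
grep "Samples per second" ./results/mlperf_log_summary.txt
# A line such as "Samples per second: 409.6" (illustrative value) means the
# system under test processed roughly 409.6 samples every second.
```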
