From c4d0b3ea98e6fe7252e50cb573f0d523da7979df Mon Sep 17 00:00:00 2001 From: Arjun Suresh Date: Tue, 24 Sep 2024 17:39:18 +0100 Subject: [PATCH] Update docs: SCC24, fix broken redirect (#1843) * Support batch-size in llama2 run * Add Rclone-Cloudflare download instructions to README.md * Add Rclone-Cloudflare download instructiosn to README.md * Minor wording edit to README.md * Add Rclone-Cloudflare download instructions to README.md * Add Rclone-GDrive download instructions to README.md * Add new and old instructions to README.md * Tweak language in README.md * Language tweak in README.md * Minor language tweak in README.md * Fix typo in README.md * Count error when logging errors: submission_checker.py * Fixes #1648, restrict loadgen uncommitted error message to within the loadgen directory * Update test-rnnt.yml (#1688) Stopping the github action for rnnt * Added docs init Added github action for website publish Update benchmark documentation Update publish.yaml Update publish.yaml Update benchmark documentation Improved the submission documentation Fix taskname Removed unused images * Fix benchmark URLs * Fix links * Add _full variation to run commands * Added script flow diagram * Added docker setup command for CM, extra run options * Added support for docker options in the docs * Added --quiet to the CM run_cmds in docs * Fix the test query count for cm commands * Support ctuning-cpp implementation * Added commands for mobilenet models * Docs cleanup * Docs cleanup * Added separate files for dataset and models in the docs * Remove redundant tab in the docs * Fixes some WIP models in the docs * Use the official docs page for CM installation * Fix the deadlink in docs * Fix indendation issue in docs * Added dockerinfo for nvidia implementation * Added run options for gptj * Added execution environment tabs * Cleanup of the docs * Cleanup of the docs * Reordered the sections of the docs page * Removed an unnecessary heading in the docs * Fixes the commands for datacenter * Fix the build --sdist for loadgen * Fixes #1761, llama2 and mixtral runtime error on CPU systems * Added mixtral to the benchmark list, improved benchmark docs * Update docs for MLPerf inference v4.1 * Update docs for MLPerf inference v4.1 * Fix typo * Gave direct link to implementation readmes * Added tables detailing implementations * Update vision README.md, split the frameworks into separate rows * Update README.md * pointed links to specific frameworks * pointed links to specific frameworks * Update Submission_Guidelines.md * Update Submission_Guidelines.md * Update Submission_Guidelines.md * api support llama2 * Added request module and reduced max token len * Fix for llama2 api server * Update SUT_API offline to work for OpenAI * Update SUT_API.py * Minor fixes * Fix json import in SUT_API.py * Fix llama2 token length * Added model name verification with server * clean temp files * support num_workers in LLAMA2 SUTs * Remove batching from Offline SUT_API.py * Update SUT_API.py * Minor fixes for llama2 API * Fix for llama2 API * removed table of contents * enabled llama2-nvidia + vllm-NM : WIP * enabled dlrm for intel * lower cased implementation * added raw data input * corrected data download commands * renamed filename * changes for bert and vllm * documentation to work on custom repo and branch * benchmark index page update * enabled sdxl for nvidia and intel * updated vllm server run cmd * benchmark page information addition * fix indendation issue * Added submission categories * update submission page - 
generate submission with or w/o using CM for benchmarking * Updated kits dataset documentation * Updated model parameters * updation of information * updated non cm based benchmark * added info about hf password * added links to model and access tokens * Updated reference results structuree tree * submission docs cleanup * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info * added generic stubs deepsparse * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info (FID and CLIP data added) * typo fix for bert deepsparse framework * added min system requirements for models * fixed code version * changes for displaying reference and intel implementation tip * added reference to installation page * updated neural magic documentation * Added links to the install page, redirect benchmarks page * added tips about batch size and dataset for nvidia llama2 * fix conditions logic * modified tips and additional run cmds * sentence corrections * Minor fix for the documentation * fixed bug in deepsparse generic model stubs + styling * added more information to stubs * Added SCC24 readme, support reproducibility in the docs * Made clear the custom CM repo URL format * Support conditional implementation, setup and run tips * Support rocm for sdxl * Fix _short tag support * Fix install URL * Expose bfloat16 and float16 options for sdxl * Expose download model to host option for sdxl * IndySCC24 documentation added * Improve the SCC24 docs * Improve the support of short variation * Improved the indyscc24 documentation * Updated scc run commands * removed test_query_count option for scc * Remove scc24 in the main docs * Remove scc24 in the main docs * Fix docs: indendation issue on the submission page * generalised code for skipping test query count * Fixes for SCC24 docs * Fix scenario text in main.py * Fix links for scc24 * Fix links for scc24 * Improve the general docs * Fix links for scc24 * Use float16 in scc24 doc * Improve scc24 docs * Improve scc24 docs * Use float16 in scc24 doc * fixed command bug --------- Co-authored-by: Nathan Wasson Co-authored-by: anandhu-eng Co-authored-by: ANANDHU S <71482562+anandhu-eng@users.noreply.github.com> Co-authored-by: Michael Goin --- .../image_classification/mobilenets.md | 2 + .../image_classification/resnet50.md | 2 + docs/benchmarks/language/gpt-j.md | 1 - docs/benchmarks/language/llama2-70b.md | 3 +- docs/benchmarks/language/mixtral-8x7b.md | 4 +- .../reproducibility/indyscc24-bert.md | 48 ++++ docs/benchmarks/medical_imaging/3d-unet.md | 1 - docs/benchmarks/recommendation/dlrm-v2.md | 4 +- .../text_to_image/reproducibility/scc24.md | 96 +++++++ docs/install/index.md | 26 +- docs/requirements.txt | 2 + docs/submission/index.md | 90 +++---- main.py | 238 +++++++++++------- mkdocs.yml | 20 +- text_to_image/main.py | 2 +- 15 files changed, 376 insertions(+), 163 deletions(-) create mode 100644 docs/benchmarks/language/reproducibility/indyscc24-bert.md create mode 100644 docs/benchmarks/text_to_image/reproducibility/scc24.md diff --git a/docs/benchmarks/image_classification/mobilenets.md b/docs/benchmarks/image_classification/mobilenets.md index f276008ef..09840ef1d 100644 --- a/docs/benchmarks/image_classification/mobilenets.md +++ b/docs/benchmarks/image_classification/mobilenets.md @@ -5,6 +5,8 @@ hide: # Image Classification using Mobilenet models +Install CM following the [installation page](site:install). 
+ Mobilenet models are not official MLPerf models and so cannot be used for a Closed division MLPerf inference submission. But since they can be run with Imagenet dataset, we are allowed to use them for Open division submission. Only CPU runs are supported now. ## TFLite Backend diff --git a/docs/benchmarks/image_classification/resnet50.md b/docs/benchmarks/image_classification/resnet50.md index 62b966e0d..4172158dc 100644 --- a/docs/benchmarks/image_classification/resnet50.md +++ b/docs/benchmarks/image_classification/resnet50.md @@ -3,8 +3,10 @@ hide: - toc --- + # Image Classification using ResNet50 + === "MLCommons-Python" ## MLPerf Reference Implementation in Python diff --git a/docs/benchmarks/language/gpt-j.md b/docs/benchmarks/language/gpt-j.md index 4dcb3d70e..d2f545815 100644 --- a/docs/benchmarks/language/gpt-j.md +++ b/docs/benchmarks/language/gpt-j.md @@ -5,7 +5,6 @@ hide: # Text Summarization using GPT-J - === "MLCommons-Python" ## MLPerf Reference Implementation in Python diff --git a/docs/benchmarks/language/llama2-70b.md b/docs/benchmarks/language/llama2-70b.md index 0d9a0504d..e68693716 100644 --- a/docs/benchmarks/language/llama2-70b.md +++ b/docs/benchmarks/language/llama2-70b.md @@ -5,7 +5,6 @@ hide: # Text Summarization using LLAMA2-70b - === "MLCommons-Python" ## MLPerf Reference Implementation in Python @@ -25,4 +24,4 @@ hide: {{ mlperf_inference_implementation_readme (4, "llama2-70b-99", "neuralmagic") }} -{{ mlperf_inference_implementation_readme (4, "llama2-70b-99.9", "neuralmagic") }} \ No newline at end of file +{{ mlperf_inference_implementation_readme (4, "llama2-70b-99.9", "neuralmagic") }} diff --git a/docs/benchmarks/language/mixtral-8x7b.md b/docs/benchmarks/language/mixtral-8x7b.md index 9f3bf2992..bdb26ae77 100644 --- a/docs/benchmarks/language/mixtral-8x7b.md +++ b/docs/benchmarks/language/mixtral-8x7b.md @@ -3,7 +3,9 @@ hide: - toc --- +# Question Answering, Math, and Code Generation using Mixtral-8x7B + === "MLCommons-Python" ## MLPerf Reference Implementation in Python -{{ mlperf_inference_implementation_readme (4, "mixtral-8x7b", "reference") }} \ No newline at end of file +{{ mlperf_inference_implementation_readme (4, "mixtral-8x7b", "reference") }} diff --git a/docs/benchmarks/language/reproducibility/indyscc24-bert.md b/docs/benchmarks/language/reproducibility/indyscc24-bert.md new file mode 100644 index 000000000..68215c5e1 --- /dev/null +++ b/docs/benchmarks/language/reproducibility/indyscc24-bert.md @@ -0,0 +1,48 @@ +--- +hide: + - toc +--- + +# Question and Answering using Bert Large for IndySCC 2024 + +## Introduction + +This guide is designed for the [IndySCC 2024](https://sc24.supercomputing.org/students/indyscc/) to walk participants through running and optimizing the [MLPerf Inference Benchmark](https://arxiv.org/abs/1911.02549) using [Bert Large](https://github.com/mlcommons/inference/tree/master/language/bert#supported-models) across various software and hardware configurations. The goal is to maximize system throughput (measured in samples per second) without compromising accuracy. + +For a valid MLPerf inference submission, two types of runs are required: a performance run and an accuracy run. In this competition, we focus on the `Offline` scenario, where throughput is the key metric—higher values are better. The official MLPerf inference benchmark for Bert Large requires processing a minimum of 10833 samples in both performance and accuracy modes using the Squad v1.1 dataset. 
Setting up for Nvidia GPUs may take 2-3 hours but can be done offline. Your final output will be a tarball (`mlperf_submission.tar.gz`) containing MLPerf-compatible results, which you will submit to the SCC organizers for scoring. + +## Scoring + +In the SCC, your first objective will be to run a reference (unoptimized) Python implementation or a vendor-provided version (such as Nvidia's) of the MLPerf inference benchmark to secure a baseline score. + +Once the initial run is successful, you'll have the opportunity to optimize the benchmark further by maximizing system utilization, applying quantization techniques, adjusting ML frameworks, experimenting with batch sizes, and more, all of which can earn you additional points. + +Since vendor implementations of the MLPerf inference benchmark vary and are often limited to single-node benchmarking, teams will compete within their respective hardware categories (e.g., Nvidia GPUs, AMD GPUs). Points will be awarded based on the throughput achieved on your system. + + +!!! info + Both MLPerf and CM automation are evolving projects. + If you encounter issues or have questions, please submit them [here](https://github.com/mlcommons/cm4mlops/issues) + +## Artifacts to submit to the SCC committee + +You will need to submit the following files: + +* `mlperf_submission_short.tar.gz` - automatically generated file with validated MLPerf results. +* `mlperf_submission_short_summary.json` - automatically generated summary of MLPerf results. +* `mlperf_submission_short.run` - CM commands to run MLPerf BERT inference benchmark saved to this file. +* `mlperf_submission_short.tstamps` - execution timestamps before and after CM command saved to this file. +* `mlperf_submission_short.md` - description of your platform and some highlights of the MLPerf benchmark execution. 
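The `.run` and `.tstamps` artifacts listed above are plain text files. Below is a minimal sketch of how they could be captured from a bash shell (the CM command string is a placeholder; use the full command generated for your system):

```bash
# Record a timestamp, the exact CM command used, and a closing timestamp,
# producing the mlperf_submission_short.run and .tstamps files listed above.
CM_CMD='cm run script --tags=run-mlperf,inference ...'   # placeholder: replace with your full CM command
date -u +"%Y-%m-%dT%H:%M:%SZ" >> mlperf_submission_short.tstamps
echo "$CM_CMD" >> mlperf_submission_short.run
eval "$CM_CMD"
date -u +"%Y-%m-%dT%H:%M:%SZ" >> mlperf_submission_short.tstamps
```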
+ + + +=== "MLCommons-Python" + ## MLPerf Reference Implementation in Python + +{{ mlperf_inference_implementation_readme (4, "bert-99", "reference", extra_variation_tags=",_short", scenarios=["Offline"],categories=["Edge"], setup_tips=False) }} + +=== "Nvidia" + ## Nvidia MLPerf Implementation +{{ mlperf_inference_implementation_readme (4, "bert-99", "nvidia", extra_variation_tags=",_short", scenarios=["Offline"],categories=["Edge"], setup_tips=False, implementation_tips=False) }} + + diff --git a/docs/benchmarks/medical_imaging/3d-unet.md b/docs/benchmarks/medical_imaging/3d-unet.md index 01a54c63e..72d5eed49 100644 --- a/docs/benchmarks/medical_imaging/3d-unet.md +++ b/docs/benchmarks/medical_imaging/3d-unet.md @@ -5,7 +5,6 @@ hide: # Medical Imaging using 3d-unet (KiTS 2019 kidney tumor segmentation task) - === "MLCommons-Python" ## MLPerf Reference Implementation in Python diff --git a/docs/benchmarks/recommendation/dlrm-v2.md b/docs/benchmarks/recommendation/dlrm-v2.md index ce3081077..b539c1607 100644 --- a/docs/benchmarks/recommendation/dlrm-v2.md +++ b/docs/benchmarks/recommendation/dlrm-v2.md @@ -5,8 +5,6 @@ hide: # Recommendation using DLRM v2 - -## Benchmark Implementations === "MLCommons-Python" ## MLPerf Reference Implementation in Python @@ -26,4 +24,4 @@ hide: {{ mlperf_inference_implementation_readme (4, "dlrm-v2-99", "intel") }} -{{ mlperf_inference_implementation_readme (4, "dlrm-v2-99.9", "intel") }} \ No newline at end of file +{{ mlperf_inference_implementation_readme (4, "dlrm-v2-99.9", "intel") }} diff --git a/docs/benchmarks/text_to_image/reproducibility/scc24.md b/docs/benchmarks/text_to_image/reproducibility/scc24.md new file mode 100644 index 000000000..bae4eceb3 --- /dev/null +++ b/docs/benchmarks/text_to_image/reproducibility/scc24.md @@ -0,0 +1,96 @@ +--- +hide: + - toc +--- + +# Text-to-Image with Stable Diffusion for Student Cluster Competition 2024 + +## Introduction + +This guide is designed for the [Student Cluster Competition 2024](https://sc24.supercomputing.org/students/student-cluster-competition/) to walk participants through running and optimizing the [MLPerf Inference Benchmark](https://arxiv.org/abs/1911.02549) using [Stable Diffusion XL 1.0](https://github.com/mlcommons/inference/tree/master/text_to_image#supported-models) across various software and hardware configurations. The goal is to maximize system throughput (measured in samples per second) without compromising accuracy. Since the model performs poorly on CPUs, it is essential to run it on GPUs. + +For a valid MLPerf inference submission, two types of runs are required: a performance run and an accuracy run. In this competition, we focus on the `Offline` scenario, where throughput is the key metric—higher values are better. The official MLPerf inference benchmark for Stable Diffusion XL requires processing a minimum of 5,000 samples in both performance and accuracy modes using the COCO 2014 dataset. However, for SCC, we have reduced this and we also have two variants. `scc-base` variant has dataset size reduced to 50 samples, making it possible to complete both performance and accuracy runs in approximately 5-10 minutes. `scc-main` variant has dataset size of 500 and running it will fetch extra points as compared to running just the base variant. Setting up for Nvidia GPUs may take 2-3 hours but can be done offline. Your final output will be a tarball (`mlperf_submission.tar.gz`) containing MLPerf-compatible results, which you will submit to the SCC organizers for scoring. 
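Because the benchmark must run on GPUs, a quick sanity check before starting is to confirm that the accelerators are actually visible (this assumes the NVIDIA or AMD driver stack is already installed on the node):

```bash
# NVIDIA systems: list visible GPUs along with driver and CUDA versions
nvidia-smi

# AMD systems: list visible GPUs and their status under ROCm
rocm-smi
```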
+ +## Scoring + +In the SCC, your first objective will be to run `scc-base` variant for reference (unoptimized) Python implementation or a vendor-provided version (such as Nvidia's) of the MLPerf inference benchmark to secure a baseline score. + +Once the initial run is successful, you'll have the opportunity to optimize the benchmark further by maximizing system utilization, applying quantization techniques, adjusting ML frameworks, experimenting with batch sizes, and more, all of which can earn you additional points. + +Since vendor implementations of the MLPerf inference benchmark vary and are often limited to single-node benchmarking, teams will compete within their respective hardware categories (e.g., Nvidia GPUs, AMD GPUs). Points will be awarded based on the throughput achieved on your system. + +Additionally, significant bonus points will be awarded if your team enhances an existing implementation, adds support for new hardware (such as an unsupported GPU), enables multi-node execution, or adds/extends scripts to [cm4mlops repository](https://github.com/mlcommons/cm4mlops/tree/main/script) supporting new devices, frameworks, implementations etc. All improvements must be made publicly available under the Apache 2.0 license and submitted alongside your results to the SCC committee to earn these bonus points, contributing to the MLPerf community. + + +!!! info + Both MLPerf and CM automation are evolving projects. + If you encounter issues or have questions, please submit them [here](https://github.com/mlcommons/cm4mlops/issues) + +## Artifacts to submit to the SCC committee + +You will need to submit the following files: + +* `mlperf_submission.run` - CM commands to run MLPerf inference benchmark saved to this file. +* `mlperf_submission.md` - description of your platform and some highlights of the MLPerf benchmark execution. +* `` under which results are pushed to the github repository. + + +## SCC interview + +You are encouraged to highlight and explain the obtained MLPerf inference throughput on your system +and describe any improvements and extensions to this benchmark (such as adding new hardware backend +or supporting multi-node execution) useful for the community and [MLCommons](https://mlcommons.org). + +## Run Commands + +=== "MLCommons-Python" + ## MLPerf Reference Implementation in Python + +{{ mlperf_inference_implementation_readme (4, "sdxl", "reference", extra_variation_tags=",_short,_scc24-base", devices=["ROCm", "CUDA"],scenarios=["Offline"],categories=["Datacenter"], setup_tips=False, skip_test_query_count=True, extra_input_string="--precision=float16") }} + +=== "Nvidia" + ## Nvidia MLPerf Implementation +{{ mlperf_inference_implementation_readme (4, "sdxl", "nvidia", extra_variation_tags=",_short,_scc24-base", scenarios=["Offline"],categories=["Datacenter"], setup_tips=False, implementation_tips=False, skip_test_query_count=True) }} + +!!! info + Once the above run is successful, you can change `_scc24-base` to `_scc24-main` to run the main variant. + +## Submission Commands + +### Generate actual submission tree + +```bash +cm run script --tags=generate,inference,submission \ + --clean \ + --preprocess_submission=yes \ + --run-checker \ + --tar=yes \ + --env.CM_TAR_OUTFILE=submission.tar.gz \ + --division=open \ + --category=datacenter \ + --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ + --run_style=test \ + --adr.submission-checker.tags=_short-run \ + --quiet \ + --submitter= +``` + +* Use `--hw_name="My system name"` to give a meaningful system name. 
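Before moving on to the upload step, it can help to sanity-check the generated archive. A short sketch, assuming `submission.tar.gz` (the name set via `--env.CM_TAR_OUTFILE` above) was written to the current working directory:

```bash
# List the top-level contents of the generated submission archive
tar -tzf submission.tar.gz | head -n 20

# Optionally unpack it into a scratch directory for a closer look
mkdir -p /tmp/scc24_submission
tar -xzf submission.tar.gz -C /tmp/scc24_submission
```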
+ + +### Push Results to GitHub + +Fork the repository at [https://github.com/gateoverflow/cm4mlperf-inference](https://github.com/gateoverflow/cm4mlperf-inference). + +Run the following command after **replacing `--repo_url` with your GitHub fork URL**. + +```bash +cm run script --tags=push,github,mlperf,inference,submission \ + --repo_url=https://github.com/gateoverflow/cm4mlperf-inference \ + --repo_branch=mlperf-inference-results-scc24 \ + --commit_message="Results on system " \ + --quiet +``` + +Once uploaded, open a Pull Request to the origin repository. A GitHub Action will run there, and once it +finishes you can see your submitted results at [https://gateoverflow.github.io/cm4mlperf-inference](https://gateoverflow.github.io/cm4mlperf-inference). diff --git a/docs/install/index.md b/docs/install/index.md index 60377adee..195521c7e 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -8,24 +8,24 @@ We use MLCommons CM Automation framework to run MLPerf inference benchmarks. CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If any of these are absent, please follow the [official CM installation page](https://docs.mlcommons.org/ck/install) to install them. Once the dependencies are installed, do the following -## Activate a VENV for CM +## Activate a Virtual ENV for CM +This step is not mandatory, as CM can use a separate virtual environment for MLPerf inference. However, recent `pip` versions require it; otherwise, the `--break-system-packages` flag is needed while installing `cm4mlops`. + ```bash python3 -m venv cm source cm/bin/activate ``` ## Install CM and pulls any needed repositories - -```bash - pip install cm4mlops -``` - -## To work on custom GitHub repo and branch - -```bash - pip install cmind && cm init --quiet --repo=mlcommons@cm4mlops --branch=mlperf-inference -``` - -Here, repo is in the format `githubUsername@githubRepo`. +=== "Use the default fork of CM MLOps repository" + ```bash + pip install cm4mlops + ``` + +=== "Use custom fork/branch of the CM MLOps repository" + ```bash + pip install cmind && cm init --quiet --repo=mlcommons@cm4mlops --branch=mlperf-inference + ``` + Here, `repo` is in the format `githubUsername@githubRepo`.
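For example, to work from a hypothetical fork and branch (placeholder names below; substitute your own GitHub username, repository, and branch), the same command shape would be:

```bash
# Install the CM front end and initialize it against a custom fork/branch
# (myusername@cm4mlops and my-dev-branch are placeholders, not real targets).
pip install cmind && cm init --quiet --repo=myusername@cm4mlops --branch=my-dev-branch
```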
Now, you are ready to use the `cm` commands to run MLPerf inference as given in the [benchmarks](../index.md) page diff --git a/docs/requirements.txt b/docs/requirements.txt index 39fab4e1f..293abf164 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -2,3 +2,5 @@ mkdocs-material swagger-markdown mkdocs-macros-plugin ruamel.yaml +mkdocs-redirects +mkdocs-site-urls diff --git a/docs/submission/index.md b/docs/submission/index.md index a75bc3259..4f6e05c25 100644 --- a/docs/submission/index.md +++ b/docs/submission/index.md @@ -60,63 +60,63 @@ Once all the results across all the models are ready you can use the following c === "Closed Edge" ### Closed Edge Submission ```bash - cm run script --tags=generate,inference,submission \ - --clean \ - --preprocess_submission=yes \ - --run-checker \ - --submitter=MLCommons \ - --tar=yes \ - --env.CM_TAR_OUTFILE=submission.tar.gz \ - --division=closed \ - --category=edge \ - --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ - --quiet + cm run script --tags=generate,inference,submission \ + --clean \ + --preprocess_submission=yes \ + --run-checker \ + --submitter=MLCommons \ + --tar=yes \ + --env.CM_TAR_OUTFILE=submission.tar.gz \ + --division=closed \ + --category=edge \ + --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ + --quiet ``` === "Closed Datacenter" ### Closed Datacenter Submission ```bash - cm run script --tags=generate,inference,submission \ - --clean \ - --preprocess_submission=yes \ - --run-checker \ - --submitter=MLCommons \ - --tar=yes \ - --env.CM_TAR_OUTFILE=submission.tar.gz \ - --division=closed \ - --category=datacenter \ - --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ - --quiet + cm run script --tags=generate,inference,submission \ + --clean \ + --preprocess_submission=yes \ + --run-checker \ + --submitter=MLCommons \ + --tar=yes \ + --env.CM_TAR_OUTFILE=submission.tar.gz \ + --division=closed \ + --category=datacenter \ + --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ + --quiet ``` === "Open Edge" ### Open Edge Submission ```bash - cm run script --tags=generate,inference,submission \ - --clean \ - --preprocess_submission=yes \ - --run-checker \ - --submitter=MLCommons \ - --tar=yes \ - --env.CM_TAR_OUTFILE=submission.tar.gz \ - --division=open \ - --category=edge \ - --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ - --quiet + cm run script --tags=generate,inference,submission \ + --clean \ + --preprocess_submission=yes \ + --run-checker \ + --submitter=MLCommons \ + --tar=yes \ + --env.CM_TAR_OUTFILE=submission.tar.gz \ + --division=open \ + --category=edge \ + --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ + --quiet ``` === "Open Datacenter" ### Closed Datacenter Submission ```bash - cm run script --tags=generate,inference,submission \ - --clean \ - --preprocess_submission=yes \ - --run-checker \ - --submitter=MLCommons \ - --tar=yes \ - --env.CM_TAR_OUTFILE=submission.tar.gz \ - --division=open \ - --category=datacenter \ - --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ - --quiet + cm run script --tags=generate,inference,submission \ + --clean \ + --preprocess_submission=yes \ + --run-checker \ + --submitter=MLCommons \ + --tar=yes \ + --env.CM_TAR_OUTFILE=submission.tar.gz \ + --division=open \ + --category=datacenter \ + --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \ + --quiet ``` * Use `--hw_name="My system name"` to give a meaningful system name. 
Examples can be seen [here](https://github.com/mlcommons/inference_results_v3.0/tree/main/open/cTuning/systems) @@ -134,7 +134,7 @@ If you are collecting results across multiple systems you can generate different Run the following command after **replacing `--repo_url` with your GitHub repository URL**. ```bash - cm run script --tags=push,github,mlperf,inference,submission \ +cm run script --tags=push,github,mlperf,inference,submission \ --repo_url=https://github.com/GATEOverflow/mlperf_inference_submissions_v4.1 \ --commit_message="Results on added by " \ --quiet diff --git a/main.py b/main.py index c9e3e1b56..aa8dd769e 100755 --- a/main.py +++ b/main.py @@ -1,7 +1,7 @@ def define_env(env): @env.macro - def mlperf_inference_implementation_readme(spaces, model, implementation): + def mlperf_inference_implementation_readme(spaces, model, implementation, *, implementation_tips=True, setup_tips=True, run_tips=True, skip_test_query_count=False, scenarios = [], devices=[], frameworks=[], categories=[], extra_variation_tags="", extra_input_string="", extra_docker_input_string=""): pre_space = "" for i in range(1,spaces): @@ -10,28 +10,32 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): pre_space += " " content="" - scenarios = [] + execution_envs = ["Docker","Native"] code_version="r4.1-dev" + implementation_run_options = [] if model == "rnnt": code_version="r4.0" if implementation == "reference": # Tip - if "99.9" not in model: + if "99.9" not in model and implementation_tips: content += f"\n{pre_space}!!! tip\n\n" content += f"{pre_space} - MLCommons reference implementations are only meant to provide a rules compliant reference implementation for the submitters and in most cases are not best performing. If you want to benchmark any system, it is advisable to use the vendor MLPerf implementation for that system like Nvidia, Intel etc.\n\n" - devices = [ "CPU", "CUDA", "ROCm" ] - if model.lower() == "resnet50": - frameworks = [ "Onnxruntime", "Tensorflow", "Deepsparse" ] - elif model.lower() == "retinanet": - frameworks = [ "Onnxruntime", "Pytorch" ] - elif "bert" in model.lower(): - frameworks = [ "Pytorch", "Deepsparse" ] - else: - frameworks = [ "Pytorch" ] + if not devices: + devices = [ "CPU", "CUDA", "ROCm" ] + + if not frameworks: + if model.lower() == "resnet50": + frameworks = [ "Onnxruntime", "Tensorflow", "Deepsparse" ] + elif model.lower() == "retinanet": + frameworks = [ "Onnxruntime", "Pytorch" ] + elif "bert" in model.lower(): + frameworks = [ "Pytorch", "Deepsparse" ] + else: + frameworks = [ "Pytorch" ] elif implementation == "nvidia": if model in [ "mixtral-8x7b" ]: @@ -45,7 +49,7 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): elif implementation == "intel": # Tip - if "99.9" not in model: + if "99.9" not in model and implementation_tips: content += f"\n{pre_space}!!! tip\n\n" content += f"{pre_space} - Intel MLPerf inference implementation is available only for datacenter category and has been tested only on a limited number of systems. 
Most of the benchmarks using Intel implementation require at least Intel Sapphire Rapids or higher CPU generation.\n\n" @@ -64,7 +68,8 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): frameworks = [ "Glow" ] elif implementation == "cpp": - devices = [ "CPU", "CUDA" ] + if not devices: + devices = [ "CPU", "CUDA" ] frameworks = [ "Onnxruntime" ] elif implementation == "ctuning-cpp": @@ -75,22 +80,27 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): else: frameworks = [] - if model.lower() == "bert-99.9": - categories = [ "Datacenter" ] - elif "dlrm" in model.lower() or "llama2" in model.lower() or "mixtral" in model.lower(): - categories = [ "Datacenter" ] - else: - categories = [ "Edge", "Datacenter" ] + if not categories: + if model.lower() == "bert-99.9": + categories = [ "Datacenter" ] + elif "dlrm" in model.lower() or "llama2" in model.lower() or "mixtral" in model.lower(): + categories = [ "Datacenter" ] + else: + categories = [ "Edge", "Datacenter" ] # model name content += f"{pre_space}{model.upper()}\n\n" + + final_run_mode = "valid" if "short" not in extra_variation_tags else "test" + for category in categories: - if category == "Edge" and not scenarios: - scenarios = [ "Offline", "SingleStream" ] - if model.lower() in [ "resnet50", "retinanet" ] and not "MultiStream" in scenarios:#MultiStream was duplicating - scenarios.append("MultiStream") - elif category == "Datacenter": - scenarios = [ "Offline", "Server" ] + if not scenarios: + if category == "Edge" and not scenarios: + scenarios = [ "Offline", "SingleStream" ] + if model.lower() in [ "resnet50", "retinanet" ] and not "MultiStream" in scenarios:#MultiStream was duplicating + scenarios.append("MultiStream") + elif category == "Datacenter": + scenarios = [ "Offline", "Server" ] content += f"{pre_space}=== \"{category.lower()}\"\n\n" @@ -128,35 +138,49 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): content += f"{cur_space2}=== \"{execution_env}\"\n" content += f"{cur_space3}###### {execution_env} Environment\n\n" # ref to cm installation - content += f"{cur_space3}Please refer to the [installation page](../../install/index.md) to install CM for running the automated benchmark commands.\n\n" - test_query_count=get_test_query_count(model, implementation, device) + content += f"{cur_space3}Please refer to the [installation page](site:inference/install/) to install CM for running the automated benchmark commands.\n\n" + test_query_count=get_test_query_count(model, implementation, device.lower()) if "99.9" not in model: #not showing docker command as it is already done for the 99% variant if implementation == "neuralmagic": content += f"{cur_space3}####### Run the Inference Server\n" content += get_inference_server_run_cmd(spaces+16,implementation) - # tips regarding the running of nural magic server - content += f"\n{cur_space3}!!! tip\n\n" - content += f"{cur_space3} - Host and Port number of the server can be configured through `--host` and `--port`. Otherwise, server will run on default host `localhost` and port `8000`.\n\n" + if run_tips: + # tips regarding the running of nural magic server + content += f"\n{cur_space3}!!! tip\n\n" + content += f"{cur_space3} - Host and Port number of the server can be configured through `--host` and `--port` options. 
Otherwise, server will run on the default host `localhost` and port `8000`.\n\n" + setup_run_cmd = mlperf_inference_run_command(spaces+17, model, implementation, framework.lower(), category.lower(), "Offline", device.lower(), "test", test_query_count, True, skip_test_query_count, scenarios, code_version, extra_variation_tags, extra_input_string, extra_docker_input_string) + if execution_env == "Native": # Native implementation steps through virtual environment content += f"{cur_space3}####### Setup a virtual environment for Python\n" content += get_venv_command(spaces+16) content += f"{cur_space3}####### Performance Estimation for Offline Scenario\n" - content += mlperf_inference_run_command(spaces+17, model, implementation, framework.lower(), category.lower(), "Offline", device.lower(), "test", test_query_count, True, scenarios, code_version).replace("--docker ","") + + content += setup_run_cmd.replace("--docker ", "") + content += f"{cur_space3}The above command should do a test run of Offline scenario and record the estimated offline_target_qps.\n\n" else: # Docker implementation steps content += f"{cur_space3}####### Docker Container Build and Performance Estimation for Offline Scenario\n" - docker_info = get_docker_info(spaces+16, model, implementation, device) + docker_info = get_docker_info(spaces+16, model, implementation, device, setup_tips) content += docker_info - content += mlperf_inference_run_command(spaces+17, model, implementation, framework.lower(), category.lower(), "Offline", device.lower(), "test", test_query_count, True, scenarios, code_version) - content += f"{cur_space3}The above command should get you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container please do the below commands to do the accuracy + performance runs for each scenario.\n\n" + + content += setup_run_cmd + + if len(scenarios) == 1: + scenario_text = f"""the {scenarios[0]} scenario""" + else: + scenario_text = "each scenario""" + content += f"{cur_space3}The above command should get you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container please do the below commands to do the accuracy + performance runs for {scenario_text}.\n\n" content += f"{cur_space3}
\n" content += f"{cur_space3} Please click here to see more options for the docker launch \n\n" - content += f"{cur_space3}* `--docker_cm_repo=`: to use a custom fork of cm4mlops repository inside the docker image\n\n" + content += f"{cur_space3}* `--docker_cm_repo=`: to use a custom fork of cm4mlops repository inside the docker image\n\n" content += f"{cur_space3}* `--docker_cache=no`: to not use docker cache during the image build\n" + if implementation.lower() == "nvidia": + content += f"{cur_space3}* `--gpu_name=` : The GPUs with supported configs in CM are `orin`, `rtx_4090`, `rtx_a6000`, `rtx_6000_ada`, `l4`, `t4`and `a100`. For other GPUs, default configuration as per the GPU memory will be used.\n" + if device.lower() not in [ "cuda" ]: content += f"{cur_space3}* `--docker_os=ubuntu`: ubuntu and rhel are supported. \n" content += f"{cur_space3}* `--docker_os_version=20.04`: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL\n" @@ -165,7 +189,8 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): else: content += f"{cur_space3} You can reuse the same environment as described for {model.split('.')[0]}.\n" content += f"{cur_space3}###### Performance Estimation for Offline Scenario\n" - content += mlperf_inference_run_command(spaces+17, model, implementation, framework.lower(), category.lower(), "Offline", device.lower(), "test", test_query_count, True, scenarios, code_version).replace("--docker ","") + + content += mlperf_inference_run_command(spaces+17, model, implementation, framework.lower(), category.lower(), "Offline", device.lower(), "test", test_query_count, True, skip_test_query_count, scenarios, code_version).replace("--docker ","") content += f"{cur_space3}The above command should do a test run of Offline scenario and record the estimated offline_target_qps.\n\n" @@ -174,45 +199,48 @@ def mlperf_inference_implementation_readme(spaces, model, implementation): run_suffix += f"{cur_space3} Please click here to see more options for the RUN command\n\n" run_suffix += f"{cur_space3}* Use `--division=closed` to do a closed division submission which includes compliance runs\n\n" run_suffix += f"{cur_space3}* Use `--rerun` to do a rerun even when a valid run exists\n" + if implementation.lower() == "nvidia": + run_suffix += f"{cur_space3}* `--gpu_name=` : The GPUs with supported configs in CM are `orin`, `rtx_4090`, `rtx_a6000`, `rtx_6000_ada`, `l4`, `t4`and `a100`. For other GPUs, default configuration as per the GPU memory will be used.\n" run_suffix += f"{cur_space3}
\n\n" - if "bert" in model.lower() and framework == "deepsparse": + if "bert" in model.lower() and framework.lower() == "deepsparse": run_suffix += f"{cur_space3}
\n" - run_suffix += f"{cur_space3} Please click here for generic model stubs for bert deepsparse\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned95_quant-none-vnni\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned95_obs_quant-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50-none-vnni\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-base/pytorch/huggingface/squad/pruned90-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned97_quant-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned90-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/pruned80_quant-none-vnni\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned95-none-vnni\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned97-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/base-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/base-none\n\n" - run_suffix += f"{cur_space3}* zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base-none\n" + run_suffix += f"{cur_space3} Please click here to view available generic model stubs for bert deepsparse\n\n" + run_suffix += f"{cur_space3}* **obert-large-pruned95_quant-none-vnni:** zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned95_quant-none-vnni\n\n" + run_suffix += f"{cur_space3}* **mobilebert-none-14layer_pruned50_quant-none-vnni:** zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni\n\n" + run_suffix += f"{cur_space3}* **mobilebert-none-base_quant-none:** zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none\n\n" + run_suffix += f"{cur_space3}* **bert-base-pruned95_obs_quant-none:** zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned95_obs_quant-none\n\n" + run_suffix += f"{cur_space3}* **mobilebert-none-14layer_pruned50-none-vnni:** zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50-none-vnni\n\n" + run_suffix += f"{cur_space3}* **obert-base-pruned90-none:** zoo:nlp/question_answering/obert-base/pytorch/huggingface/squad/pruned90-none\n\n" + run_suffix += f"{cur_space3}* **obert-large-pruned97_quant-none:** zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned97_quant-none\n\n" + run_suffix += f"{cur_space3}* **bert-base-pruned90-none:** zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned90-none\n\n" + run_suffix += f"{cur_space3}* **bert-large-pruned80_quant-none-vnni:** zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/pruned80_quant-none-vnni\n\n" + run_suffix += f"{cur_space3}* **obert-large-pruned95-none-vnni:** zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned95-none-vnni\n\n" + 
run_suffix += f"{cur_space3}* **obert-large-pruned97-none:** zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/pruned97-none\n\n" + run_suffix += f"{cur_space3}* **bert-large-base-none:** zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/base-none\n\n" + run_suffix += f"{cur_space3}* **obert-large-base-none:** zoo:nlp/question_answering/obert-large/pytorch/huggingface/squad/base-none\n\n" + run_suffix += f"{cur_space3}* **mobilebert-none-base-none:** zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base-none\n" run_suffix += f"{cur_space3}
\n" for scenario in scenarios: content += f"{cur_space3}=== \"{scenario}\"\n{cur_space4}###### {scenario}\n\n" - run_cmd = mlperf_inference_run_command(spaces+21, model, implementation, framework.lower(), category.lower(), scenario, device.lower(), "valid", 0, False, scenarios, code_version) + run_cmd = mlperf_inference_run_command(spaces+21, model, implementation, framework.lower(), category.lower(), scenario, device.lower(), final_run_mode, test_query_count, False, skip_test_query_count, scenarios, code_version, extra_variation_tags, extra_input_string) content += run_cmd #content += run_suffix - - content += f"{cur_space3}=== \"All Scenarios\"\n{cur_space4}###### All Scenarios\n\n" - run_cmd = mlperf_inference_run_command(spaces+21, model, implementation, framework.lower(), category.lower(), "All Scenarios", device.lower(), "valid", 0, False, scenarios, code_version) - content += run_cmd - content += run_suffix + + if len(scenarios) > 1: + content += f"{cur_space3}=== \"All Scenarios\"\n{cur_space4}###### All Scenarios\n\n" + run_cmd = mlperf_inference_run_command(spaces+21, model, implementation, framework.lower(), category.lower(), "All Scenarios", device.lower(), final_run_mode, test_query_count, False, skip_test_query_count, scenarios, code_version, extra_variation_tags, extra_input_string) + content += run_cmd + content += run_suffix - readme_prefix = get_readme_prefix(spaces, model, implementation) + readme_prefix = get_readme_prefix(spaces, model, implementation, extra_variation_tags) - readme_suffix = get_readme_suffix(spaces, model, implementation) + readme_suffix = get_readme_suffix(spaces, model, implementation, extra_variation_tags) return readme_prefix + content + readme_suffix @@ -223,10 +251,10 @@ def get_test_query_count(model, implementation, device, num_devices=1): elif model in [ "retinanet", "bert-99", "bert-99.9" ]: p_range = 100 else: - p_range = 50 + p_range = 10 if device == "cuda": - p_range *= 40 + p_range *= 5 p_range *= num_devices return p_range @@ -280,15 +308,6 @@ def get_min_system_requirements(spaces, model, implementation, device): min_sys_req_content += f"{spaces}\n" return min_sys_req_content - def get_readme_prefix(spaces, model, implementation): - readme_prefix = "" - pre_space=" " - #for i in range(1,spaces): - # pre_space = pre_space + " " - #pre_space += " " - - return readme_prefix - def get_inference_server_run_cmd(spaces, implementation): indent = " "*spaces + " " if implementation == "neuralmagic": @@ -309,26 +328,50 @@ def get_venv_command(spaces): {pre_space}export CM_SCRIPT_EXTRA_CMD=\"--adr.python.name=mlperf\" {pre_space}```\n""" - def get_docker_info(spaces, model, implementation, device): + def get_docker_info(spaces, model, implementation, device, setup_tips=True): info = "" pre_space="" for i in range(1,spaces): pre_space = pre_space + " " pre_space += " " #pre_space = " " - if implementation == "nvidia": + if setup_tips: info += f"\n{pre_space}!!! tip\n\n" - info+= f"{pre_space} If ran with `--all_models=yes`, all the benchmark models of NVIDIA implementation could be run within the same container.\n\n" + + if model == "sdxl": + info+= f"{pre_space} - `--env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes` option can be used to download the model on the host so that it can be reused across different container lanuches. \n\n" + + info+= f"{pre_space} - Batch size could be adjusted using `--batch_size=#`, where `#` is the desired batch size. 
This option works only if the implementation in use is supporting the given batch size.\n\n" + if implementation.lower() == "nvidia": + info+= f"{pre_space} - Default batch size is assigned based on [GPU memory](https://github.com/mlcommons/cm4mlops/blob/dd0c35856969c68945524d5c80414c615f5fe42c/script/app-mlperf-inference-nvidia/_cm.yaml#L1129) or the [specified GPU](https://github.com/mlcommons/cm4mlops/blob/dd0c35856969c68945524d5c80414c615f5fe42c/script/app-mlperf-inference-nvidia/_cm.yaml#L1370). Please click more option for *docker launch* or *run command* to see how to specify the GPU name.\n\n" + info+= f"{pre_space} - When run with `--all_models=yes`, all the benchmark models of NVIDIA implementation can be executed within the same container.\n\n" + if "llama2" in model.lower(): + info+= f"{pre_space} - The dataset for NVIDIA's implementation of Llama2 is not publicly available. The user must fill [this](https://docs.google.com/forms/d/e/1FAIpQLSc_8VIvRmXM3I8KQaYnKf7gy27Z63BBoI_I1u02f4lw6rBp3g/viewform?pli=1&fbzx=-8842630989397184967) form and be verified as a MLCommons member to access the dataset.\n\n" + info+= f"{pre_space} - `PATH_TO_PICKE_FILE` should be replaced with path to the downloaded pickle file.\n\n" + else: + if model == "sdxl": + info += f"\n{pre_space}!!! tip\n\n" + info+= f"{pre_space} - `--env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes` option can be used to download the model on the host so that it can be reused across different container lanuches. \n\n" + return info - def get_readme_suffix(spaces, model, implementation): + def get_readme_prefix(spaces, model, implementation, extra_variation_tags): + readme_prefix = "" + pre_space=" " + #for i in range(1,spaces): + # pre_space = pre_space + " " + #pre_space += " " + + return readme_prefix + + def get_readme_suffix(spaces, model, implementation, extra_variation_tags): readme_suffix = "" pre_space="" for i in range(1,spaces): pre_space = pre_space + " " pre_space += " " - if implementation == "reference": + if implementation == "reference" and not extra_variation_tags: if not model.endswith("-99"): model_base_name = model.replace("-99.9","").replace("-99","") readme_suffix+= f"{pre_space}* If you want to download the official MLPerf model and dataset for {model} you can follow [this README](get-{model_base_name}-data.md).\n" @@ -336,22 +379,28 @@ def get_readme_suffix(spaces, model, implementation): readme_suffix += f"{pre_space}* Please see [mobilenets.md](mobilenets.md) for running mobilenet models for Image Classification." return readme_suffix - def get_run_cmd_extra(f_pre_space, model, implementation, device, scenario, scenarios = []): + def get_run_cmd_extra(f_pre_space, model, implementation, device, scenario, scenarios = [], run_tips=True, extra_input_string=""): extra_content = "" f_pre_space += "" if scenario == "Server" or (scenario == "All Scenarios" and "Server" in scenarios): extra_content += f"{f_pre_space} * `` must be determined manually. It is usually around 80% of the Offline QPS, but on some systems, it can drop below 50%. 
If a higher value is specified, the latency constraint will not be met, and the run will be considered invalid.\n" - - if "gptj" in model and device == "cuda" and implementation == "reference": - extra_content += f"{f_pre_space} * `--precision=[float16|bfloat16]` can help run on GPUs with less RAM \n" + if implementation == "reference" and model in [ "sdxl", "gptj-99", "gptj-99.9" ] and device in ["cuda", "rocm"] and "precision" not in extra_input_string: + extra_content += f"{f_pre_space} * `--precision=float16` can help run on GPUs with less RAM / gives better performance \n" + if implementation == "reference" and model in [ "sdxl", "gptj-99", "gptj-99.9" ] and device in ["cpu"] and "precision" not in extra_input_string: + extra_content += f"{f_pre_space} * `--precision=bfloat16` can give better performance \n" + if "gptj" in model and implementation == "reference": extra_content += f"{f_pre_space} * `--beam-size=1` Beam size of 4 is mandatory for a closed division submission but reducing the beam size can help in running the model on GPUs with lower device memory\n" if extra_content: extra_content = f"{f_pre_space}!!! tip\n\n" + extra_content - return extra_content + if run_tips: + return extra_content + else: + return "" + @env.macro - def mlperf_inference_run_command(spaces, model, implementation, framework, category, scenario, device="cpu", execution_mode="test", test_query_count="20", docker=False, scenarios = [], code_version="r4.1-dev"): + def mlperf_inference_run_command(spaces, model, implementation, framework, category, scenario, device="cpu", execution_mode="test", test_query_count="20", docker=False, skip_test_query_count=False, scenarios = [], code_version="r4.1-dev", extra_variation_tags="", extra_input_string="", extra_docker_input_string=""): pre_space = "" for i in range(1,spaces): pre_space = pre_space + " " @@ -368,18 +417,20 @@ def mlperf_inference_run_command(spaces, model, implementation, framework, categ if scenario == "Server" or (scenario == "All Scenarios" and "Server" in scenarios): scenario_option += f"\\\n{pre_space} --server_target_qps=" - run_cmd_extra = get_run_cmd_extra(f_pre_space, model, implementation, device, scenario, scenarios) + run_cmd_extra = get_run_cmd_extra(f_pre_space, model, implementation, device, scenario, scenarios, True, extra_input_string) if docker: docker_cmd_suffix = f" \\\n{pre_space} --docker --quiet" - docker_cmd_suffix += f" \\\n{pre_space} --test_query_count={test_query_count}" - + if not skip_test_query_count: + docker_cmd_suffix += f" \\\n{pre_space} --test_query_count={test_query_count}" + if extra_docker_input_string != "" or extra_input_string != "": + docker_cmd_suffix += f" \\\n{pre_space} {extra_docker_input_string} {extra_input_string}" if "bert" in model.lower() and framework == "deepsparse": docker_cmd_suffix += f"\\\n{pre_space} --env.CM_MLPERF_NEURALMAGIC_MODEL_ZOO_STUB=zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none" if "llama2-70b" in model.lower(): if implementation == "nvidia": docker_cmd_suffix += f" \\\n{pre_space} --tp_size=2" - docker_cmd_suffix += f" \\\n{pre_space} --nvidia_llama2_dataset_file_path=" + docker_cmd_suffix += f" \\\n{pre_space} --nvidia_llama2_dataset_file_path=" elif implementation == "neuralmagic": docker_cmd_suffix += f" \\\n{pre_space} --api_server=http://localhost:8000" docker_cmd_suffix += f" \\\n{pre_space} --vllm_model_name=nm-testing/Llama-2-70b-chat-hf-FP8" @@ -388,9 +439,14 @@ def mlperf_inference_run_command(spaces, model, implementation, 
framework, categ if "dlrm-v2" in model.lower() and implementation == "nvidia": docker_cmd_suffix += f" \\\n{pre_space} --criteo_day23_raw_data_path=" + if "short" in extra_variation_tags: + full_ds_needed_tag = "" + else: + full_ds_needed_tag = ",_full" + docker_setup_cmd = f"""\n {f_pre_space}```bash -{f_pre_space}cm run script --tags=run-mlperf,inference,_find-performance,_full,_{code_version}{scenario_variation_tag} \\ +{f_pre_space}cm run script --tags=run-mlperf,inference,_find-performance,{full_ds_needed_tag}_{code_version}{scenario_variation_tag}{extra_variation_tags} \\ {pre_space} --model={model} \\ {pre_space} --implementation={implementation} \\ {pre_space} --framework={framework} \\ @@ -402,9 +458,9 @@ def mlperf_inference_run_command(spaces, model, implementation, framework, categ return docker_setup_cmd + run_cmd_extra else: - cmd_suffix = f"\\\n{pre_space} --quiet" + cmd_suffix = f"\\\n{pre_space} --quiet {extra_input_string}" - if execution_mode == "test": + if execution_mode == "test" and not skip_test_query_count: cmd_suffix += f" \\\n {pre_space} --test_query_count={test_query_count}" if "bert" in model.lower() and framework == "deepsparse": @@ -423,7 +479,7 @@ def mlperf_inference_run_command(spaces, model, implementation, framework, categ run_cmd = f"""\n {f_pre_space}```bash -{f_pre_space}cm run script --tags=run-mlperf,inference,_{code_version}{scenario_variation_tag} \\ +{f_pre_space}cm run script --tags=run-mlperf,inference,_{code_version}{scenario_variation_tag}{extra_variation_tags} \\ {pre_space} --model={model} \\ {pre_space} --implementation={implementation} \\ {pre_space} --framework={framework} \\ diff --git a/mkdocs.yml b/mkdocs.yml index 8e59acf63..95dfb6e86 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -19,25 +19,31 @@ theme: - navigation.top - toc.follow nav: - - Install: - - install/index.md - - Benchmarks: + - Home: - index.md - Image Classification: - ResNet50: benchmarks/image_classification/resnet50.md - Text to Image: - - Stable Diffusion: benchmarks/text_to_image/sdxl.md + - Stable Diffusion: + - Run Commands: benchmarks/text_to_image/sdxl.md + - Reproducibility: + - SCC24: benchmarks/text_to_image/reproducibility/scc24.md - Object Detection: - RetinaNet: benchmarks/object_detection/retinanet.md - Medical Imaging: - 3d-unet: benchmarks/medical_imaging/3d-unet.md - Language Processing: - - Bert-Large: benchmarks/language/bert.md + - Bert-Large: + - Run Commands: benchmarks/language/bert.md + - Reproducibility: + - IndySCC24: benchmarks/language/reproducibility/indyscc24-bert.md - GPT-J: benchmarks/language/gpt-j.md - LLAMA2-70B: benchmarks/language/llama2-70b.md - MIXTRAL-8x7B: benchmarks/language/mixtral-8x7b.md - Recommendation: - DLRM-v2: benchmarks/recommendation/dlrm-v2.md + - Install CM: + - install/index.md - Submission: - Submission Generation: submission/index.md - Release Notes: @@ -62,3 +68,7 @@ markdown_extensions: plugins: - search - macros + - site-urls + - redirects: + redirect_maps: + 'benchmarks/index.md': 'index.md' diff --git a/text_to_image/main.py b/text_to_image/main.py index 3d81b3fb0..07cf17472 100644 --- a/text_to_image/main.py +++ b/text_to_image/main.py @@ -111,7 +111,7 @@ def get_args(): parser.add_argument( "--device", default="cuda", - choices=["cuda", "cpu"], + choices=["cuda", "cpu", "rocm"], help="device to run the benchmark", ) parser.add_argument(