diff --git a/docs/tutorials/distrib-ml/torch_scaling_test.rst b/docs/tutorials/distrib-ml/torch_scaling_test.rst
index 929847fe..243bb74f 100644
--- a/docs/tutorials/distrib-ml/torch_scaling_test.rst
+++ b/docs/tutorials/distrib-ml/torch_scaling_test.rst
@@ -6,6 +6,41 @@ PyTorch scaling test
    :end-before: Below follows an example of
 
-Below follows an example of scalability plot generated by ``itwinai scalability-report``:
+Plots of the scalability metrics
+--------------------------------
 
-.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/report.png
+We have the following scalability metrics available:
+
+- Absolute wall-clock time comparison
+- Relative wall-clock time speedup
+- Communication vs. Computation time
+- GPU Utilization (%)
+- Power Consumption (Watt)
+
+Some examples of these scalability metrics on the Virgo use case with one, two and four
+nodes respectively can be seen below:
+
+Absolute Wall-Clock Time Comparison
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/absolute-time.png
+
+Relative Wall-Clock Time Speedup
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/relative-speedup.png
+
+Communication vs Computation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/comp-vs-comm.png
+
+GPU Utilization
+~~~~~~~~~~~~~~~~
+
+.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/gpu-utilization.png
+
+Power Consumption
+~~~~~~~~~~~~~~~~~
+
+.. image:: ../../../tutorials/distributed-ml/torch-scaling-test/img/energy-consumption.png
diff --git a/tutorials/distributed-ml/torch-scaling-test/README.md b/tutorials/distributed-ml/torch-scaling-test/README.md
index 66dcfe4e..e5e310e3 100644
--- a/tutorials/distributed-ml/torch-scaling-test/README.md
+++ b/tutorials/distributed-ml/torch-scaling-test/README.md
@@ -137,7 +137,3 @@ To see the full list of possible arguments, type:
 ```bash
 itwinai generate-scalability-plot --help
 ```
-
-Below follows an example of scalability plot generated by `itwinai scalability-report`:
-
-![report](img/report.png)
diff --git a/tutorials/distributed-ml/torch-scaling-test/img/absolute-time.png b/tutorials/distributed-ml/torch-scaling-test/img/absolute-time.png
new file mode 100644
index 00000000..692864d5
Binary files /dev/null and b/tutorials/distributed-ml/torch-scaling-test/img/absolute-time.png differ
diff --git a/tutorials/distributed-ml/torch-scaling-test/img/comp-vs-comm.png b/tutorials/distributed-ml/torch-scaling-test/img/comp-vs-comm.png
new file mode 100644
index 00000000..2e616666
Binary files /dev/null and b/tutorials/distributed-ml/torch-scaling-test/img/comp-vs-comm.png differ
diff --git a/tutorials/distributed-ml/torch-scaling-test/img/energy-consumption.png b/tutorials/distributed-ml/torch-scaling-test/img/energy-consumption.png
new file mode 100644
index 00000000..1faf7659
Binary files /dev/null and b/tutorials/distributed-ml/torch-scaling-test/img/energy-consumption.png differ
diff --git a/tutorials/distributed-ml/torch-scaling-test/img/gpu-utilization.png b/tutorials/distributed-ml/torch-scaling-test/img/gpu-utilization.png
new file mode 100644
index 00000000..d8f8bcbb
Binary files /dev/null and b/tutorials/distributed-ml/torch-scaling-test/img/gpu-utilization.png differ
diff --git a/tutorials/distributed-ml/torch-scaling-test/img/relative-speedup.png b/tutorials/distributed-ml/torch-scaling-test/img/relative-speedup.png
new file mode 100644
index 00000000..d6a7ead7
Binary files /dev/null and b/tutorials/distributed-ml/torch-scaling-test/img/relative-speedup.png differ