
Add a section on reproducibility to the docs #61

Open
hagenw opened this issue Mar 20, 2023 · 3 comments

Labels
documentation Improvements or additions to documentation

hagenw commented Mar 20, 2023

The results you get back when running a model can depend on the device, and can even vary across several calls on the same device. It might be a good idea to add a "Reproducibility" section to the documentation in which we discuss these issues.

For example, let us use the model introduced in w2v2-how-to:

import audeer
import audonnx
import numpy as np

# Download and extract the model
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')

archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)

# Create a reproducible random input signal of 1 s length
np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)

Now, let us execute the model on the CPU:

>>> model = audonnx.load(model_root, device='cpu')
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)

When using the CPU, we always get back the same result,
no matter how often we execute the model.

Then let's switch to the GPU:

>>> model = audonnx.load(model_root, device='cuda:0')
>>> model(signal, sampling_rate)['logits']
array([[0.68319285, 0.64667934, 0.49738473]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.68317926, 0.6466613 , 0.4974225 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.683162  , 0.64668435, 0.4973961 ]], dtype=float32)

We see that the results differ from the fifth decimal place on for each run,
and the average GPU result deviates from the CPU-based result by:

array([[-2.62856483e-05, -5.79953194e-05, -1.06304884e-04]], dtype=float32)
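The deviation above can be recomputed from the outputs logged in this issue; a minimal sketch using the printed (and therefore rounded) float32 values:

```python
import numpy as np

# CPU result and the three GPU runs, as printed above (float32, rounded for display)
cpu = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)
gpu_runs = np.array(
    [
        [0.68319285, 0.64667934, 0.49738473],
        [0.68317926, 0.6466613, 0.4974225],
        [0.683162, 0.64668435, 0.4973961],
    ],
    dtype=np.float32,
)

# Average GPU result minus the CPU result
deviation = gpu_runs.mean(axis=0, keepdims=True) - cpu
print(deviation)
```

Because the printed values are rounded, the result matches the array above only up to a few 1e-8.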

This is a known ONNX limitation (microsoft/onnxruntime#9704).
In microsoft/onnxruntime#4611 (comment) they propose to select a fixed convolution algorithm to improve this behavior, see also https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking.
With audonnx we can achieve this by

>>> providers = [("CUDAExecutionProvider", {'cudnn_conv_algo_search': 'DEFAULT'})]
>>> model = audonnx.load(model_root, device=providers)
>>> model(signal, sampling_rate)['logits']
array([[0.683191  , 0.64670646, 0.4973919 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6830938 , 0.6466217 , 0.49734592]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6831656 , 0.64666504, 0.497427  ]], dtype=float32)

Unfortunately, this does not really improve the reproducibility of the results.

It seems that we can only recommend the following when reproducibility is desired:

  • use the CPU as device
  • round the model output to two decimal places, e.g. array([[0.68, 0.65, 0.50]], dtype=float32)
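For the second point, a minimal sketch using the CPU result from above:

```python
import numpy as np

# CPU result as printed above
logits = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)

# Round to two decimal places, so that GPU deviations
# in the fifth decimal place disappear
rounded = np.round(logits, 2)
print(rounded)
```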

/cc @audeerington

hagenw added the documentation label Mar 20, 2023

hagenw commented Mar 20, 2023

When the output of the model is a class label rather than a float value, I guess there is no way to ensure that results are completely reproducible when running on the GPU, as we cannot limit the precision at the end of the operation. A database might contain some corner cases where we see a class flip when executing again on the GPU.
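As a toy illustration (made-up logits and label names, not from the model above), a deviation in the fifth decimal place is already enough to flip the predicted class when two logits are nearly tied:

```python
import numpy as np

# Hypothetical class logits from two runs of the same model on a GPU;
# they differ only in the fifth decimal place
run_1 = np.array([0.50002, 0.49998], dtype=np.float32)
run_2 = np.array([0.49998, 0.50002], dtype=np.float32)

labels = ['class_a', 'class_b']  # made-up label names
print(labels[np.argmax(run_1)])  # class_a
print(labels[np.argmax(run_2)])  # class_b
```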


hagenw commented Mar 21, 2023

We have the same problem for regression values: even if we round to two decimal places, there will always be a few boundary cases for which one run returns e.g. 0.03 and another one 0.02.
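A minimal sketch of such a boundary case (made-up values): two runs that agree up to ~2e-4 can still round to different values.

```python
import numpy as np

# Hypothetical regression outputs from two runs,
# differing only slightly around the rounding boundary 0.025
run_1 = np.float32(0.0251)
run_2 = np.float32(0.0249)

print(np.round(run_1, 2))  # 0.03
print(np.round(run_2, 2))  # 0.02
```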

This is very unfortunate, but the only way to achieve reproducible results when running a model a second time or on a different machine seems to be to not run it on the GPU.


hagenw commented Mar 21, 2023

From https://huggingface.co/docs/diffusers/using-diffusers/reproducibility

How do we also achieve reproducibility on GPU? In short, one should not expect full reproducibility across different hardware when running pipelines on GPU as matrix multiplications are less deterministic on GPU than on CPU
