Mark Gates edited this page Jul 13, 2023 · 1 revision

CUDA Latency

Benchmarks on Leconte, with 8 V100 GPUs

| Function | Time | Notes |
|---|---|---|
| is_device_ptr | 1.36e-07 s | 136 ns avg for 1e6 lookups |
| set_device | 7.3e-08 s | 73 ns avg for 1e7 iters, 8 dev |
| create cuda stream | 0.000025 s | 25,000 ns avg for 10 iters * 8 dev, excluding max |
| destroy stream | 0.000004 s | 4,000 ns avg for 10 iters * 8 dev, excluding max |
| create cublas handle | 0.000280 s | 280,000 ns avg for 10 iters * 8 dev, excluding max |
| destroy handle | 0.000261 s | 261,000 ns avg for 10 iters * 8 dev, excluding max |
| get/set pointer mode | 6.14e-09 s | 6 ns avg for 1e6 iters |
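
The notes above mention averages taken "excluding max", which presumably drops the slowest call (often the first, which may pay one-time initialization costs). This page does not say how the timings were actually collected, so the following is only a minimal sketch of one way to compute such an average. The helper names `time_calls` and `mean_excluding_max` are hypothetical and are not taken from the benchmark code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical helper: time each call of op() separately and
// return the per-call wall-clock times in seconds.
template <typename Op>
std::vector<double> time_calls(int iters, Op op)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(iters);
    for (int i = 0; i < iters; ++i) {
        auto start = clock::now();
        op();
        auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double>(stop - start).count());
    }
    return samples;
}

// Hypothetical helper: mean of the samples after discarding the
// single largest one (assumed meaning of "excluding max").
double mean_excluding_max(std::vector<double> const& samples)
{
    assert(samples.size() >= 2);
    double total = std::accumulate(samples.begin(), samples.end(), 0.0);
    double worst = *std::max_element(samples.begin(), samples.end());
    return (total - worst) / (samples.size() - 1);
}
```

To reproduce a row such as "create cuda stream", one could pass a lambda that calls `cudaStreamCreate` to `time_calls` after selecting each device with `cudaSetDevice`, then feed the combined samples to `mean_excluding_max`. For the nanosecond-scale rows (1e6 or 1e7 iterations), timing the whole loop once and dividing by the iteration count is likely more accurate than timing each call, since the clock overhead is comparable to the call itself.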