cuda
Mark Gates edited this page Jul 13, 2023
CUDA Latency
Benchmarks on Leconte, with 8 V100 GPUs
Function | time | time (ns) | notes |
---|---|---|---|
is_device_ptr | 1.36e-07 s | 136 ns | avg for 1e6 lookups |
set_device | 7.3e-08 s | 73 ns | avg for 1e7 iters, 8 dev |
create cuda stream | 0.000025 s | 25,000 ns | avg for 10 iters * 8 dev, excluding max |
destroy stream | 0.000004 s | 4,000 ns | avg for 10 iters * 8 dev, excluding max |
create cublas handle | 0.000280 s | 280,000 ns | avg for 10 iters * 8 dev, excluding max |
destroy handle | 0.000261 s | 261,000 ns | avg for 10 iters * 8 dev, excluding max |
get/set pointer mode | 6.14e-09 s | 6 ns | avg for 1e6 iters |
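
Several rows in the table report an average "excluding max", which presumably drops the slowest call (often the first, which pays one-time initialization costs). The page does not show the benchmark code, so the following is only a minimal sketch of how such a statistic could be computed on the host. The helper names `mean_excluding_max` and `time_each_call` are hypothetical and are not taken from the original benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical helper: mean of the samples after dropping the single
// largest one. With only one sample, that sample is returned unchanged.
double mean_excluding_max(const std::vector<double>& samples) {
    assert(!samples.empty());
    if (samples.size() == 1) {
        return samples[0];
    }
    double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    double max = *std::max_element(samples.begin(), samples.end());
    return (sum - max) / static_cast<double>(samples.size() - 1);
}

// Hypothetical helper: calls op() iters times, timing each call
// separately, and returns the per-call wall-clock times in seconds.
template <typename Op>
std::vector<double> time_each_call(Op op, int iters) {
    using clock = std::chrono::steady_clock;
    std::vector<double> seconds;
    seconds.reserve(iters);
    for (int i = 0; i < iters; ++i) {
        auto t0 = clock::now();
        op();
        auto t1 = clock::now();
        seconds.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    return seconds;
}
```

In a CUDA build, `op` could be a lambda that wraps a call such as `cudaStreamCreate`, run once per device after `cudaSetDevice`, with the per-call times from all devices collected into one vector before calling `mean_excluding_max`. For the very cheap operations in the table (tens of nanoseconds), timing a whole loop and dividing by the iteration count is likely more reliable than timing each call, since the clock overhead is comparable to the operation itself.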