We can run the tests twice on the same GPU, once with native NVIDIA® CUDA™ and once with Coriander, and compare the execution times:

Scenario | Coriander | NVIDIA® CUDA™ native |
---|---|---|
linear_regression.py | 0.21s | 0.07s |
logistic_regression.py | 9.5s | 3.7s |
multilayer_perceptron.py | 15.8s | 15.1s |
recurrent_network.py | 0.84s | 0.23s |
dynamic_rnn.py | 0.9s | 0.23s |
bidirectional_rnn.py | 0.9s | 0.24s |

We can see that:
- for multilayer_perceptron.py, epoch time is comparable: on the same GPU, Coriander is only about 5% slower than native NVIDIA® CUDA™ (15.8s vs 15.1s)
- for the recurrent networks, Coriander is roughly 3.7 to 3.9 times slower than using NVIDIA® CUDA™ directly.
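
The slowdown factors quoted above are simply the Coriander time divided by the native CUDA time for each scenario. The short script below recomputes them from the table; the dictionary layout and the `slowdown` helper are illustrative names, not part of Coriander.

```python
# Execution times copied from the table above, in seconds: (Coriander, native CUDA)
timings = {
    "linear_regression.py": (0.21, 0.07),
    "logistic_regression.py": (9.5, 3.7),
    "multilayer_perceptron.py": (15.8, 15.1),
    "recurrent_network.py": (0.84, 0.23),
    "dynamic_rnn.py": (0.9, 0.23),
    "bidirectional_rnn.py": (0.9, 0.24),
}


def slowdown(coriander_s: float, native_s: float) -> float:
    """Ratio of Coriander time to native CUDA time (1.0 means equal speed)."""
    return coriander_s / native_s


if __name__ == "__main__":
    for scenario, (coriander_s, native_s) in timings.items():
        print(f"{scenario:26s} {slowdown(coriander_s, native_s):.2f}x")
```

For the three recurrent scenarios the ratios come out between 3.65 and 3.91, while multilayer_perceptron.py is about 1.05.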

Coriander generates OpenCL code for the kernels at runtime and compiles it on the fly, so the first iteration takes longer than later ones. The extra time spent on the first iteration is:

Scenario | Kernel generation/compile time |
---|---|
linear_regression.py | 0.13s |
logistic_regression.py | 0.9s |
multilayer_perceptron.py | ~0s |
recurrent_network.py | 1.9s |
dynamic_rnn.py | 3.7s |
bidirectional_rnn.py | 2.1s |
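Only the first iteration pays this cost because, once a kernel has been generated and compiled, later iterations can reuse it. The sketch below illustrates that compile-on-first-use pattern with a hypothetical `KernelCache` class; it is a generic illustration under that assumption, not Coriander's actual implementation, and `compile_fn` merely stands in for "write the OpenCL source, then build it". It also shows how a one-off cost such as the 3.7s for dynamic_rnn.py shrinks per iteration over a longer run (the iteration count is a made-up example).

```python
from typing import Callable, Dict

Kernel = Callable[..., object]


class KernelCache:
    """Hypothetical sketch: compile a kernel the first time it is requested, then reuse it."""

    def __init__(self, compile_fn: Callable[[str], Kernel]) -> None:
        self._compile_fn = compile_fn  # stand-in for OpenCL code generation + compilation
        self._compiled: Dict[str, Kernel] = {}
        self.compile_count = 0  # how many times the expensive step actually ran

    def get(self, name: str) -> Kernel:
        kernel = self._compiled.get(name)
        if kernel is None:
            # Expensive path: only taken the first time this kernel is needed.
            kernel = self._compile_fn(name)
            self._compiled[name] = kernel
            self.compile_count += 1
        return kernel


def amortized_overhead(compile_s: float, n_iterations: int) -> float:
    """One-off compile time spread evenly over n_iterations."""
    return compile_s / n_iterations


if __name__ == "__main__":
    cache = KernelCache(lambda name: (lambda x: x * 2))  # dummy "compiled" kernel
    for _ in range(3):  # three training iterations
        cache.get("matmul")
    print("compilations:", cache.compile_count)
    # Hypothetical run of 100 iterations for dynamic_rnn.py (3.7s one-off cost)
    print(f"{amortized_overhead(3.7, 100):.3f}s per iteration")
```

Keying the cache by kernel name keeps the sketch simple; a real runtime would also need to key on anything that changes the generated code.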