Compiling Kernel Codes
The compile method takes the kernel source code as its first parameter and the kernel name as its second parameter.
Inside a kernel, get_global_id(0) returns id values that are contiguous across the GPUs/CPU, so every id value in a run is unique (zero-based, 1-dimensional, from 0 to N-1). For example, if the first device computes 1/3 of 768 global threads, it handles id values 0 to 255 and the second device's work-items start at id value 256.
computer.compile(std::string(R"(
    void kernel add(global int * a, global int * b)
    {
        // global id of this work-item (unique across all devices)
        const int id = get_global_id(0);
        b[id] = a[id] + 1;
    }
)"), "add");
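The id partitioning described above can be illustrated with a small host-side sketch. This is not the library's actual load-balancing code: the names DeviceRange and splitGlobalRange are hypothetical, and the sketch assumes each device receives a contiguous slice of the global range proportional to an integer weight. A real scheduler may also round each slice to a multiple of the local work-group size, which is ignored here.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper type (an assumption, not part of the library's API).
struct DeviceRange {
    std::size_t offset; // first global id handled by the device
    std::size_t count;  // number of work-items handled by the device
};

// Splits the id range [0, total) into contiguous, non-overlapping slices,
// one per device, proportional to the given integer weights.
std::vector<DeviceRange> splitGlobalRange(std::size_t total,
                                          const std::vector<std::size_t>& weights)
{
    std::vector<DeviceRange> ranges;
    const std::size_t weightSum =
        std::accumulate(weights.begin(), weights.end(), std::size_t{0});
    if (weightSum == 0) {
        return ranges; // nothing to distribute
    }

    std::size_t offset = 0;
    std::size_t weightSoFar = 0;
    for (std::size_t w : weights) {
        weightSoFar += w;
        // Cumulative end boundary; the last device ends exactly at total.
        const std::size_t end = total * weightSoFar / weightSum;
        ranges.push_back({offset, end - offset});
        offset = end;
    }
    return ranges;
}
```

With total = 768 and weights {1, 2}, the first slice has offset 0 and count 256 (ids 0 to 255) and the second has offset 256 and count 512, which matches the 1/3 example in the text.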
During compilation, all devices are prepared serially, one after another, to maintain thread safety for the OpenCL runtime.
Currently, the same code with the same optimization parameters (fused multiply-add enabled) is used for all devices, but each device's code is compiled by its own driver, so the result is still reasonably optimized for each device.
Every device has its own compilation unit, so any state change made outside of a kernel is not visible to other devices. Any communication has to pass through kernel parameters.