How to reuse copies of a buffer on a device? #3790
JOOpdenhoevel asked this question in Q&A
-
It's hard to say anything specific without seeing the source code, but I wouldn't expect the runtime to copy data back to the host on its own. There are four cases when it may happen:
If none of these happens, the data should stay on the device. You can set the SYCL_PRINT_EXECUTION_GRAPH environment variable. It dumps the execution graph to your working directory in DOT format, which can be converted to a graphical format with Graphviz or a similar tool. Data copies have their own nodes, so this may help you diagnose your particular problem. If you need manual control over memory, take a look at USM. Also tagging @romanovvlad
-
Hello,
I am currently working on a project that works with generations of a buffer. For example, our kernel reads from buffer A and writes the resulting next generation to buffer B. After that, the kernel is invoked again to read from buffer B and write to buffer C, and so on. These invocations all happen in one queue in one context, and in theory it should be easy to just leave buffer B on the device and reuse this copy for the next invocation. However, profiling seems to indicate that the SYCL runtime first copies B to the host, allocates a new memory object on the device, and then copies the data back to invoke the kernel on this new memory object.
First of all: Is this actually the case? Does the SYCL runtime work like that? I can understand why this would have been implemented like that, but in our application it is essential for performance that the runtime keeps track of where the current version of a buffer is located and reuses this copy, if possible. If this isn't the standard behavior, is there a way to achieve this? One option would be to use the OpenCL interoperability layer for direct control over the allocated memory objects, but as far as I know it's already deprecated, isn't it? Would it be possible to get this feature properly implemented in the runtime?