WIP cl_ext_image_tiling_control #710
Discussed in the October 4th teleconference:
Some initial thoughts
Thanks,
One high-level thought: Especially because this extension leaves "optimal tiling" implementation-defined (this is a good thing), should the controls be a bit more descriptive and a bit less prescriptive?
Said another way, I think some implementations could benefit from this extension even if they don't support "tiling", by assuming that a "linear" image should be optimized for host access and an "optimal" image should be optimized for device access.
One change we could consider is to switch from CL_IMAGE_TILING_{LINEAR|OPTIMAL} to CL_IMAGE_LAYOUT_{LINEAR|OPTIMAL} (so tiling -> layout) to be a little more general.
Or, if we wanted to go a bit further, we could switch to something like CL_IMAGE_LAYOUT_{DYNAMIC|STATIC}, or even CL_IMAGE_LAYOUT_{HOST|DEVICE}_OPTIMAL.
What do you think?
If {CL_MEM_IMAGE_TILING_EXT} is passed and set to a value different from
{CL_IMAGE_TILING_LINEAR_EXT}, {CL_MEM_USE_HOST_PTR} must not be set in _mem_flags_.
Minor: It might be better to prohibit the CL_MEM_USE_HOST_PTR case specifically for CL_IMAGE_LAYOUT_OPTIMAL vs. anything that is NOT CL_IMAGE_TILING_LINEAR. It would certainly require documenting the tile format, but conceivably some specific tiling formats could work with CL_MEM_USE_HOST_PTR if the host knows how to interpret and de-tile the memory.
Then again, couldn't any tiled memory be de-tiled when it is mapped into the provided host pointer, then re-tiled when it is unmapped?
I'm not sure if I have a strong use-case that would require changing this behavior but it would be helpful to understand where it is coming from.
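To make the difference between the drafted rule and the narrower alternative concrete, here is a minimal sketch in plain C. Everything prefixed `SKETCH_` or `sketch_` is a local stand-in rather than a real OpenCL token, and `SKETCH_TILING_DOCUMENTED` is a purely hypothetical third tiling with a published layout that the draft does not define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for illustration; real code would use the OpenCL headers. */
#define SKETCH_MEM_USE_HOST_PTR (UINT64_C(1) << 3)

typedef enum {
    SKETCH_TILING_LINEAR     = 1,
    SKETCH_TILING_OPTIMAL    = 2,
    SKETCH_TILING_DOCUMENTED = 3  /* hypothetical: a tiling with a published layout */
} sketch_tiling;

/* Rule as drafted: any tiling other than LINEAR rejects USE_HOST_PTR. */
static bool host_ptr_allowed_draft(uint64_t mem_flags, sketch_tiling tiling)
{
    assert(tiling != 0); /* 0 is marked unused in the draft enum range */
    if ((mem_flags & SKETCH_MEM_USE_HOST_PTR) == 0)
        return true;
    return tiling == SKETCH_TILING_LINEAR;
}

/* Alternative from the review: only OPTIMAL rejects USE_HOST_PTR. */
static bool host_ptr_allowed_alt(uint64_t mem_flags, sketch_tiling tiling)
{
    assert(tiling != 0);
    if ((mem_flags & SKETCH_MEM_USE_HOST_PTR) == 0)
        return true;
    return tiling != SKETCH_TILING_OPTIMAL;
}
```

The two predicates only disagree for tilings that are neither LINEAR nor OPTIMAL, which is exactly the case the comment above is asking about.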
Return the tiling passed using the {CL_MEM_IMAGE_TILING_EXT} property or
{CL_IMAGE_TILING_LINEAR_EXT} if none was passed at image creation time.
I don't have an obvious solution to this problem, but I think it will be misleading if this query returns CL_IMAGE_TILING_LINEAR if no property were specified at image creation time. If we implemented this extension then some of our devices would almost certainly behave differently if an application passed LINEAR explicitly vs. no property.
I suppose one solution to consider is to define a third DEFAULT layout type in this extension that is the equivalent of "no property specified". Some implementations might treat this the same as LINEAR or OPTIMAL, but some other implementations may not.
*RESOLVED*: An application wishing to get visibility of tiled data can do so by
creating images from a buffer and mapping the buffer directly.
Have you considered defining a map flag to indicate "leave the memory as-is" vs. the current "convert to a linear layout"? I admittedly haven't fully thought this through, or even whether this would be useful, but the disadvantage of the "create an image from a buffer" method is that it won't work after the fact if an image is already created, whereas a map flag might still work.
I wanted to leave some high-level thoughts about how this extension could work.
The major impact of linear vs. optimal tiling modes would be when the image is created from a buffer or from external memory. In these cases, if the image has optimal tiling mode, then the application should be required to set row_pitch and/or slice_pitch to zero.
Applications should be able to choose whether an optimally tiled image should present a linear view when mapped for host CPU access. Currently, implementations are implicitly required to present a linear view of image data when mapped for host access. This extension could possibly give implementations the ability to opt out of providing that linear view. How the image data is finally presented when mapped for host access would then depend on both implementation capabilities and application selection.
Ben has suggested that applications make the linear view vs. raw view selection as part of the EnqueueMap command. This could also be specified when creating the image. We'll need to discuss these options further.
Thanks,
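As a rough illustration of the pitch requirement described above, the check could look like the sketch below. It reads "and/or" as "both pitches must be zero for optimal tiling", which is an assumption on my part, and all names are local stand-ins rather than tokens from the draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; the values mirror the draft enum but are not the real tokens. */
typedef enum {
    SKETCH_TILING_LINEAR  = 1,
    SKETCH_TILING_OPTIMAL = 2
} sketch_tiling;

/* For images created from a buffer or from external memory: an optimally
 * tiled image has an implementation-defined layout, so application-supplied
 * pitches are rejected. Linear pitch validation is left to the core rules
 * and is not modelled here. */
static bool pitches_valid(sketch_tiling tiling, size_t row_pitch, size_t slice_pitch)
{
    assert(tiling == SKETCH_TILING_LINEAR || tiling == SKETCH_TILING_OPTIMAL);
    if (tiling == SKETCH_TILING_OPTIMAL)
        return row_pitch == 0 && slice_pitch == 0;
    return true;
}
```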
I think the comment above describes one way this extension could work, but not necessarily the only way. IMHO the key value this extension provides is a hint (or assertion?) to the driver that the image should (or must?) be stored internally in a specific format - either "linear" (well-specified) or "tiled" (implementation-defined, in this extension, at least). This will primarily change the performance characteristics of an image, especially when it is mapped, though likely also when it is accessed within a kernel. Once we have the ability to control or otherwise influence the internal layout it gives us the ability to add additional functionality, either in this extension or in other layered extensions. For example, we could:
One of the key questions I think we'll need to answer is whether this extension is purely a performance hint or whether it is something stronger. For the base-level functionality I think we could go either way, but for the additional functionality (external memory and direct access mapping) I think we'd want stronger guarantees (at least in some cases).
to *clCreateImageWithProperties* to control the tiling used for the image being
created. The following values are accepted:

* {CL_IMAGE_TILING_LINEAR_EXT}, which is the default value used if
For implementations that don't or won't support cl_ext_image_tiling_control, the property CL_MEM_IMAGE_TILING_EXT will be missing, and the implementation's default choice in clCreateImageWithProperties is not necessarily linear tiling.
For such implementations, requiring CL_IMAGE_TILING_LINEAR_EXT to be the default means that, if they decide to support this extension, clCreateImageWithProperties will have to break backwards compatibility: all existing applications will need to explicitly pass CL_IMAGE_TILING_OPTIMAL_EXT or some other flag to get the same layout as before. Otherwise, they will end up forcing a linear layout and are likely to incur a performance penalty. This would be painful and extremely undesirable.
Yes, I agree, and I had a similar comment above.
One possible way to solve this problem is to have three "tiling" states:
1. An explicitly linear image.
2. An explicitly tiled image.
3. A "default" or "unspecified" layout where the implementation can choose whether the image is linear or tiled or something else.
(1) and (2) are already defined by this extension. (3) would need to be added. If we added (3) and made it the default value I think it would retain backwards compatibility and avoid any performance penalties?
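A minimal sketch of how the three states could be resolved from a property list, assuming the usual zero-terminated list of name/value pairs. The property name value and the `SKETCH_TILING_DEFAULT` value are placeholders invented for this example; neither is defined by the draft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder property name and values, for illustration only. */
#define SKETCH_MEM_IMAGE_TILING UINT64_C(0x1000)

enum {
    SKETCH_TILING_LINEAR  = 1,
    SKETCH_TILING_OPTIMAL = 2,
    SKETCH_TILING_DEFAULT = 3  /* hypothetical "no property specified" state */
};

/* Walks a zero-terminated list of (name, value) pairs and returns the
 * requested tiling, or the hypothetical DEFAULT state when the property
 * is absent or no list was passed at all. */
static uint64_t requested_tiling(const uint64_t *properties)
{
    if (properties != NULL) {
        for (size_t i = 0; properties[i] != 0; i += 2) {
            if (properties[i] == SKETCH_MEM_IMAGE_TILING)
                return properties[i + 1];
        }
    }
    return SKETCH_TILING_DEFAULT;
}
```

With this shape, an application that passes no tiling property keeps whatever layout the implementation picked before the extension existed, and only explicit LINEAR or OPTIMAL requests change behaviour.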
Discussed in the January 3rd teleconference. Another possibility is to only have two tiling states, but to give freedom to implementations to use linear tiling even for "tiling optimal" images if it chooses to do so. If we did this then the default tiling property could be "optimal", which would also avoid the possible performance penalty.
This isn't a perfect solution. A case where this is insufficient is an external memory case, where the exporting API (e.g. Vulkan) uses a different policy for "tiling optimal" images than the importing API (e.g. OpenCL).
A more robust solution is to specify the explicit type of tiling, akin to VK_EXT_image_drm_format_modifier.
I think either approach should work: a separate default state, or an optimal state that covers both non-linear and linear tiling modes.
Default may be unnecessary if optimal allows both.
With a default or optimal tiling mode, if the implementation is free to choose between optimal and linear, what should the clGetImageInfo CL_IMAGE_TILING_EXT query return? Should it return the default/optimal tiling/layout as requested, or should it return optimal or linear as chosen by the implementation?
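The two possible answers to the query question can be written down side by side. This is only a sketch of the choice being discussed: `sketch_image` and both query functions are hypothetical, and neither is what the draft specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins; not the real extension tokens. */
typedef enum {
    SKETCH_TILING_LINEAR  = 1,
    SKETCH_TILING_OPTIMAL = 2
} sketch_tiling;

/* Hypothetical image record that tracks both what the application asked
 * for and what the implementation actually picked. */
typedef struct {
    sketch_tiling requested;
    sketch_tiling chosen;
} sketch_image;

/* Option A: the query echoes the value passed at creation time. */
static sketch_tiling query_requested(const sketch_image *image)
{
    assert(image != NULL);
    return image->requested;
}

/* Option B: the query reports the layout the implementation selected. */
static sketch_tiling query_chosen(const sketch_image *image)
{
    assert(image != NULL);
    return image->chosen;
}
```

The options differ only when an implementation falls back to linear for an image requested as optimal, which is the situation an external memory importer would care about.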
I posted some comments on #861 which include updates to cl_ext_image_tiling_control. These comments also discuss how cl_ext_image_tiling_control could bring clarity to external memory and image_from_buffer usage.
Thanks,
Following up on the discussion around #861 from Jan 10, 2023, we'd like to propose the following outline for this extension.
First, we recommend that this extension be made khr since it is foundational to other extensions, including external memory.
CL_IMAGE_TILING_LINEAR_KHR and CL_IMAGE_TILING_OPTIMAL_KHR: these properties can be used when calling clCreateImageWithProperties and bring clarity to the image layout when used. Optimal tiling can be thought of as a superset of linear tiling. What it does is let implementations decide how the image should be tiled. An implementation can continue to use linear tiling even if the application selects CL_IMAGE_TILING_OPTIMAL_KHR. With that in mind, we believe all implementations should be able to support both linear and optimal tiling.
We also recommend that all implementations be required to support device access to optimally tiled and linear images. It seems that device access is a minimum for the image to be useful.
Regarding host access, we propose that the default behavior be to present a linear view of the image data when clEnqueueMapBuffer is called. This means that implementations will need to detile optimally tiled images (if needed) when clEnqueueMapBuffer is called. If this is problematic for some vendors, we can add a cl_device_info param that lets the application know whether the implementation can detile optimally tiled images when clEnqueueMapBuffer is called.
For many use cases, having direct access to tiled image data is useful. For this reason, we are proposing a new flag, CL_MAP_IMAGE_NO_DETILE_KHR. When used in clEnqueueMapImage, the implementation will not detile the data for host access.
Thanks,
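To illustrate what "detile on map" means in practice, here is a toy sketch. The layout is invented for the example (4x4 tiles stored row-major, one byte per texel, dimensions that are multiples of the tile size); real optimal tilings are implementation-defined and need not resemble it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TILE 4u  /* hypothetical tile edge length in texels */

/* Byte offset of texel (x, y) in the toy tiled layout: tiles are stored
 * row-major, and texels inside each tile are row-major too. */
static size_t tiled_offset(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / SKETCH_TILE;
    size_t tile_index = (y / SKETCH_TILE) * tiles_per_row + (x / SKETCH_TILE);
    size_t within_tile = (y % SKETCH_TILE) * SKETCH_TILE + (x % SKETCH_TILE);
    return tile_index * SKETCH_TILE * SKETCH_TILE + within_tile;
}

/* Produces the linear view a default map would present. A map with the
 * proposed no-detile flag would skip this copy and expose `tiled` as-is. */
static void detile(const uint8_t *tiled, uint8_t *linear, size_t width, size_t height)
{
    assert(width % SKETCH_TILE == 0 && height % SKETCH_TILE == 0);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x)
            linear[y * width + x] = tiled[tiled_offset(x, y, width)];
    }
}
```

The per-texel copy is the cost a default map would pay for a tiled image, and it is the cost the no-detile flag lets an application avoid when it can interpret the raw layout itself.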
I've rebased the specification draft and captured what I believe to be an exhaustive list of all the open topics in the PR description. We seem to be mostly aligned on where this extension should go.
We uploaded an updated draft that has the following major modifications.
Thanks,
Change-Id: I6c391b5ecaa203f7db566db68a7e3d124d6038a2 Signed-off-by: Kevin Petit <[email protected]>
Change-Id: Id434000fdf949c76af92dff17829462e4bc9dd0f
As agreed in recent teleconferences, I have now:
I have also fixed minor typos and issues. We should now be in a good position to resume discussions and enact outcomes. Doing this work made it pretty clear that we will need to integrate cl_ext_image_requirements_info and cl_ext_image_from_buffer into the unified specification before finalising cl_ext_image_tiling_control if we are to cleanly document interactions. I'll work on doing this in parallel with the discussions on cl_ext_image_tiling_control.
<unused start="0" end="0"/>
<enum value="1" name="CL_IMAGE_TILING_LINEAR_EXT"/>
<enum value="2" name="CL_IMAGE_TILING_OPTIMAL_EXT"/>
<unused start="3" end="0xFFFFFFFF"/>
Hmm, is this intentionally a separate value range, parallel to everything else?
{CL_DEVICE_IMAGE_TILING_HOST_ACCESS_EXT_anchor} is set when it is allowed
to pass {CL_MEM_IMAGE_TILING_EXT} with a value different from
{CL_IMAGE_TILING_LINEAR_EXT} in the _properties_ given to
{clCreateImageWithProperties} when creating host-accessible images.
Does this work? If you want raw host access to tiled data, you would create a buffer from the image. Wouldn't the buffer inherit the no host access flag? Couldn't the image be host-inaccessible for reasons other than tiling?
created. The following values are accepted:

* {CL_IMAGE_TILING_LINEAR_EXT_anchor}, which is the default value used if
{CL_MEM_IMAGE_TILING_EXT} is not specified in the _properties_. Image data
Should we disallow the use of this property when importing external memory? This may contradict an external import.
ifdef::cl_ext_image_tiling_control[]
If the image object is created with {CL_MEM_IMAGE_TILING_EXT} set to a value
different from {CL_IMAGE_TILING_LINEAR_EXT}, the value returned for
If an externally imported image is tiled, should we return optimal?
Open topics:
- CL_MEM_USE_HOST_PTR (see WIP cl_ext_image_tiling_control #710 (comment))
- DEFAULT implementation-defined tiling (see WIP cl_ext_image_tiling_control #710 (comment) and WIP cl_ext_image_tiling_control #710 (comment))

Change-Id: I6c391b5ecaa203f7db566db68a7e3d124d6038a2
Signed-off-by: Kevin Petit [email protected]