layers: Allow using uncached buffer for Debug Printf #6128
Author dorian-apanel-intel not on autobuild list. Waiting for curator authorization before starting CI build. |
749483d
to
af6ca10
Compare
af6ca10
to
d148df2
Compare
2a5f0a9
to
ada331e
Compare
CI Vulkan-ValidationLayers build queued with queue ID 5177. |
Has this been tested? Is there a test that can go with this?
I don't see any testing being added. Is the plan to add them later?
CI Vulkan-ValidationLayers build # 12676 running. |
CI Vulkan-ValidationLayers build # 12676 passed. |
@juan-lunarg @Tony-LunarG
I've only tested it manually so far.
CI Vulkan-ValidationLayers build queued with queue ID 5894. |
CI Vulkan-ValidationLayers build # 12692 running. |
I think so, as long as the test gets a DEVICE_LOST and then successfully parses a debug printf.
CI Vulkan-ValidationLayers build # 12692 passed. |
This sounds like it would cause problems for Jenkins CI (although if we switched to ctest maybe not, since the test would be in its own process), because currently CI invokes the test executable and tries to run all tests.
Adds option to force using AMD_DEVICE_COHERENT_MEMORY for debug printf buffer to print messages even if VK_ERROR_DEVICE_LOST is encountered.
* Option added to layer json to be visible in vkconfig,
* Forcing extension and device feature if not enabled by application,
* Added workaround for atomic operations in uncached memory being in cache anyway,
* Added workaround to failing MapMemory after DEVICE_LOST (occurs on AMD): When using uncached buffer, do not unmap buffer until messages are analyzed.
99bb797
to
090d262
Compare
@Tony-LunarG @juan-lunarg @jeremyg-lunarg
I also plan to add a unit test for Page Fault, but preferably in another pull request.
CI Vulkan-ValidationLayers build queued with queue ID 7794. |
CI Vulkan-ValidationLayers build # 12728 running. |
CI Vulkan-ValidationLayers build # 12728 failed. |
Galaxy S22 testing failed
LGTM. Thanks for the test 👍🏾
I'll defer to @Tony-LunarG and @jeremyg-lunarg for the actual review.
I don't think there's much the test can do about that. I'll just add the new test to the S22 blacklist.
The message processing loop won't exit correctly
 uint32_t index = spvtools::kDebugOutputDataOffset;
-while (debug_output_buffer[index]) {
+while ((index < output_buffer_size) && debug_output_buffer[index]) {
I don't think this loop will work if the output buffer has one message but isn't full. To see this, use uncached buffer memory in the test NegativeDebugPrintf.BasicUsage and run it. This loop doesn't stop when it's supposed to.
Could you check in which vkQueueSubmit/AnalyzeAndGenerateMessages you observe it? BasicUsage does it 15 times.
Here is how I understand it:
In my first one, debug_output_buffer == 0x0000028b8e520000
Memory of that buffer is:
0x0000028B8E520000 00 00 00 00 MBZ
0x0000028B8E520004 0a 00 00 00 expect (size of all messages, written by atomic in shader)
0x0000028B8E520008 0a 00 00 00 DPFOutputRecord.size (size in dwords of this message)
0x0000028B8E52000C 00 00 00 00 DPFOutputRecord.shader_id
0x0000028B8E520010 70 00 00 00 DPFOutputRecord.instruction_position
0x0000028B8E520014 00 00 00 00 DPFOutputRecord.stage
0x0000028B8E520018 00 00 00 00 DPFOutputRecord.stage_word_1
0x0000028B8E52001C 00 00 00 00 DPFOutputRecord.stage_word_2
0x0000028B8E520020 00 00 00 00 DPFOutputRecord.stage_word_3
0x0000028B8E520024 29 00 00 00 DPFOutputRecord.format_string_id ("Here are two float values %f, %f")
0x0000028B8E520028 00 00 80 3f DPFOutputRecord.values (float 1.0)
0x0000028B8E52002C 56 0e 49 40 DPFOutputRecord.values (float 3.14150000)
0x0000028B8E520030 00 00 00 00 DPFOutputRecord.size (no more messages)
0x0000028B8E520034 00 00 00 00 DPFOutputRecord.
0x0000028B8E520038 00 00 00 00 DPFOutputRecord.
Index starts at 0x2. At the end of the first loop iteration:
index += debug_record->size; // 0x2 + 0xa = 0xc
At the beginning of the second iteration, debug_output_buffer[index] == 0x0000028b8e520030 {0x0}, so the loop ends.
(index < output_buffer_size) means "while we are still in the buffer".
&& debug_output_buffer[index]) means "while the next message is not empty".
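To make the walk-through above easier to check, here is a minimal standalone sketch of the same loop run against the dwords from the dump. It is not the layer's code: `FindRecords`, `ExampleBuffer` and `kDataOffset` are hypothetical names (the latter standing in for `spvtools::kDebugOutputDataOffset`), and the truncated-record guard is an extra assumption that is not part of the loop under review.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for spvtools::kDebugOutputDataOffset (first record dword).
constexpr std::size_t kDataOffset = 2;

// Returns the start index (in dwords) of every record found in the buffer.
std::vector<std::size_t> FindRecords(const std::vector<uint32_t>& buffer) {
    std::vector<std::size_t> starts;
    std::size_t index = kDataOffset;
    // Stop at the end of the buffer, or at a record whose size dword is zero.
    while (index < buffer.size() && buffer[index] != 0) {
        const uint32_t record_size = buffer[index];
        // Extra guard (an assumption, not in the reviewed loop): skip a record
        // that claims to extend past the end of the buffer.
        if (record_size > buffer.size() - index) break;
        starts.push_back(index);
        index += record_size;
    }
    return starts;
}

// The 15 dwords shown in the memory dump above.
std::vector<uint32_t> ExampleBuffer() {
    return {0x0, 0xa, 0xa, 0x0, 0x70, 0x0, 0x0, 0x0, 0x0, 0x29,
            0x3f800000, 0x40490e56, 0x0, 0x0, 0x0};
}
```

Calling `FindRecords(ExampleBuffer())` finds one record starting at index 2; the walk then lands on index 0xc, reads a zero size, and stops, which matches the trace above.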
I don't have access to the system I was using to test this today - I'll get back to it tomorrow. The system I do have is getting VK_TIMEOUT instead of VK_ERROR_DEVICE_LOST or VK_SUCCESS at the vkQueueWaitIdle, and I had to add that to the test's expected results. You should add that to your test.
Thanks, I've added VK_TIMEOUT to the asserts in the test.
I assume you used AMD for that test. It happened to me once too.
VK_TIMEOUT should not be returned from vkQueueSubmit, vkDeviceWaitIdle or vkQueueWaitIdle.
I see the problem. Because debug_output_buffer[1] comes back as 0, the memset at the end of message processing doesn't clear the buffer:
memset(debug_output_buffer, 0, 4 * (debug_output_buffer[spvtools::kDebugOutputSizeOffset] + spvtools::kDebugOutputDataOffset));
And the memset clears 4 * (0 + 2) bytes, which doesn't cover all of the records. Then on the next printf, the loop depends on debug_output_buffer[index] being zero when the loop is supposed to stop, but previous printfs may have left values that didn't get memset to 0. I suppose one answer would be to memset the whole buffer.
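The difference between the two clearing strategies can be sketched in isolation. This is only an illustration under assumptions: `ClearBySizeWord`, `ClearAll` and `HasStaleData` are hypothetical helpers, the offsets stand in for the `spvtools` constants, and the buffer is assumed to hold at least two dwords.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for spvtools::kDebugOutputSizeOffset / kDebugOutputDataOffset.
constexpr std::size_t kSizeOffset = 1;
constexpr std::size_t kDataOffset = 2;

// Clears only the range implied by the size dword, like the memset quoted above.
// Assumes the buffer has at least kDataOffset dwords.
void ClearBySizeWord(std::vector<uint32_t>& buffer) {
    const std::size_t dwords =
        std::min<std::size_t>(buffer[kSizeOffset] + kDataOffset, buffer.size());
    std::fill_n(buffer.begin(), dwords, 0u);
}

// Clears the whole buffer regardless of what the size dword says.
void ClearAll(std::vector<uint32_t>& buffer) {
    std::fill(buffer.begin(), buffer.end(), 0u);
}

// True if any dword in the record area is still non-zero.
bool HasStaleData(const std::vector<uint32_t>& buffer) {
    return std::any_of(buffer.begin() + kDataOffset, buffer.end(),
                       [](uint32_t v) { return v != 0; });
}
```

With a buffer such as `{0, 0, 3, 7, 9, 0}`, where a record was written but the size dword reads back as 0, `ClearBySizeWord` leaves the record in place while `ClearAll` removes it. The cost of clearing everything is proportional to the buffer size rather than to the amount written.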
But I'm seeing another problem. If I modify your test to do two debug printfs:
if (gl_VertexIndex == 0) {
    debugPrintfEXT("Here are three float values %f, %f, %f", 1.0, myfloat, gl_Position.x);
    debugPrintfEXT("Here's another debug printf");
    float x = constants.x[0];
    while(x > -1.f) { // infinite loop
        x += 0.000001f;
    }
    debugPrintfEXT("Here is a value that should not be printed %f", x);
}
I only get the second printf back, as if the first one was overwritten in the shader. Could that be a result of the atomic operation problem and debug_output_buffer[1] not getting written?
(This is all on an AMD RX6600 with driver 23.7.1)
The atomic add is performed by the instrumentation added to the shader via spirv-tools code, which I'm not very familiar with.
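For readers unfamiliar with that pattern, here is a rough CPU-side analogy of reserving space with an atomic add before writing a record. It is an assumption about the general scheme, not a copy of what spirv-tools emits: `OutputBuffer` and `WriteRecord` are hypothetical, and offsets are relative to the record area rather than to the start of the real buffer.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the output buffer: a counter plus record storage.
struct OutputBuffer {
    std::atomic<uint32_t> written{0};  // total dwords reserved so far
    std::vector<uint32_t> data;        // record storage
    explicit OutputBuffer(std::size_t dwords) : data(dwords, 0) {}
};

// Reserves space with an atomic add, then writes the record if it fits.
// Returns false when the reserved range would run past the end of the buffer.
bool WriteRecord(OutputBuffer& out, const std::vector<uint32_t>& payload) {
    const uint32_t record_size = static_cast<uint32_t>(payload.size()) + 1;  // +1 for the size dword
    const uint32_t offset = out.written.fetch_add(record_size);
    if (offset + record_size > out.data.size()) return false;
    out.data[offset] = record_size;
    std::copy(payload.begin(), payload.end(), out.data.begin() + offset + 1);
    return true;
}
```

If the counter update were lost or not visible to the next writer, two writers would both receive offset 0 and the second record would overwrite the first, which would be consistent with only the second printf coming back.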
Ok so after a bit of digging, this extension was enabled before we had the MALL cache in hardware, and nobody ever went back and changed the interface to bypass MALL, so there IS caching enabled on this memory type for newer hardware. Whoops. I've filed an internal bug to fix this and to add a more comprehensive test to make sure this actually works on the off chance we make another big change to the caching hierarchy...
You can go look at the public driver source if for some reason you want to see where this is happening 🙃
I'll let you know when it's fixed and subsequently released.
@Tobski I assume the fix is released now?
A fix has been implemented but I'm not sure on release status, I'll take a look.
CI Vulkan-ValidationLayers build queued with queue ID 9060. |
CI Vulkan-ValidationLayers build # 12764 running. |
CI Vulkan-ValidationLayers build # 12764 passed. |
CI Vulkan-ValidationLayers build queued with queue ID 10076. |
CI Vulkan-ValidationLayers build # 12781 running. |
CI Vulkan-ValidationLayers build # 12781 failed. |
CI Vulkan-ValidationLayers build queued with queue ID 10420. |
Last LunarG CI failure was due to a system issue. I've restarted. |
CI Vulkan-ValidationLayers build # 12787 running. |
CI Vulkan-ValidationLayers build # 12787 passed. |
@@ -672,6 +821,9 @@ void DebugPrintf::AllocateDebugPrintfResources(const VkCommandBuffer cmd_buffer,
    buffer_info.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
    VmaAllocationCreateInfo alloc_info = {};
    alloc_info.requiredFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    if (use_uncached_buffer) {
        alloc_info.requiredFlags |= VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD;
You probably don't want to require UNCACHED here, since theoretically in future you might get COHERENT and CACHED memory. Uncached is meant to be informative, not something you generally rely upon unless you're doing some very specific performance tuning.
I don't quite understand. Here I want device-uncached memory so that the buffer can still be read after the device is lost.
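The trade-off in this thread, requiring a property versus merely preferring it, can be sketched without Vulkan headers. The constants below mirror the `VkMemoryPropertyFlagBits` values but are redefined locally, and `PickMemoryType` is a hypothetical helper written for illustration; it is not how VMA or the layer selects memory. In VMA terms, the closest equivalent would presumably be moving the uncached bit from `requiredFlags` to `preferredFlags`.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local copies of the Vulkan memory property bit values, so the sketch
// builds without the Vulkan headers.
constexpr uint32_t kHostVisible = 0x02;
constexpr uint32_t kHostCoherent = 0x04;
constexpr uint32_t kDeviceCoherentAMD = 0x40;
constexpr uint32_t kDeviceUncachedAMD = 0x80;

// Returns the index of the memory type that has every required bit and the
// most preferred bits, or -1 if no type has all required bits.
int PickMemoryType(const std::vector<uint32_t>& type_flags, uint32_t required, uint32_t preferred) {
    int best = -1;
    std::size_t best_score = 0;
    for (std::size_t i = 0; i < type_flags.size(); ++i) {
        if ((type_flags[i] & required) != required) continue;  // missing a required bit
        const std::size_t score = std::bitset<32>(type_flags[i] & preferred).count();
        if (best == -1 || score > best_score) {
            best = static_cast<int>(i);
            best_score = score;
        }
    }
    return best;
}
```

With `kDeviceUncachedAMD` passed as preferred, a device that exposes a coherent and uncached type still gets it, while a device that only exposes a coherent, cached type gets that one instead of failing. Passing it as required returns -1 on the second kind of device, which may be the desired outcome if reading the buffer after device loss genuinely depends on the memory being uncached.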
This has gone stale, so closing, but it is tracked in #9211 so it can be referenced in the future.
Adds option to force using AMD_DEVICE_COHERENT_MEMORY for debug printf buffer to print messages even if VK_ERROR_DEVICE_LOST is encountered.
Forcing extension and device feature if not enabled by application.
Added workaround for atomic operations in uncached memory being in cache anyway.
Added workaround to failing MapMemory after DEVICE_LOST (occurs on AMD): When using uncached buffer, do not unmap buffer until messages are analyzed.
This PR attempts to implement #6101. Tested on AMD and Intel (extension not public yet) using this modified vkcube: dorian-apanel-intel/Vulkan-Tools@e022693