-
Notifications
You must be signed in to change notification settings - Fork 573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support XFB in MoltenVK #2169
base: main
Are you sure you want to change the base?
Support XFB in MoltenVK #2169
Conversation
This is analogous to the existing support for fixing up shader inputs. It is intended to be used with tessellation to add implicit builtins that are read from a later stage, despite not being written in an earlier stage. (Believe it or not, this is in fact legal in Vulkan.) Helps fix 8 CTS tests under `dEQP-VK.pipeline.*.no_position`. (Eight other tests work solely by accident without this change.)
MSL: Add a mechanism to fix up shader outputs. Approved-by: Steven Winston
It is possible in SPIR-V to declare multiple specialization constants with the same constant ID. The most common cause of this in GLSL is defining a spec constant, then declaring the workgroup size to use that spec constant by its ID. But, MSL forbids defining multiple function constants with the same function constant ID. So, we must only emit one definition of the actual function constant (with the `[[function_constant(id)]]` attribute); but we can point the other variables at this one definition. Fixes three tests in the Vulkan CTS under `dEQP-VK.compute.basic.max_local_size_*`.
MSL: Deduplicate function constants. Approved-by: Steven Winston
Does analysis of outputs and sorts them into buffers. Nothing else yet.
# Conflicts: # main.cpp # spirv_cross_c.cpp # spirv_cross_c.h # spirv_msl.cpp # spirv_msl.hpp
This only does the bare minimum needed to write XFB data (and not even that actually). It still needs to calculate the offset in the buffer where the data need to be written, and primitive types other than points need to be implemented.
# Conflicts: # reference/shaders-msl/comp/local-size-duplicate-spec-id.comp # spirv_msl.cpp
…e updated to be used.
Given this is still marked draft, I assume this is still WIP and I'll be notified when it's ready for review? |
This lets us reference it later.
I don't expect this to build, let alone work. (Really, all these changes ought to be squashed when merged to SPIRV-Cross.)
Work out how indexing works for triangle fans. A little bit closer...
This is the only type of shader that can even have such outputs. Not only does this save some work in most cases, it also fixes a problem with the next patch.
We still rely on it to pass around and collect the output. To avoid duplicates, only do this if we would not do this normally.
Use the offset from the `XfbOutput` struct instead of querying it again from the ID. Use the `member_index` local instead of using `size() - 1` when setting member decorations. Don't set the qualified name for builtin block variabless--we handle those a different way. Use the member index from the `XfbOutput` when inspecing the original block type instead of the `member_index` local. This one was a real bug; honestly, I don't know how it even worked before.
…types. Make sure they use the `thread` AS and that they have the `packed_` prefix, if necessary.
Add missing changes from previous patch.
...on the builder and not for me.
…hash<enum> is supposed to just work.
…et also doesn't support enum in C++11 for hash key
spvXfbBuffer3 spvXfbOutput3 = {}; | ||
VertOut _20 = {}; | ||
if (any(gl_GlobalInvocationID >= spvStageInputSize)) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks threadgroup_barrier.
if (all(gl_GlobalInvocationID.xy == 0)) | ||
{ | ||
uint spvWritten = spvStageInputSize.x * spvStageInputSize.y; | ||
atomic_store_explicit(spvXfbCounter1, spvInitOffset1 + sizeof(*spvXfb1) * spvWritten, memory_order_relaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is XFB ordering maintained here? XFB data must be emitted in-order with input primitives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual XFB buffers are indexed by the global invocation ID.
spvXfb3 = reinterpret_cast<device spvXfbBuffer3*>(reinterpret_cast<device char*>(spvXfb3) + spvInitOffset3); | ||
if ((gl_GlobalInvocationID.x % 3u == 2) || gl_GlobalInvocationID.x + 2 < spvStageInputSize.x) | ||
spvXfb3[spvXfbIndex] = spvXfbOutput3; | ||
threadgroup_barrier(mem_flags::mem_device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a barrier even needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure the atomic load of the counter happens-before the atomic update, particularly since Metal doesn't support acquire/release semantics for atomics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different workgroup can touch the counter here though, so not sure threadgroup_barrier is enough to ensure the in-order requirement of XFB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one thread in the entire dispatch is allowed to write the counter.
...But if the first workgroup completes before the others, they may wind up loading the counter after the first thread writes to it. Hmm, this may be tougher than I thought...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reasonable solution is to manage the counter from the outside and only pass the offset (read-only) into the shader.
spvXfbBuffer1 spvXfbOutput1 = {}; | ||
spvXfbBuffer2 spvXfbOutput2 = {}; | ||
spvXfbBuffer3 spvXfbOutput3 = {}; | ||
VertOut _25 = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does vertex output go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly to the transform feedback buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this rasterized? Do you re-shade through the normal vertex shader in a second pass, or do you shade from the XFB feedback as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We intend to have MoltenVK shade from the XFB data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is broken. Some reasons why:
- It's possible to only capture a subset of varyings.
- Robustness rules. XFB buffers can be exhausted, where you still rasterize, but you must stop writing XFB data.
spvXfbBuffer2 spvXfbOutput2 = {}; | ||
spvXfbBuffer3 spvXfbOutput3 = {}; | ||
VertOut _25 = {}; | ||
if (any(gl_GlobalInvocationID >= spvStageInputSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about index buffers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, this is 58 (!) commits. I'm not reviewing that as-is. Please rebase this into something more digestable.
Also, given the scope of this, please add a PR description that explains the implementation strategy, problem scenarios, which corner cases don't work, etc, also why this feature is even desirable in the first place. Does it pass CTS?
Only gets the base of the primitive so far.
Adds support for MoltenVK to use TransformFeedback