We have quite a few custom autograd functions in the FOAK plugin. We should test compile with these autograds, and register them. Note that it is better to avoid what kernel-hyperdrive does, which is to register them as custom_ops (see here); for kernel-hyperdrive it can't be helped, as there is a stride issue.
Using rms_layer_norm as an example, here is my attempt to list out a set of prescriptive tasks.
Look at all the different kernels that are attached to a model, e.g., llama. Go through them one by one.
For example, start with rms_layer_norm. In the above example, we replace the LlamaRMSNorm with the fast_rms_layernorm, as sketched below.
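As a rough illustration of that replacement step (not the plugin's actual patching code), the sketch below swaps LlamaRMSNorm.forward for the fused kernel, assuming a `(hidden_states, weight, eps)` signature for fast_rms_layernorm:

```python
import types

from transformers.models.llama.modeling_llama import LlamaRMSNorm


def patch_llama_rms_norm(model, fast_rms_layernorm):
    """Swap every LlamaRMSNorm.forward for the fused implementation.

    fast_rms_layernorm is passed in rather than imported, and is assumed to
    take (hidden_states, weight, eps); adjust to the plugin's real signature.
    """

    def _fast_forward(self, hidden_states):
        return fast_rms_layernorm(hidden_states, self.weight, self.variance_epsilon)

    for module in model.modules():
        if isinstance(module, LlamaRMSNorm):
            module.forward = types.MethodType(_fast_forward, module)
    return model
```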
The implementation of fast_rms_layernorm is found here; it is an autograd function Fast_RMS_Layernorm that has a triton kernel _rms_layernorm_forward in its forward, and _rms_layernorm_backward in its backward.
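For reference, a minimal sketch of that autograd-function pattern is below; the triton kernels _rms_layernorm_forward / _rms_layernorm_backward are replaced with plain PyTorch math so the sketch is self-contained.

```python
import torch


class Fast_RMS_Layernorm(torch.autograd.Function):
    """Sketch of the autograd-function pattern behind fast_rms_layernorm.

    The real forward/backward launch the triton kernels
    _rms_layernorm_forward / _rms_layernorm_backward; the plain PyTorch
    math below is only a stand-in so this sketch runs anywhere.
    """

    @staticmethod
    def forward(ctx, X, W, eps):
        # stand-in for the _rms_layernorm_forward triton kernel
        r = torch.rsqrt(X.pow(2).mean(-1, keepdim=True) + eps)
        Y = X * r * W
        ctx.save_for_backward(X, W, r)
        return Y

    @staticmethod
    def backward(ctx, dY):
        # stand-in for the _rms_layernorm_backward triton kernel
        X, W, r = ctx.saved_tensors
        X_hat = X * r
        dW = (dY * X_hat).sum(dim=tuple(range(dY.dim() - 1)))
        dX_hat = dY * W
        dX = r * (dX_hat - X_hat * (dX_hat * X_hat).mean(-1, keepdim=True))
        return dX, dW, None  # no gradient for eps


def fast_rms_layernorm(X, W, eps=1e-6):
    return Fast_RMS_Layernorm.apply(X, W, eps)
```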
So to make this compilable, you must follow the pattern and register it as a graph op. One way to do this is custom_op, as it is done here.
Using custom_ops can have overhead, so if it's easier, we can do this as a first pass, but we need a clean way to disable the custom_op if compile is not enabled.
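A hedged sketch of what that first pass could look like with torch.library.custom_op (PyTorch >= 2.4); the "foak" namespace, the flag name, and the eager stand-in for the triton path are all illustrative:

```python
import torch

# Hypothetical switch; in the plugin this would come from its own config.
USE_CUSTOM_OP = True


def _rms_layernorm_eager(X, W, eps):
    # stand-in for the triton-backed Fast_RMS_Layernorm.apply path
    r = torch.rsqrt(X.pow(2).mean(-1, keepdim=True) + eps)
    return X * r * W


@torch.library.custom_op("foak::rms_layernorm", mutates_args=())
def _rms_layernorm_op(X: torch.Tensor, W: torch.Tensor, eps: float) -> torch.Tensor:
    return _rms_layernorm_eager(X, W, eps)


@_rms_layernorm_op.register_fake
def _(X, W, eps):
    # shape/dtype propagation so torch.compile can trace through the op
    return torch.empty_like(X)


# The triton backward would be attached via register_autograd; omitted here.


def rms_layernorm(X, W, eps=1e-6):
    # clean switch: only pay the custom_op dispatch overhead when compile is enabled
    if USE_CUSTOM_OP:
        return _rms_layernorm_op(X, W, eps)
    return _rms_layernorm_eager(X, W, eps)
```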
Finally, the more "standard" way to register ops is the torch.library.define pattern; see this issue for example.
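And a sketch of the torch.library.define route for the same op (names again illustrative; use either this or the custom_op approach above, not both):

```python
import torch

torch.library.define("foak::rms_layernorm", "(Tensor X, Tensor W, float eps) -> Tensor")


@torch.library.impl("foak::rms_layernorm", "CompositeExplicitAutograd")
def _rms_layernorm_impl(X, W, eps):
    # the real implementation would dispatch to the triton kernel here
    r = torch.rsqrt(X.pow(2).mean(-1, keepdim=True) + eps)
    return X * r * W


@torch.library.register_fake("foak::rms_layernorm")
def _rms_layernorm_fake(X, W, eps):
    return torch.empty_like(X)


# torch.library.register_autograd("foak::rms_layernorm", ...) would attach the
# triton backward. The patched module then calls
# torch.ops.foak.rms_layernorm(X, W, eps), which torch.compile traces as a graph op.
```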
If there are functions in the autograds that need to be changed, the bench needs to be rerun for accuracy and performance checks.