Arxiv/Blog/Paper Link
https://arxiv.org/abs/2405.07395
Detailed Description
Context
The factorized attention is quadratic only along each axis (axial attention) rather than in the full grid size, which should greatly reduce the compute the model needs, both for training (the more important case) and for inference. Most global weather models already seem able to run inference on small amounts of compute, but training still takes a lot of it.
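To make the complexity argument concrete, here is a minimal sketch of axial attention in PyTorch. It is not the paper's exact factorized-attention formulation, just an illustration of attending along one grid axis at a time: full attention over an `H x W` grid costs `O((H*W)^2)`, while one pass per axis costs `O(H*W*(H + W))`. The `AxialAttention` class, shapes, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Multi-head self-attention along a single axis of a (B, H, W, C) grid.

    Hypothetical sketch: attending per axis keeps the quadratic cost in H or W
    alone, instead of in the full sequence length H*W.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # x: (B, H, W, C); axis=1 attends along H (e.g. latitude),
        # axis=2 attends along W (e.g. longitude).
        b, h, w, c = x.shape
        if axis == 1:
            # Fold W into the batch so attention runs over sequences of length H.
            seq = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
        else:
            # Fold H into the batch so attention runs over sequences of length W.
            seq = x.reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if axis == 1:
            out = out.reshape(b, w, h, c).permute(0, 2, 1, 3)
        else:
            out = out.reshape(b, h, w, c)
        return out

# Factorized attention: one pass per axis instead of full 2-D attention.
x = torch.randn(2, 32, 64, 128)        # (batch, lat, lon, channels) -- made-up sizes
block = AxialAttention(dim=128, heads=4)
x = x + block(x, axis=1)               # attend along latitude
x = x + block(x, axis=2)               # attend along longitude
```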