[DAPHNE-#77] Avoid stack overflow in long-running DaphneDSL loops. #820

pdamme · 2024-09-03T20:54:06Z

- As described in #77 ("DaphneDSL loops crash after a certain number of iterations"), DaphneDSL loops with many iterations (e.g., >100k) crash DAPHNE.
- The reason is that LowerToLLVMPass creates LLVM::AllocaOp in several places and for various purposes.
- These AllocaOps can end up inside a loop. The memory they allocate is only freed at the end of the "scope" (function), so it piles up with every iteration until the stack is exhausted.
- This commit solves the problem by ensuring that all AllocaOps created in LowerToLLVMPass are inserted at the beginning of the function surrounding the currently considered operation.
- With that, there are no AllocaOps in loops anymore.
- Note that the memory allocated by the AllocaOps can safely be reused by repeated kernel calls in different loop iterations.
- Added script-level test cases employing loops with many iterations.
- This bug has existed for a long time (partly because it wasn't a blocking issue until recently).
  - Thanks to @corepointer and @philipportner for investigating the causes (this fix is based upon their insights).
  - Thanks to @J-Brest and @borkob for providing a work-around for DAPHNE users.
- Closes #77 ("DaphneDSL loops crash after a certain number of iterations").
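
To illustrate the effect described above outside of MLIR, here is a minimal standalone C++ sketch. It is not code from this PR: it uses the non-standard alloca() from <alloca.h> (assumed to be available, as on Linux/glibc) as a stand-in for LLVM::AllocaOp, and fakeKernel, slotsUsedAllocaInLoop, and slotsUsedHoisted are hypothetical names chosen for the example.

```cpp
#include <alloca.h>  // non-standard; assumed available (e.g., Linux/glibc)
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for a kernel call that writes its result into `slot`.
static void fakeKernel(int *slot, int i) { *slot = i; }

// Before the fix: a new stack slot is allocated in every iteration.
// None of these slots is released until the function returns, so the
// stack usage grows with the number of iterations.
std::size_t slotsUsedAllocaInLoop(int iterations) {
    std::set<std::uintptr_t> addresses;
    for (int i = 0; i < iterations; ++i) {
        int *slot = static_cast<int *>(alloca(sizeof(int)));
        fakeKernel(slot, i);
        addresses.insert(reinterpret_cast<std::uintptr_t>(slot));
    }
    return addresses.size();
}

// After the fix: the slot is allocated once at the beginning of the
// function and reused by the "kernel call" in every iteration.
std::size_t slotsUsedHoisted(int iterations) {
    int *slot = static_cast<int *>(alloca(sizeof(int)));
    std::set<std::uintptr_t> addresses;
    for (int i = 0; i < iterations; ++i) {
        fakeKernel(slot, i);
        addresses.insert(reinterpret_cast<std::uintptr_t>(slot));
    }
    return addresses.size();
}
```

Counting distinct slot addresses makes the difference visible without actually exhausting the stack: the first variant should report one address per iteration, the second a single address regardless of the iteration count.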

I think, in the future, we could even reduce the number of AllocaOps by making different kernel calls use the same memory slots for their result pointers. However, the most important goal for now is to get this bug fixed on main.

@corepointer


corepointer · 2024-09-05T07:08:33Z

Awesome @pdamme
Will test it right away 👍

corepointer · 2024-09-05T17:25:53Z

I've been playing around with various memory profilers to test the issue and the solution. Stack usage looks quite normal with your changes applied. Also, the explain_llvm output shows the improvements nicely. Thanks for fixing this long-standing bug at last, @pdamme.

pdamme added the bug label Sep 3, 2024

corepointer merged commit b294230 into main Sep 5, 2024
2 checks passed

philipportner mentioned this pull request Sep 25, 2024

Segmentation fault when using FOR loops on matrices #579

Closed
