Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAPHNE-#755] Initial support for lists in DaphneDSL. #806

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

pdamme
Copy link
Collaborator

@pdamme pdamme commented Aug 7, 2024

This PR adds basic support for lists in DaphneDSL. These are urgently required to speed up the decision trees DaphneDSL script, whose performance is crucial to our use case partners.

As the introduction of lists is a larger change, I would like to make everyone interested aware and enable a discussion on this PR. From my point of view, this PR is fine (the limitations mentioned below are not crucial and can be addressed later).


From the commit message:

  • This PR adds basic support for lists of matrices of homogeneous physical data type and value type.
  • Supported operations:
    • createList(), length(), append(), remove(), print()
    • append() and remove() do not modify the given list, but return the resulting list (consistent with the bahavior of matrix/frame operations)
  • Concrete changes:
    • DaphneDSL - Four new built-in functions: createList(), length(), append(), remove(). - Updated the DaphneDSL language reference and the list of built-in functions.
    • DaphneIR/DAPHNE compiler
      • A new custom MLIR type: List.
      • Four new DaphneIR operations: CreateListOp, LengthOp, AppendOp, RemoveOp. - Type inference for CreateListOp, AppendOp, and RemoveOp. - Consideration of list types in the kernel extension catalog. - Lowering of the four new ops to kernel calls. - Lowering of the List type in LowerToLLVMPass in the same way as matrices/frames.
    • DAPHNE runtime - Four new kernels: createList, length, append, remove. - Registration of the new kernels in kernels.json.
    • Garbage collection of list items
      • ManageObjRefsPass treats lists like matrices/frames.
      • Besides that:
        • The reference counter of a data object is increased when it is inserted into a list to ensure that the object is not freed as long as the list exists.
        • When a list is destroyed, the reference counter of all its elements is decreased by 1. - When an element is removed from a list, its reference counter is not decreased, because we return the removed element (it lives on).
    • Added script-level test cases for the usage of lists.
  • Current limitations:
    • No support for heterogeneous data/value type (e.g., one cannot store DenseMatrix and CSRMatrix or DenseMatrix and DenseMatrix in the same list).
    • Not possible to create an empty list in DaphneDSL, because that would complicate type inference.
    • Lists are not supported in DaphneLib yet.
    • No get/set on list elements yet, only append and remove.
    • The append/remove kernels copy the input list instead of modifying it in place. While this is consistent with the behavior of matrix/frame ops, it is inefficient. However, compared to workarounds to lists, this is still much faster; Furthermore, we can use the same update-in-place mechanisms as for matrices/frames in the future, when the input list is not used anymore afterwards (see PR Enable Update-in-Place Optimization in DAPHNE (Issue #498) #609).
    • Information about interesting data properties gets lost by inserting and removing a matrix into/from a list.
    • Subclassing Structure may not be optimal; it might be better to subclass a new superclass of Structure, but I wanted to keep the refactoring overhead low for now.
  • Effect on decision trees DaphneDSL script
    • The decision trees script uses queues to keep certain matrices around.
    • So far, we emulated these queues through matrix concatenation (cbind/rbind), which turned out to be very inefficient.
    • The script uses lists for the queues now, which leads to significant performance improvements (~15x for a concrete use-case pipeline on my machine).
  • Closes Lists in DaphneDSL #755.

@pdamme pdamme added feature missing/requested features performance label for PRs of perf++ and issues of perf-- labels Aug 7, 2024
@corepointer corepointer added this to the v0.3 milestone Aug 8, 2024
- This PR adds basic support for lists of matrices of homogeneous physical data type and value type.
- Supported operations:
  - createList(), length(), append(), remove(), print()
  - append() and remove() do not modify the given list, but return the resulting list (consistent with the bahavior of matrix/frame operations)
- Concrete changes:
  - DaphneDSL
    - Four new built-in functions: createList(), length(), append(), remove().
    - Updated the DaphneDSL language reference and the list of built-in functions.
  - DaphneIR/DAPHNE compiler
    - A new custom MLIR type: List.
    - Four new DaphneIR operations: CreateListOp, LengthOp, AppendOp, RemoveOp.
    - Type inference for CreateListOp, AppendOp, and RemoveOp.
    - Consideration of list types in the kernel extension catalog.
    - Lowering of the four new ops to kernel calls.
    - Lowering of the List type in LowerToLLVMPass in the same way as matrices/frames.
  - DAPHNE runtime
    - Four new kernels: createList, length, append, remove.
    - Registration of the new kernels in kernels.json.
  - Garbage collection of list items
    - ManageObjRefsPass treats lists like matrices/frames.
    - Besides that:
      - The reference counter of a data object is increased when it is inserted into a list to ensure that the object is not freed as long as the list exists.
      - When a list is destroyed, the reference counter of all its elements is decreased by 1.
      - When an element is removed from a list, its reference counter is *not* decreased, because we return the removed element (it lives on).
  - Added script-level test cases for the usage of lists.
- Current limitations:
  - No support for heterogeneous data/value type (e.g., one cannot store DenseMatrix and CSRMatrix or DenseMatrix<double> and DenseMatrix<float> in the same list).
  - Not possible to create an empty list in DaphneDSL, because that would complicate type inference.
  - Lists are not supported in DaphneLib yet.
  - No get/set on list elements yet, only append and remove.
  - The append/remove kernels copy the input list instead of modifying it in place. While this is consistent with the behavior of matrix/frame ops, it is inefficient. However, compared to workarounds to lists, this is still much faster; Furthermore, we can use the same update-in-place mechanisms as for matrices/frames in the future, when the input list is not used anymore afterwards (see PR #609).
  - Information about interesting data properties gets lost by inserting and removing a matrix into/from a list.
  - Subclassing Structure may not be optimal; it might be better to subclass a new superclass of Structure, but I wanted to keep the refactoring overhead low for now.
- Effect on decision trees DaphneDSL script
  - The decision trees script uses queues to keep certain matrices around.
  - So far, we emulated these queues through matrix concatenation (cbind/rbind), which turned out to be very inefficient.
  - The script uses lists for the queues now, which leads to significant performance improvements (~15x for a concrete use-case pipeline on my machine).
- Closes #755.
@philipportner philipportner force-pushed the 755-lists-in-daphnedsl branch from 814e25c to 151e5a1 Compare August 9, 2024 13:08
@philipportner philipportner merged commit c4c928a into main Aug 9, 2024
2 checks passed
@philipportner
Copy link
Collaborator

Great feature to add to DAPHNE, thanks @pdamme !
I rebased the branch and merged it in as people want to see this in main.

I've done a code review, overall the code looks very good to me, I have noted down a list of things I'd like to follow up on once you're back. As I don't think these should block the PR and can be fixed in the future I merged it into main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature missing/requested features performance label for PRs of perf++ and issues of perf--
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lists in DaphneDSL
3 participants