-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hash join and more support for strings value types in kernel functions #926
Add hash join and more support for strings value types in kernel functions #926
Conversation
- EwBinaryOpCode: Updated specification which elementwise binary ops are supported on which string value types (std::string, FixedStr16, const char *). - Needed to generalize some macros from one common value type for both operands to individual value types, e.g., to support "matrix of std::string op string scalar". - Fixed a small type bug in EwBinaryObjSca-kernel. - ExtractRow- and FilterRow-kernels on frames work with frames with std::string columns. - Added instantiations of EwBinaryObjSca-kernel for comparisons of string matrices with string scalar (integer result). - Order- and Group-kernels can handle frames with std::string columns. - These changes were originally included in PRs #918, #921, and #926 by @saminbassiri. - @pdamme is committing them in the name of @saminbassiri (for correct attribution) in a separate commit, since they don't really fit the topics of those PRs.
I committed the changes not directly related to the topic of this PR in commit a374814. |
- Add hash join kernel function. - Add hashJoin built-in function to DaphneDSL. - Create HashJoinOp in DaphneIR.
- Instead of adding the hash-join throughout the DAPHNE stack (DaphneDSL built-in function, DaphneIR op plus inference etc., runtime kernel), we should use the hash-join code as the implementation of the innerJoin-kernel. - So far, the innerJoin-kernel was anyway extremely inefficient and had numerous memory leaks related to Frame::getColumn(). - Consequently, I removed the hashJoin built-in/op/kernel and move the kernel code to the innerJoin-kernel. - Also reverted changes not directly related to this PR, they have already been contributed to main in commit a374814.
…-kernel. - Fixed some memory leaks: matrices obtained through Frame::getColumn() must be destroyed explicitly. - Ensured that not only n:1-joins but also n:m-joins are supported (modified the existing unit test case for the innerJoin-kernel to trigger such a case). - Added license header (was already missing before this PR).
c2f9cec
to
39ee4d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution @saminbassiri. A hash-based join implementation was long overdue in DAPHNE. You added the hash-join throughout the DAPHNE stack (DaphneDSL built-in function, DaphneIR op plus inference, runtime kernel).
While this integration looks good from a technical point of view, I think we should not expose the physical join variant to users. Instead, it's much more useful to replace the existing innerJoin-kernel (which was very inefficient and had memory leaks) by your hash-based implementation. Consequently, I removed the new hashJoin()
built-in and HashJoinOp
.
I fixed a few little things in the hash-join code: memory leaks related to Frame::getColumn()
and made the kernel support n:m-joins (by using a std::unordered_map<VT, std::vector<size_t>>
for the hash table). See my commits for details.
With that, this PR is ready to be merged.
Add Hash Join and enhance kernel functions FilterRow and EwBinary to handle String Value Types.
Description:
This pull request introduces enhancements to the system by adding a new
hashJoin
operation and extending support for string value types in kernel functions.Key Changes:
Hash Join Implementation:
hashJoin
as a kernel function.hashJoin
as a built-in function in DaphneDSL.HashJoinOp
in DaphneIR.String Value Support:
FilterRow
kernel function for compatibility with string values.EwBinary
operations to comparechar*
strings withstd::string
.kernel.json
to reflect the new string value support.Next Steps:
Add tests to validate correct behavior across various scenarios.