-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce SPI and Core Support for JDBC Join Pushdown Optimization #24115
base: master
Are you sure you want to change the base?
Conversation
e248f96
to
50a7e8b
Compare
The jdbc code to leverage the changes in this PR could be found here - Haritha-Koloth#2 |
Test failure seems to be unrelated to the code changes in this PR. Raised a sample PR with ReadMe change and it fails with the same error too - #24118 Will force-push after some time to see if the error resolves. |
Saved that user @Haritha-Koloth is from IBM |
c7d072a
to
471eb62
Compare
this feature is very useful, and significant when presto as virtual data engine. |
Just curious, why will delegating joins to the datasource result in performance gains? |
@jaystarshot Join pushdown will help when -
See these examples listed in the RFC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick first pass. I think we should switch GroupInnerJoinsByConnector
to be an iterative optimizer rule.
ImmutableList.of( | ||
new SchemaTableName("test_schema", "test_view"), | ||
new SchemaTableName("test_schema", "another_table"))) | ||
.withConnectorCapabilities(ImmutableSet.of()).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have all the connectors SUPPORTS_JOIN_PUSHDOWN. The test should work with or without the connectors supporting it
@Test | ||
public void testDoesNotPushDownOuterJoin() | ||
{ | ||
String catalogName = "test_catalog"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try refactoring this setup code into smaller methods that can be reused across tests
VariableReferenceExpression right = newBigintVariable("a2"); | ||
EquiJoinClause joinClause = new EquiJoinClause(left, right); | ||
|
||
PlanNode plan = output(join(FULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Join pushdown will also not work for LEFT and RIGHT joins, can we add those assertions too ?
constraint, | ||
Optional.empty()); | ||
|
||
PlanBuilder planBuilder = new PlanBuilder(session, new PlanNodeIdAllocator(), metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
public class GroupInnerJoinsByConnector | ||
implements PlanOptimizer | ||
{ | ||
private static final Logger logger = Logger.get(GroupInnerJoinsByConnector.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
{ | ||
if (isEnabled(session)) { | ||
PlanNode rewrittenPlan = SimplePlanRewriter.rewriteWith(new Rewriter(functionResolution, determinismEvaluator, idAllocator, metadata, session), plan); | ||
return PlanOptimizerResult.optimizerResult(rewrittenPlan, !rewrittenPlan.equals(plan)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been preferring the use of the IterativeOptimizer over PlanOptimizer.
Since your rewrite can very explicitly determine if the plan is changed (by using rewrittenPlan.equals(plan)
) modifying this class to instead implement an iterative optimizer rule (i.e implements Rule<JoinNode>
) should be pretty straightforward.
This should reduce the test burden as well - you will be able to use a RuleTester
and its associated helpers directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aaneja , @Haritha-Koloth
Below are the reasons to use PlanOptimizer instead of Iterative optimizer Rule.
-
Need to visit Multiple nodes (visitJoin, visitFilter, etc.). Using a Rule we may able to perform operations on a single node only, I guess.
-
Repeated application of JoinNode Rule, due to child join.
Not sure these cases can resolved through a Single Iterative Optimizer Rule. For handling filter node, required a Rule and for join required another Rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaneja Thanks! We are exploring the potential and viability of the above suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Haritha-Koloth and @aaneja
I think we already did some good test coverage for this using PlanOptimizer itself.
Do we have any additional benefit for changing to optimizer Rule? If we use optimizer Rule how we can handle non-equijoin scenario, where we have to visit FilterNode and JoinNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ajas-Mangal As per the discussion with @aaneja, the usage of IterativeOptimizer is being preferred over PlanOptimizer implementations and there is a potential advantage of opening up more nodes for pushdown.
As for the non-equijoin scenario, I guess we can have multiple rule matchers to handle the filter and join nodes.
I am trying out the suggested changes, will update here if met with some blockers/concerns.
Hi @aaneja , any next step on this? |
A rewrite of the existing rule as an IterativeOptimizer is being explored, see #24115 (comment). |
Thanks for the release note entry! Nit, please include the PR number for each item.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add documentation for the configuration properties optimizer.inner-join-pushdown-enabled
and optimizer.inequality-join-pushdown-enabled
, and the session properties optimizer_inner_join_pushdown_enabled
and optimizer_inequality_join_pushdown_enabled
.
471eb62
to
ecb3a3c
Compare
@steveburnett - Is this the correct file to add documentation for the above configurations? |
Yes, for the configuration properties. Please also document the session properties by updating https://github.com/prestodb/presto/blob/05bc56ceddaf7debed9f5fd7aa20e07093aea969/presto-docs/src/main/sphinx/admin/properties-session.rst. |
151c76d
to
30a3dbb
Compare
Join Pushdown SPI and Core changes Co-Authored-By: Ajas M <[email protected]> Co-Authored-By: Glerin Pinhero <[email protected]> Co-Authored-By: Thanzeel Hassan <[email protected]> Co-Authored-By: Anant Aneja <[email protected]>
30a3dbb
to
0b021a5
Compare
Hi @aaneja We explored your suggestion to convert GroupInnerJoinsByConnector into an IterativeOptimizer. Below are our observations and findings:
The modified code is available here - iterative_opt_pushdown Please let us know your thoughts on these. CC: @Ajas-Mangal |
Reg 1 - I did a couple of join pushdown tests with the outputs restricted (see aaneja/iterative_opt_pushdown). I did not find any errors here. Please feel to try it out with your setup and let me know if you spot any failures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! Formatting nits, all the links and everything else looks good.
Formatting docs
3d6af14
to
ef0a411
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
Hi @aaneja , regarding the point 2): We need to build a matcher that can accept a join OR a filter. But I could not find any examples where 2 different types of nodes can be handled via a single rule. The kind of node we are trying to accept should be mentioned on the implementation also, right? How can we accommodate 2 different types there? Did you mean that we can have 2 different rules and patterns for these 2 use cases? Or is there a way we could use a generic PlanNode for this? |
@Haritha-Koloth What you have is a case where you want to match on |
Description
This change introduces foundational support for join pushdown in JDBC connectors. By delegating join operations, directly to the remote datasource, it unlocks significant performance gains by leveraging the processing capabilities of the underlying datasource.
Motivation and Context
Fixes 23152
Impact
This update includes only the SPI and core changes for the feature. Users will see its impact once the JDBC modifications to leverage the pushdown optimizer are also integrated. The primary benefit is enhanced performance for INNER join queries involving large tables with substantial row counts.
Detailed RFC - prestodb/rfcs#32
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: