Introduce SPI and Core Support for JDBC Join Pushdown Optimization #24115

Haritha-Koloth · 2024-11-22T05:46:05Z

Description

This change introduces foundational support for join pushdown in JDBC connectors. Delegating join operations directly to the remote datasource unlocks significant performance gains by leveraging the processing capabilities of the underlying datasource.

Motivation and Context

Fixes #23152

Impact

This update includes only the SPI and core changes for the feature. Users will see its impact once the JDBC changes that use the pushdown optimizer are also integrated. The primary benefit is better performance for INNER join queries over large tables.

Detailed RFC - prestodb/rfcs#32

Test Plan

Unit tests were added for the new optimizer.

Contributor checklist

Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add configuration property ``optimizer.inner-join-pushdown-enabled`` to support inner join pushdown. Defaults to ``false``. The session property ``optimizer_inner_join_pushdown_enabled`` overrides this setting. :pr:`24115`
* Add configuration property ``optimizer.inequality-join-pushdown-enabled`` to support inequality join pushdown. Defaults to ``false``. The session property ``optimizer_inequality_join_pushdown_enabled`` overrides this setting. :pr:`24115`

Haritha-Koloth · 2024-11-22T09:45:23Z

The JDBC code that leverages the changes in this PR can be found here: Haritha-Koloth#2

Haritha-Koloth · 2024-11-22T11:48:01Z

The test failure seems to be unrelated to the code changes in this PR. I raised a sample PR with a README change, and it fails with the same error too: #24118

I will force-push after some time to see if the error resolves.

prestodb-ci · 2024-11-22T15:56:28Z

Saved that user @Haritha-Koloth is from IBM

lschampion · 2024-12-06T01:09:39Z

This feature is very useful, and significant when Presto is used as a virtual data engine.

jaystarshot · 2024-12-10T18:27:47Z

By delegating join operations, directly to the remote datasource, it unlocks significant performance gains by leveraging the processing capabilities of the underlying datasource.

Just curious, why will delegating joins to the datasource result in performance gains?
Never mind, I just saw one use case where data was sent to Presto and then joined, so this saves transfer cost or something?

aaneja · 2024-12-12T09:03:15Z

@jaystarshot Join pushdown will help when:

* We have a reducing join: the data that Presto reads from the joined source is much smaller than what it would read by streaming the two join sources separately.
* The underlying relational system can do the join more efficiently than Presto, say by using index lookups instead of a hash join.

See these examples listed in the RFC
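The reducing-join case can be illustrated with a small, self-contained sketch. It only counts rows that would cross the network in each strategy; the class and method names (`ReducingJoinSketch`, `rowsReadWithoutPushdown`, `rowsReadWithPushdown`) and the sample data are hypothetical and are not part of this PR or of Presto.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration only: models join inputs as lists of join keys.
public class ReducingJoinSketch
{
    // Without pushdown, the engine streams both join inputs in full from the remote source.
    public static long rowsReadWithoutPushdown(List<Integer> leftKeys, List<Integer> rightKeys)
    {
        return (long) leftKeys.size() + rightKeys.size();
    }

    // With pushdown, the remote source performs the inner join and only the result is streamed.
    public static long rowsReadWithPushdown(List<Integer> leftKeys, List<Integer> rightKeys)
    {
        // Count how many right-side rows carry each key.
        Map<Integer, Long> rightCounts = rightKeys.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Each left row matches every right row that has the same key.
        return leftKeys.stream()
                .mapToLong(key -> rightCounts.getOrDefault(key, 0L))
                .sum();
    }

    public static void main(String[] args)
    {
        // Assumed sample data: 1000 rows per side, with only a few keys in common.
        List<Integer> left = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        List<Integer> right = IntStream.range(0, 1000).map(i -> i * 100).boxed().collect(Collectors.toList());

        System.out.println("rows read without pushdown: " + rowsReadWithoutPushdown(left, right));
        System.out.println("rows read with pushdown: " + rowsReadWithPushdown(left, right));
    }
}
```

The sketch ignores row width, remote execution cost, and the index-lookup case; it is only meant to show why a selective join shrinks the data Presto has to read.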

aaneja

Did a quick first pass. I think we should switch GroupInnerJoinsByConnector to be an iterative optimizer rule.

aaneja · 2024-12-13T10:24:51Z

.../test/java/com/facebook/presto/sql/planner/optimizations/TestGroupInnerJoinsByConnector.java

+                        ImmutableList.of(
+                                new SchemaTableName("test_schema", "test_view"),
+                                new SchemaTableName("test_schema", "another_table")))
+                .withConnectorCapabilities(ImmutableSet.of()).build();


Let's give all the connectors the SUPPORTS_JOIN_PUSHDOWN capability. The test should work whether or not the connectors support it.

aaneja · 2024-12-13T10:26:16Z

.../test/java/com/facebook/presto/sql/planner/optimizations/TestGroupInnerJoinsByConnector.java

+    @Test
+    public void testDoesNotPushDownOuterJoin()
+    {
+        String catalogName = "test_catalog";


Please try refactoring this setup code into smaller methods that can be reused across tests

aaneja · 2024-12-13T10:27:16Z

.../test/java/com/facebook/presto/sql/planner/optimizations/TestGroupInnerJoinsByConnector.java

+        VariableReferenceExpression right = newBigintVariable("a2");
+        EquiJoinClause joinClause = new EquiJoinClause(left, right);
+
+        PlanNode plan = output(join(FULL,


Join pushdown will also not work for LEFT and RIGHT joins. Can we add those assertions too?

aaneja · 2024-12-13T10:31:08Z

.../test/java/com/facebook/presto/sql/planner/optimizations/TestGroupInnerJoinsByConnector.java

+                constraint,
+                Optional.empty());
+
+        PlanBuilder planBuilder = new PlanBuilder(session, new PlanNodeIdAllocator(), metadata);


aaneja · 2024-12-13T10:38:26Z

.../src/main/java/com/facebook/presto/sql/planner/optimizations/GroupInnerJoinsByConnector.java

+public class GroupInnerJoinsByConnector
+        implements PlanOptimizer
+{
+    private static final Logger logger = Logger.get(GroupInnerJoinsByConnector.class);


aaneja · 2024-12-13T10:46:11Z

.../src/main/java/com/facebook/presto/sql/planner/optimizations/GroupInnerJoinsByConnector.java

+    {
+        if (isEnabled(session)) {
+            PlanNode rewrittenPlan = SimplePlanRewriter.rewriteWith(new Rewriter(functionResolution, determinismEvaluator, idAllocator, metadata, session), plan);
+            return PlanOptimizerResult.optimizerResult(rewrittenPlan, !rewrittenPlan.equals(plan));


We've been preferring the use of the IterativeOptimizer over PlanOptimizer.

Since your rewrite can explicitly determine whether the plan changed (by using rewrittenPlan.equals(plan)), modifying this class to implement an iterative optimizer rule instead (i.e., implements Rule<JoinNode>) should be pretty straightforward.

This should reduce the test burden as well: you will be able to use a RuleTester and its associated helpers directly.

Ajas-Mangal · 2024-12-16

Hi @aaneja, @Haritha-Koloth

Below are the reasons to use a PlanOptimizer instead of an iterative optimizer Rule:

* We need to visit multiple node types (visitJoin, visitFilter, etc.). With a Rule, I think we can operate on only a single node type.
* The JoinNode rule would be applied repeatedly because of child joins.

I am not sure these cases can be resolved with a single iterative optimizer Rule. Handling the filter node would require one Rule, and handling the join would require another.

Haritha-Koloth · 2024-12-19

@aaneja Thanks! We are exploring the potential and viability of the above suggestions.

Ajas-Mangal · 2024-12-19

Hi @Haritha-Koloth and @aaneja

I think we already have good test coverage for this using the PlanOptimizer itself.

Is there any additional benefit in changing to an optimizer Rule? If we use an optimizer Rule, how can we handle the non-equijoin scenario, where we have to visit both the FilterNode and the JoinNode?

Haritha-Koloth · 2024-12-19

@Ajas-Mangal As per the discussion with @aaneja, IterativeOptimizer rules are preferred over PlanOptimizer implementations, and there is a potential advantage of opening up more nodes for pushdown.

As for the non-equijoin scenario, I guess we can have multiple rule matchers to handle the filter and join nodes.

I am trying out the suggested changes and will update here if I run into any blockers or concerns.

ethanyzhang · 2025-01-07T06:19:20Z

Hi @aaneja , any next step on this?

aaneja · 2025-01-07T06:56:10Z

Hi @aaneja , any next step on this?

A rewrite of the existing rule as an IterativeOptimizer is being explored, see #24115 (comment).

steveburnett · 2025-01-07T15:35:01Z

Thanks for the release note entry! Nit, please include the PR number for each item.

== RELEASE NOTES ==

General Changes
* Add configuration property ``optimizer.inner-join-pushdown-enabled`` to support inner join pushdown. Defaults to ``false``. The session property ``optimizer_inner_join_pushdown_enabled`` overrides this setting. :pr:`24115`
* Add configuration property ``optimizer.inequality-join-pushdown-enabled`` to support inequality join pushdown. Defaults to ``false``. The session property ``optimizer_inequality_join_pushdown_enabled`` overrides this setting. :pr:`24115`

steveburnett

Please add documentation for the configuration properties optimizer.inner-join-pushdown-enabled and optimizer.inequality-join-pushdown-enabled, and the session properties optimizer_inner_join_pushdown_enabled and optimizer_inequality_join_pushdown_enabled.

Haritha-Koloth · 2025-01-13T06:11:47Z

Please add documentation for the configuration properties optimizer.inner-join-pushdown-enabled and optimizer.inequality-join-pushdown-enabled, and the session properties optimizer_inner_join_pushdown_enabled and optimizer_inequality_join_pushdown_enabled.

@steveburnett - Is this the correct file to add documentation for the above configurations?

https://github.com/prestodb/presto/blob/05bc56ceddaf7debed9f5fd7aa20e07093aea969/presto-docs/src/main/sphinx/admin/properties.rst#optimizer-properties

steveburnett · 2025-01-13T14:35:30Z

Please add documentation for the configuration properties optimizer.inner-join-pushdown-enabled and optimizer.inequality-join-pushdown-enabled, and the session properties optimizer_inner_join_pushdown_enabled and optimizer_inequality_join_pushdown_enabled.

@steveburnett - Is this the correct file to add documentation for the above configurations?
* https://github.com/prestodb/presto/blob/05bc56ceddaf7debed9f5fd7aa20e07093aea969/presto-docs/src/main/sphinx/admin/properties.rst#optimizer-properties

Yes, for the configuration properties. Please also document the session properties by updating https://github.com/prestodb/presto/blob/05bc56ceddaf7debed9f5fd7aa20e07093aea969/presto-docs/src/main/sphinx/admin/properties-session.rst.

Join Pushdown SPI and Core changes

Co-Authored-By: Ajas M
Co-Authored-By: Glerin Pinhero
Co-Authored-By: Thanzeel Hassan
Co-Authored-By: Anant Aneja

Haritha-Koloth · 2025-01-13T18:33:20Z

Hi @aaneja

We explored your suggestion to convert GroupInnerJoinsByConnector into an IterativeOptimizer rule. Below are our observations and findings:

1. Equality joins: With the current logic, we are able to construct a single table scan node. However, this approach fails because the IterativeOptimizer flow requires a transformed node to produce the same output variables as the original node. When building the single table scan node, we need to include output variables from all sources to enable predicate pushdown. Below is the specific error encountered after the optimizer is run once.

java.lang.IllegalArgumentException: com.facebook.presto.sql.planner.optimizations.GroupInnerJoinsByConnector: transformed expression doesn't produce same outputs: [intcolumn2, varcharcolumn1_2] vs [intcolumn1, intcolumn2, intcolumn1_0, varcharcolumn1_2] for node: com.facebook.presto.spi.plan.FilterNode@d85792e3

2. Inequality joins: To handle inequality joins, we may need to create a separate rule that can accommodate FilterNodes. Alternatively, we may need to enhance the logic in Pattern.java to return a generic node type (PlanNode) after evaluating multiple pattern matches, and implement Rule in GroupInnerJoinsByConnector.
3. Exit condition: We may also need to introduce an exit condition to skip join nodes that have already been processed. This would prevent redundant processing when nodes are processed iteratively.

The modified code is available here - iterative_opt_pushdown

Please let us know your thoughts on these.

CC: @Ajas-Mangal

aaneja · 2025-01-14T12:46:09Z

1. Yes, the IterativeOptimizer has an invariant that a transformation cannot produce more or fewer output variables than the input plan node. A simple way to restrict the outputs is to add a ProjectNode that constrains the output to the required variables, e.g. aaneja@dc2eff4. If there is scope for pushing down this ProjectNode, later rules will handle it.
2. The pattern matcher allows creating multiple nested matchers and can extract any specific node during the matching process. See this matcher for example. If you can describe the kind of matcher you need, I can help you build it.
3. Can you describe an example of redundant processing? There can be cases where we reapply the same rule more than once on the same, already optimized node. This is by design. The 'exit' condition we want for an IterativeOptimizer rule is that there are no plan modifications after rule re-application.

Regarding 1: I did a couple of join pushdown tests with the outputs restricted (see aaneja/iterative_opt_pushdown) and did not find any errors. Please feel free to try it out with your setup and let me know if you spot any failures.
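The output invariant and the projection workaround can be sketched with toy types. This is a simplified model, not Presto's implementation: `OutputInvariantSketch`, `Node`, `replace`, and `restrictOutputs` are hypothetical names, the check uses plain list equality for simplicity, and the variable names are borrowed from the error message quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these types are Presto classes.
public class OutputInvariantSketch
{
    // Stand-in for a plan node: a label, the variables it outputs, and an optional source.
    public static final class Node
    {
        public final String label;
        public final List<String> outputs;
        public final Node source; // null for a leaf

        public Node(String label, List<String> outputs, Node source)
        {
            this.label = label;
            this.outputs = new ArrayList<>(outputs);
            this.source = source;
        }
    }

    // Mimics the invariant: a replacement node must output the same variables as the original.
    public static Node replace(Node original, Node transformed)
    {
        if (!original.outputs.equals(transformed.outputs)) {
            throw new IllegalArgumentException(
                    "transformed expression doesn't produce same outputs: "
                            + original.outputs + " vs " + transformed.outputs);
        }
        return transformed;
    }

    // Wraps a node in a projection that exposes only the required variables.
    public static Node restrictOutputs(Node node, List<String> required)
    {
        if (!node.outputs.containsAll(required)) {
            throw new IllegalArgumentException("missing required outputs: " + required);
        }
        return new Node("Project", required, node);
    }

    public static void main(String[] args)
    {
        Node original = new Node("Filter", Arrays.asList("intcolumn2", "varcharcolumn1_2"), null);
        Node combinedScan = new Node("TableScan",
                Arrays.asList("intcolumn1", "intcolumn2", "intcolumn1_0", "varcharcolumn1_2"), null);

        // The combined scan exposes extra variables, so the direct replacement is rejected.
        try {
            replace(original, combinedScan);
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Restricting the outputs with a projection satisfies the invariant.
        Node projected = restrictOutputs(combinedScan, original.outputs);
        System.out.println("accepted: " + replace(original, projected).outputs);
    }
}
```

In this model the combined scan still carries all four variables internally, so predicates over them remain available below the projection, while the node handed back to the optimizer exposes only what the original node did.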

steveburnett

Thanks for the doc! A few formatting nits; all the links and everything else look good.

presto-docs/src/main/sphinx/admin/properties-session.rst

presto-docs/src/main/sphinx/admin/properties.rst

Formatting docs

steveburnett

LGTM! (docs)

Pulled the updated branch and did a new local doc build; it looks good. Thanks!

Haritha-Koloth · 2025-01-16T05:24:07Z

The pattern matcher allows for creating multiple nested matchers and has the ability to extract any specific node during the matching process. See this matcher for example. If you can describe the kind of matcher you need, I can help you build it

Hi @aaneja, regarding point 2:

We need to build a matcher that can accept a join OR a filter, but I could not find any examples where two different node types are handled by a single rule. The node type a rule accepts also has to be declared in the implementation, right? How can we accommodate two different types there? Did you mean that we can have two different rules and patterns for these two use cases? Or is there a way we could use a generic PlanNode for this?

aaneja · 2025-01-16T12:26:51Z

@Haritha-Koloth What you have is a case where you want to match on either JoinNode or Filter<-JoinNode and, in the latter case, use the filter as an extra filter while building the 'combined' MultiJoinNode.
You can use a rule set to define the relevant patterns and call a base class for the actual rewriting.
I rewrote your code here: aaneja@ed62d8b
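The shape of that suggestion (two patterns, one shared rewrite) can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in the linked commit and not Presto's Rule or Pattern API: `JoinRuleSetSketch`, `BaseRule`, `JoinRule`, `FilterOverJoinRule`, `MultiJoin`, and `applyFirstMatch` are hypothetical names, and tables and predicates are plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical toy model of a rule set; none of these types are Presto classes.
public class JoinRuleSetSketch
{
    public interface Node {}

    public static final class Join implements Node
    {
        public final String left;
        public final String right;

        public Join(String left, String right)
        {
            this.left = left;
            this.right = right;
        }
    }

    public static final class Filter implements Node
    {
        public final String predicate;
        public final Node source;

        public Filter(String predicate, Node source)
        {
            this.predicate = predicate;
            this.source = source;
        }
    }

    public static final class MultiJoin implements Node
    {
        public final List<String> sources;
        public final Optional<String> extraFilter;

        public MultiJoin(List<String> sources, Optional<String> extraFilter)
        {
            this.sources = sources;
            this.extraFilter = extraFilter;
        }
    }

    // Base class holding the shared rewrite; each subclass only supplies its own match.
    public abstract static class BaseRule
    {
        public abstract Optional<Node> tryApply(Node node);

        protected MultiJoin combine(Join join, Optional<String> extraFilter)
        {
            return new MultiJoin(Arrays.asList(join.left, join.right), extraFilter);
        }
    }

    // Matches a bare join.
    public static final class JoinRule extends BaseRule
    {
        @Override
        public Optional<Node> tryApply(Node node)
        {
            if (node instanceof Join) {
                return Optional.of(combine((Join) node, Optional.empty()));
            }
            return Optional.empty();
        }
    }

    // Matches a filter directly above a join and carries the predicate along as an extra filter.
    public static final class FilterOverJoinRule extends BaseRule
    {
        @Override
        public Optional<Node> tryApply(Node node)
        {
            if (node instanceof Filter && ((Filter) node).source instanceof Join) {
                Filter filter = (Filter) node;
                return Optional.of(combine((Join) filter.source, Optional.of(filter.predicate)));
            }
            return Optional.empty();
        }
    }

    // The "rule set": both rules, tried in order.
    public static List<BaseRule> rules()
    {
        return Arrays.asList(new FilterOverJoinRule(), new JoinRule());
    }

    public static Optional<Node> applyFirstMatch(Node node)
    {
        for (BaseRule rule : rules()) {
            Optional<Node> result = rule.tryApply(node);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        Node filtered = new Filter("a.x < b.y", new Join("a", "b"));
        MultiJoin combined = (MultiJoin) applyFirstMatch(filtered).get();
        System.out.println(combined.sources + " with extra filter " + combined.extraFilter);
    }
}
```

Each rule stays tied to a single node shape, which keeps the matching simple, while the logic that builds the combined node lives in one place.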

Haritha-Koloth force-pushed the join_pushdown_spi branch 2 times, most recently from e248f96 to 50a7e8b Compare November 22, 2024 07:22

Haritha-Koloth mentioned this pull request Nov 22, 2024

Jdbc changes for Join Pushdown Haritha-Koloth/presto#2

Draft

6 tasks

tdcmeehan added the from:IBM PR from IBM label Nov 22, 2024

prestodb-ci requested review from a team, pratyakshsharma and anandamideShakyan and removed request for a team November 22, 2024 15:56

Haritha-Koloth marked this pull request as ready for review November 22, 2024 15:57

Haritha-Koloth requested review from jaystarshot, feilong-liu, elharo, ClarenceThreepwood and a team as code owners November 22, 2024 15:58

Haritha-Koloth requested a review from presto-oss November 22, 2024 15:58

Haritha-Koloth force-pushed the join_pushdown_spi branch 2 times, most recently from c7d072a to 471eb62 Compare December 2, 2024 05:27

ethanyzhang requested review from a team and removed request for a team December 4, 2024 16:14

aaneja reviewed Dec 13, 2024

steveburnett requested changes Jan 7, 2025

Haritha-Koloth force-pushed the join_pushdown_spi branch from 471eb62 to ecb3a3c Compare January 13, 2025 06:00

Haritha-Koloth force-pushed the join_pushdown_spi branch 2 times, most recently from 151c76d to 30a3dbb Compare January 13, 2025 16:27

Join Pushdown SPI and Core changes

0b021a5

Haritha-Koloth force-pushed the join_pushdown_spi branch from 30a3dbb to 0b021a5 Compare January 13, 2025 16:28

Haritha-Koloth requested a review from steveburnett January 14, 2025 06:25

steveburnett requested changes Jan 14, 2025

Adding documentation for new configurations

ef0a411

Formatting docs

Haritha-Koloth force-pushed the join_pushdown_spi branch from 3d6af14 to ef0a411 Compare January 14, 2025 16:20

Haritha-Koloth requested a review from steveburnett January 14, 2025 16:21

steveburnett approved these changes Jan 14, 2025
