Is your feature request related to a problem?
Currently, in an OpenSearch logical plan, LogicalEval blocks many push-down optimization rules, such as TableScanPushDown.PUSH_DOWN_PROJECT, TableScanPushDown.PUSH_DOWN_SORT, and TableScanPushDown.PUSH_DOWN_LIMIT. This not only reduces execution performance; it can also lead to wrong results when the push-down optimization cannot be applied.
For example, the two PPL queries below are semantically equivalent, but the latter, which uses eval, retrieves all fields from OpenSearch (which is unnecessary) and returns too few documents because of the limit imposed by the plugins.query.size_limit setting.
What solution would you like?
Since the eval operator only adds or updates fields and does nothing to skip rows or prune fields, and since it can block other optimizations, we should push the sort/limit/project operators down through eval so that eval's output is evaluated as late as possible.
1. Limit
A limit can be pushed down under eval directly, since it does not contain any expressions.
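The limit rewrite can be sketched on a toy plan model. This is only an illustration in Python, not the plugin's Java implementation: the Scan, Eval, and Limit dataclasses and the push_limit_under_eval function are hypothetical names, and expressions are kept as plain strings. The swap relies on eval being a row-by-row operator that neither drops nor reorders rows.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Scan:
    table: str


@dataclass
class Eval:
    # Ordered (field_name, expression_text) pairs; a hypothetical representation.
    assignments: list
    child: Any


@dataclass
class Limit:
    count: int
    child: Any


def push_limit_under_eval(plan):
    """Rewrite limit - eval - X into eval - limit - X; leave other plans unchanged."""
    if isinstance(plan, Limit) and isinstance(plan.child, Eval):
        ev = plan.child
        return Eval(ev.assignments, Limit(plan.count, ev.child))
    return plan


plan = Limit(10, Eval([("a", "b + 1")], Scan("t")))
rewritten = push_limit_under_eval(plan)
```

After the rewrite, the limit sits directly above the scan, where a rule such as TableScanPushDown.PUSH_DOWN_LIMIT would have a chance to match.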
2. Sort
For sort, we should analyze, for each sort field, whether it is produced by eval.
2.1 No sort field is produced by eval. All of them can then be resolved by the Environment before eval, so the sort can be pushed down under eval directly.
2.2 One or more sort fields are produced by eval. There are two sub-cases:
2.2.1 The field is produced by an assignment to a ReferenceExpression, i.e. a simple rename such as eval(sortField=originField). We can still push down the sort by replacing these fields with their original fields. There are more complex cases, such as eval(newField=originField, sortField=newField), for which we should either apply the replacement iteratively from right to left or simply exclude them from push-down.
2.2.2 The field is produced by a function. Here replacement preserves semantics in some cases, such as sort(a) - eval(a=b+1), but not in others, such as sort(a) - eval(a=b^2). A sort field may also be computed from more than one original field. I think we should discuss this case in a separate issue as an advanced enhancement.
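The case analysis for sort can be sketched as a small helper. This is a toy Python model, not the plugin's code: resolve_sort_fields is a hypothetical name, a rename is modeled as a plain string source, and anything else (here a tuple) stands in for a function call. It assumes eval assignments are evaluated left to right, with later assignments seeing earlier ones, which is why the walk goes right to left.

```python
def resolve_sort_fields(sort_fields, assignments):
    """Map sort fields to pre-eval field names, or return None if push-down is unsafe.

    assignments: ordered (target, source) pairs. A str source is a simple
    rename (case 2.2.1); any other source is treated as a function (case 2.2.2).
    """
    resolved = []
    for field in sort_fields:
        current = field
        # Walk right to left so that chained renames are unwound step by step.
        for target, source in reversed(assignments):
            if target != current:
                continue
            if not isinstance(source, str):
                return None  # produced by a function: do not push down
            current = source  # simple rename: substitute the original field
        resolved.append(current)  # untouched fields fall through (case 2.1)
    return resolved


# Case 2.1: the sort field is not produced by eval.
untouched = resolve_sort_fields(["x"], [("a", ("+", "b", 1))])
# Case 2.2.1: chained renames, eval(newField=originField, sortField=newField).
renamed = resolve_sort_fields(
    ["sortField"], [("newField", "originField"), ("sortField", "newField")]
)
# Case 2.2.2: the sort field is produced by a function, eval(a=b+1).
blocked = resolve_sort_fields(["a"], [("a", ("+", "b", 1))])
```

Returning None for every function-produced field is the conservative choice; recognizing order-preserving functions such as b+1 is the enhancement deferred above.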
3. Project
Pushing a project down under eval actually means inserting another project before eval, rather than pushing down the original project itself.
We should construct the new projectList by appending all reference expressions used by eval to the original projectList and then removing all reference expressions produced by eval. For example, the plan project(a, b, d) - eval(b=c+2, a=b+1) should be optimized to project(a, b, d) - eval(b=c+2, a=b+1) - project(c, d). We can then apply the projection early, or push project(c, d) down into TableScan if possible, while keeping the two plans semantically equivalent.
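One way to compute the inserted project list is a backward walk over the eval assignments. The sketch below is a toy Python model under stated assumptions: pushed_down_project is a hypothetical name, each assignment is given as (target, referenced_fields), and assignments are assumed to be evaluated left to right. Walking right to left is a small refinement of the rule above, so that a field that is both read and overwritten, as in eval(a=a+1), is still kept.

```python
def pushed_down_project(project_list, assignments):
    """Return the fields the project inserted below eval must keep.

    assignments: ordered (target, referenced_fields) pairs describing eval.
    """
    needed = set(project_list)
    for target, refs in reversed(assignments):
        # The target is produced here, so it is not required from below...
        needed.discard(target)
        # ...but every field the expression reads is required from below.
        needed.update(refs)
    return sorted(needed)


# project(a, b, d) - eval(b=c+2, a=b+1)
lower = pushed_down_project(["a", "b", "d"], [("b", ["c"]), ("a", ["b"])])
# project(a) - eval(a=a+1): a is both read and overwritten
self_ref = pushed_down_project(["a"], [("a", ["a"])])
```

For the example plan this yields the fields c and d, matching project(c, d). The sketch keeps the inputs of every eval assignment, including ones whose output is never projected; pruning such assignments would be a separate optimization.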
qianheng-aws changed the title from "[FEATURE] Pull Up LogicalEval to evaluate as late as possible" to "[FEATURE] Push Down Sort/Limit/Project through LogicalEval" on Jul 29, 2024
Not every sort/limit/project operator in a query can be pushed down through eval. Could you explain your design in more detail in this description? @qianheng-aws
LantaoJin changed the title from "[FEATURE] Push Down Sort/Limit/Project through LogicalEval" to "[RFC] Push Down Sort/Limit/Project through LogicalEval" on Aug 2, 2024