Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 #4232

daltontc · 2022-10-12T16:52:33Z

Current Behavior

In versions 3.7.4 and 4.1.0, iterating over a TupleQueryResult with large results is extremely slow when compared to older RDF4j versions (2.5.5).

With newer RDF4j versions, running the following query over the CHEBI Lite ontology stored in a graph takes over an hour to process results whereas in RDF4j 2.5.5 it only takes 4s.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}

See https://github.com/daltontc/rdf4jTest/tree/bug/tuplequeryresults_slow_iteration for an example

Expected Behavior

The time it takes to iterate over large query results should be relatively consistent between versions, if not better in later versions.

Steps To Reproduce

Clone https://github.com/daltontc/rdf4jTest/tree/bug/tuplequeryresults_slow_iteration (requires Git LFS due to large trig file)
Run Main with 2.5.5
Results are iterated over in 4s

Total query time: 4228
Parent Count: 302455
Child Count: 122258

Switch version to 3.7.4 or 4.1.0
Run Main
Results take ages to completely iterate over

Version

3.7.4, 4.1.0

Are you interested in contributing a solution yourself?

No response

Anything else?

No response

The text was updated successfully, but these errors were encountered:

kenwenzel · 2022-10-13T07:12:13Z

Can you try to use the query explanation feature:
https://rdf4j.org/documentation/programming/repository/#explaining-queries

The long runtime could be related to sub-optimal query optimization.

daltontc · 2022-10-13T17:20:40Z

I let the query actually finish out with 3.7.4 and it took nearly 24 hours. Crazy that the query used to run efficiently in 4 seconds but now takes nearly a full day.

Total query time: 83592008
Parent Count: 302455
Child Count: 122258

Here is the explanation:

Timed out while retrieving explanation! Explanation may be incomplete!
You can change the timeout by setting .setMaxExecutionTime(...) on your query.

Projection (resultSizeActual=38.2K, totalTimeActual=60.0s, selfTimeActual=12.2ms)
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator) (resultSizeActual=38.2K, totalTimeActual=59.9s, selfTimeActual=15.3ms)
   ├──Join (JoinIterator) (resultSizeActual=91, totalTimeActual=39.3ms, selfTimeActual=1.82ms)
   │  ╠══BindingSetAssignment ([[type=http://www.w3.org/2002/07/owl#Class], [type=http://www.w3.org/2000/01/rdf-schema#Class]]) (costEstimate=0, resultSizeEstimate=1, resultSizeActual=1, totalTimeActual=0.014ms, selfTimeActual=0.014ms)
   │  ╚══StatementPattern (costEstimate=430, resultSizeEstimate=185.1K, resultSizeActual=91, totalTimeActual=37.4ms, selfTimeActual=37.4ms)
   │        Var (name=parent)
   │        Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
   │        Var (name=type)
   └──Join (JoinIterator) (resultSizeActual=38.2K, totalTimeActual=59.9s, selfTimeActual=10.6s)
      ╠══StatementPattern (costEstimate=215, resultSizeEstimate=185.1K, resultSizeActual=12.1M, totalTimeActual=25.9s, selfTimeActual=25.9s)
      ║     Var (name=child)
      ║     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
      ║     Var (name=type)
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=191.4K, resultSizeActual=38.2K, totalTimeActual=23.3s, selfTimeActual=23.3s)
            Var (name=child)
            Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
            Var (name=parent)

If I switch the query to remove one of the values in the VALUES declaration, it runs just as slow:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    values ?type {owl:Class}
    ?parent rdf:type ?type .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type ?type .
    }
}

But by removing the VALUES declaration and subbing in just owl:Class it runs in 4s again.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}

Total query time: 4150
Parent Count: 302455
Child Count: 122258

Reworking the query to UNION both different VALUES has it run in the original 4s

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    {
        ?parent rdf:type owl:Class .
        optional {
            ?child rdfs:subClassOf ?parent ;
                   rdf:type owl:Class .
        }
    }
    UNION
    {
        ?parent rdf:type rdfs:Class .
        optional {
            ?child rdfs:subClassOf ?parent ;
                   rdf:type rdfs:Class .
        }
    }
}

Total query time: 4332
Parent Count: 302455
Child Count: 122258

It appears that there is an issue with VALUES in the query evaluation.

hmottestad · 2022-10-14T09:07:04Z

Is it correct that the query you are having performance issues with is using a VALUES clause, but if you remove the VALUES clause then there is no performance difference?

daltontc · 2022-10-14T14:20:46Z

@hmottestad I would say that is accurate. These two essentially equivalent queries run in 23 hours and 4 seconds respectively.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    values ?type {owl:Class}
    ?parent rdf:type ?type .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type ?type .
    }
}

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>

select ?parent ?child
where {
    ?parent rdf:type owl:Class .
    optional {
        ?child rdfs:subClassOf ?parent ;
               rdf:type owl:Class .
    }
}

hmottestad · 2022-10-15T14:41:07Z

Probably related to:

This is the generically correct way to parse VALUES clauses. An optimizer can potentially look at the ordering in the algebra to push the values clause down into the join tree (by inspecting which parts of the tree have variables bound in the VALUES clause).

abrokenjester · 2022-11-20T04:55:30Z

I just spent a fruitless two hours on this, tricky to see where the problem is. TBC.

This is the generically correct way to parse VALUES clauses. An optimizer can potentially look at the ordering in the algebra to push the values clause down into the join tree (by inspecting which parts of the tree have variables bound in the VALUES clause).

abrokenjester · 2022-12-18T01:41:21Z

I'm trying to construct a benchmark that reproduces the performance difference, using a synthetic dataset (that hopefully is a little easier to work with than the full dataset provided by @daltontc ). Struggling to show any significant performance difference though. Will try and look closer at the shape of the dataset. @daltontc any insights from your end on how your data is structured? Ideally I'd like to generate a dataset that we can use to reproduce the performance issue, but not quite at the scale of your original set.

This benchmark uses generated data conforming to the query pattern, and executes performance tests on both the variant with a VALUES clause, and (as a baseline) the simple equivalent query. Unfortunately, sofar I have been unable to reproduce any significant performance difference.

hmottestad · 2022-12-18T08:51:33Z

There is a trig file in the repo that @daltontc linked too. It reproduces the issue for me.

I think it's an issue with the join order from the query planner. I'll explain more in a bit, will also hopefully push a commit to your branch.

hmottestad · 2022-12-18T09:09:22Z

Without VALUES clause

Projection (resultSizeActual=505)
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator) (resultSizeActual=505)
   ├──StatementPattern (resultSizeEstimate=504, resultSizeActual=504)
   │     Var (name=parent)
   │     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
   │     Var (name=_const_7e619dcc_uri, value=http://www.w3.org/2002/07/owl#Class, anonymous)
   └──Join (JoinIterator) (resultSizeActual=2)
      ╠══StatementPattern (costEstimate=50, resultSizeEstimate=9.9K, resultSizeActual=2)
      ║     Var (name=child)
      ║     Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
      ║     Var (name=parent)
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=504, resultSizeActual=2)
-->         Var (name=child) <-- The ?child variable won't be bound until the "?child rdfs:subClassOf ?parent" statement is evaluated
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_7e619dcc_uri, value=http://www.w3.org/2002/07/owl#Class, anonymous)

With VALUES clause

Projection
╠══ProjectionElemList
║     ProjectionElem "parent"
║     ProjectionElem "child"
╚══LeftJoin (LeftJoinIterator)
   ├──Join (JoinIterator)
   │  ╠══BindingSetAssignment ([[type=http://www.w3.org/2002/07/owl#Class]]) (costEstimate=1, resultSizeEstimate=1)
   │  ╚══StatementPattern (costEstimate=504, resultSizeEstimate=504)
   │        Var (name=parent)
   │        Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
-->│        Var (name=type, value=http://www.w3.org/2002/07/owl#Class) <--- ?type is able to bind to the ?type below
   └──Join (JoinIterator)
      ╠══StatementPattern (costEstimate=11, resultSizeEstimate=508)
      ║     Var (name=child)
      ║     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
-->   ║     Var (name=type) <--- This ?type!
      ╚══StatementPattern (costEstimate=1, resultSizeEstimate=9.9K)
            Var (name=child)
            Var (name=_const_4592be07_uri, value=http://www.w3.org/2000/01/rdf-schema#subClassOf, anonymous)
            Var (name=parent)

Notice that the plans swap the bottom two joins. This is because I've "helped" the join optimizer by adding a lot of [] rdfs:subClassOf [] statements. I've also added 500 [] a owl:Class statements to add some cost to the ?child a ?type query line.

The join optimizer swaps the two last joins because ?type can be bound to the outer join in the query using VALUES.

See the inline comments marked with -->

hmottestad · 2022-12-18T09:51:35Z

@jeenbroekstra I'm not going to dig any further right now. If you don't have time yourself then maybe I'll pick it up in a few days.

abrokenjester · 2022-12-18T21:06:23Z

@hmottestad nice work, thanks! I'll try and pick it up again later this week, but feel free to also dig in if you have the time :)

daltontc added the 🐞 bug issue is a bug label Oct 12, 2022

barthanssens added this to the 4.2.1 milestone Oct 17, 2022

hmottestad changed the title ~~Extreme Performance Degradation in Large TupleQueryResult Iteration in 3.7.4/4.1.0 vs 2.5.5~~ Performance regression for queries using OPTIONAL ans VALUES in 3.7.4/4.1.0 vs 2.5.5 Oct 18, 2022

hmottestad changed the title ~~Performance regression for queries using OPTIONAL ans VALUES in 3.7.4/4.1.0 vs 2.5.5~~ Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 Oct 18, 2022

hmottestad modified the milestones: 4.2.1, 4.2.2 Nov 8, 2022

abrokenjester self-assigned this Nov 19, 2022

hmottestad modified the milestones: 4.2.2, 4.2.3 Dec 1, 2022

abrokenjester linked a pull request Dec 18, 2022 that will close this issue

GH-4232 performance issue in VALUES clause query #4330

Draft

5 tasks

hmottestad added a commit that referenced this issue Dec 18, 2022

GH-4232 updated benchmark to reproduce the issue

60e2402

abrokenjester assigned hmottestad Dec 18, 2022

hmottestad modified the milestones: 4.2.5, 4.3.2, 4.3.3 Jun 1, 2023

hmottestad modified the milestones: 4.3.3, 4.3.4 Jul 5, 2023

hmottestad modified the milestones: 4.3.4, 4.3.5 Jul 22, 2023

hmottestad modified the milestones: 4.3.5, 4.3.6 Aug 6, 2023

hmottestad modified the milestones: 4.3.6, 4.3.7 Aug 27, 2023

hmottestad modified the milestones: 4.3.7, 4.3.8 Oct 6, 2023

hmottestad modified the milestones: 4.3.8, 4.3.9 Nov 7, 2023

hmottestad modified the milestones: 4.3.9, 4.3.10 Jan 21, 2024

hmottestad modified the milestones: 4.3.10, 4.3.11 Mar 6, 2024

hmottestad modified the milestones: 4.3.11, 4.3.12 Apr 1, 2024

hmottestad modified the milestones: 4.3.12, 4.3.13 Jun 4, 2024

hmottestad modified the milestones: 4.3.13, 5.0.2 Jul 24, 2024

hmottestad modified the milestones: 5.0.2, 5.0.3 Aug 2, 2024

hmottestad modified the milestones: 5.0.3, 5.0.4 Nov 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 #4232

Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 #4232

daltontc commented Oct 12, 2022

kenwenzel commented Oct 13, 2022

daltontc commented Oct 13, 2022 •

edited

Loading

hmottestad commented Oct 14, 2022

daltontc commented Oct 14, 2022

hmottestad commented Oct 15, 2022

abrokenjester commented Nov 20, 2022

abrokenjester commented Dec 18, 2022

hmottestad commented Dec 18, 2022

hmottestad commented Dec 18, 2022 •

edited

Loading

hmottestad commented Dec 18, 2022

abrokenjester commented Dec 18, 2022

Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 #4232

Performance regression for queries using OPTIONAL and VALUES in 3.7.4/4.1.0 vs 2.5.5 #4232

Comments

daltontc commented Oct 12, 2022

Current Behavior

Expected Behavior

Steps To Reproduce

Version

Are you interested in contributing a solution yourself?

Anything else?

kenwenzel commented Oct 13, 2022

daltontc commented Oct 13, 2022 • edited Loading

hmottestad commented Oct 14, 2022

daltontc commented Oct 14, 2022

hmottestad commented Oct 15, 2022

abrokenjester commented Nov 20, 2022

abrokenjester commented Dec 18, 2022

hmottestad commented Dec 18, 2022

hmottestad commented Dec 18, 2022 • edited Loading

Without VALUES clause

With VALUES clause

hmottestad commented Dec 18, 2022

abrokenjester commented Dec 18, 2022

daltontc commented Oct 13, 2022 •

edited

Loading

hmottestad commented Dec 18, 2022 •

edited

Loading