Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 2182: multi-dimensional aggregation + aggregation functions for community sizes etc #417

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ subprojects {

ext {
// NB: due to version.json generation by parsing this file, the next line must not have any if/then/else logic
neo4jVersion = "5.19.0"
neo4jVersion = "5.18.0"
// instead we apply the override logic here
neo4jVersionEffective = project.hasProperty("neo4jVersionOverride") ? project.getProperty("neo4jVersionOverride") : neo4jVersion
testContainersVersion = '1.18.3'
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@

= apoc.agg.multiStats
:description: This section contains reference documentation for the apoc.agg.multiStats function.

label:function[] label:apoc-extended[]

[.emphasis]
apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation

== Signature

[source]
----
apoc.agg.multiStats(value :: NODE | RELATIONSHIP, keys :: LIST OF STRING) :: (MAP?)
----

== Input parameters
[.procedures, opts=header]
|===
| Name | Type | Default
|value|NODE \| RELATIONSHIP|null
|===


[[usage-apoc.data.email]]
== Usage Examples

Given this dataset:
[source,cypher]
----
CREATE (:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "aaa", another: 548}),
(:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349391", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349392", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349393", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349394", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349393", wcc: 47, lpa: 596, name: "iii", another: 10}),
(:Person { louvain: 597, neo4jImportId: "18349394", wcc: 47, lpa: 596, name: "iii", another: 10})
----


We can create an optimized multiple aggregation based on the property key,
similar to this one:
[source,cypher]
----
MATCH (p:Person)
WITH p
CALL {
WITH p
MATCH (n:Person {louvain: p.louvain})
RETURN sum(p.louvain) AS sumLouvain, avg(p.louvain) AS avgLouvain, count(p.louvain) AS countLouvain
}
CALL {
WITH p
MATCH (n:Person {wcc: p.wcc})
RETURN sum(p.wcc) AS sumWcc, avg(p.wcc) AS avgWcc, count(p.wcc) AS countWcc
}
CALL {
WITH p
MATCH (n:Person {another: p.another})
RETURN sum(p.another) AS sumAnother, avg(p.another) AS avgAnother, count(p.another) AS countAnother
}
CALL {
WITH p
MATCH (lpa:Person {lpa: p.lpa})
RETURN sum(p.lpa) AS sumLpa, avg(p.lpa) AS avgLpa, count(p.lpa) AS countLpa
}
RETURN p.name,
sumLouvain, avgLouvain, countLouvain,
sumWcc, avgWcc, countWcc,
sumAnother, avgAnother, countAnother,
sumLpa, avgLpa, countLpa
----


executing the following query:
[source,cypher]
----
MATCH (p:Person)
RETURN apoc.agg.multiStats(p, ["lpa","wcc","louvain", "another"]) as output
----


.Results
[opts="header"]
|===
| output
a|
[source,json]
----
{
"louvain" :{"596" :{"avg" :596.0, "count" :3, "sum" :1788}, "597" :{"avg" :597.0, "count" :6, "sum" :3582}},
"wcc" :{"47" :{"avg" :47.0, "count" :5, "sum" :235}, "48" :{"avg" :48.0, "count" :4, "sum" :192}},
"another" :{"548" :{"avg" :548.0, "count" :1, "sum" :548}, "549" :{"avg" :549.0, "count" :6, "sum" :3294}, "10" :{"avg" :10.0, "count" :2, "sum" :20}},
"lpa" :{"596" :{"avg" :596.0, "count" :5, "sum" :2980}, "598" :{"avg" :598.0, "count" :4, "sum" :2392}}
}
----
|===

which can be used, for example, to return a result similar to the Cypher one in this way:

[source,cypher]
----
MATCH (p:Person)
WITH apoc.agg.multiStats(p, ["lpa","wcc","louvain", "another"]) as data
MATCH (p:Person)
RETURN p.name,
data.wcc[toString(p.wcc)].avg AS avgWcc,
data.louvain[toString(p.louvain)].avg AS avgLouvain,
data.lpa[toString(p.lpa)].avg AS avgLpa
----


.Results
[opts="header"]
|===
| avgWcc | avgLouvain | avgLpa
| 48.0 | 596.0 | 598.0
| 48.0 | 596.0 | 598.0
| 48.0 | 596.0 | 598.0
| 47.0 | 597.0 | 596.0
| 47.0 | 597.0 | 596.0
| 47.0 | 597.0 | 596.0
| 47.0 | 597.0 | 596.0
| 47.0 | 597.0 | 596.0
|===

6 changes: 6 additions & 0 deletions docs/asciidoc/modules/ROOT/pages/overview/apoc.agg/index.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -14,5 +14,11 @@ Returns index of the `element` that match the given `value`

Returns index of the `element` that match the given `predicate`
|label:procedure[]


|xref::overview/apoc.agg/apoc.agg.multiStats.adoc[apoc.agg.multiStats icon:book[]]

apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation
|label:function[]
|===

Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,19 @@ Returns index of the `element` that match the given `predicate`
|===


[discrete]
== xref::overview/apoc.agg/index.adoc[]

[.procedures, opts=header, cols='5a,1a']
|===
| Qualified Name | Type
|xref::overview/apoc.agg/apoc.agg.multiStats.adoc[apoc.agg.multiStats icon:book[]]

apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation
|label:procedure[]
|===


[discrete]
== xref::overview/apoc.bolt/index.adoc[]

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ This file is generated by DocsTest, so don't change it!
** xref::overview/apoc.agg/index.adoc[]
*** xref::overview/apoc.agg/apoc.agg.row.adoc[]
*** xref::overview/apoc.agg/apoc.agg.position.adoc[]
*** xref::overview/apoc.agg/apoc.agg.multiStats.adoc[]
** xref::overview/apoc.bolt/index.adoc[]
*** xref::overview/apoc.bolt/apoc.bolt.execute.adoc[]
*** xref::overview/apoc.bolt/apoc.bolt.load.adoc[]
Expand Down
148 changes: 148 additions & 0 deletions extended/src/main/java/apoc/agg/MultiStats.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
package apoc.agg;

import apoc.Extended;
import org.jetbrains.annotations.NotNull;
import org.neo4j.graphdb.Entity;
import org.neo4j.kernel.impl.util.ValueUtils;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserAggregationFunction;
import org.neo4j.procedure.UserAggregationResult;
import org.neo4j.procedure.UserAggregationUpdate;
import org.neo4j.values.AnyValue;
import org.neo4j.values.storable.NumberValue;
import org.neo4j.values.utils.ValueMath;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

@Extended
public class MultiStats {


@UserAggregationFunction("apoc.agg.rollup")
@Description("Return a multi-dimensional aggregation")
public RollupFunction rollup() {
return new RollupFunction();
}

public static class RollupFunction {
private static final String NULL_ROLLUP = "NULL";
private final Map<String, Object> result = new HashMap<>();
// private final Map<String, Map<String, Map<String, NumberValue>>> result = new HashMap<>();

@UserAggregationUpdate
public void aggregate(
@Name("value") Object value,
@Name(value = "groupKeys") List<String> groupKeys,
@Name(value = "aggKeys") List<String> aggKeys) {
Entity entity = (Entity) value;

if (groupKeys.isEmpty()) {
return;
}

if (entity.hasProperty(groupKeys.get(0))) {
return;
}

result.compute(groupKeys.get(0), (i, v) -> {
result.compute(groupKeys.get(1), (i2, v2) -> {

});
});


// result.compute(NULL_ROLLUP, ()


// primo compute
// inner compute
//
// secondo compute
// terzo compute
// `NULL`
}
}


/*
mysql> SELECT SupplierID, CategoryID, sum(Price), avg(Price), Unit FROM Products GROUP BY SupplierID, CategoryID WITH ROLLUP;
ERROR 1055 (42000): Expression #5 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'Northwind.Products.Unit' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
mysql> SELECT SupplierID, CategoryID, sum(Price), avg(Price) FROM Products GROUP BY SupplierID, CategoryID WITH ROLLUP;
+------------+------------+------------+------------+
| SupplierID | CategoryID | sum(Price) | avg(Price) |
+------------+------------+------------+------------+
| 1 | 1 | 37 | 18.5000 |
| 1 | 2 | 10 | 10.0000 |
| 1 | NULL | 47 | 15.6667 |
| 2 | 2 | 81 | 20.2500 |
| 2 | NULL | 81 | 20.2500 |
*/


@UserAggregationFunction("apoc.agg.multiStats")
@Description("Return a multi-dimensional aggregation")
public MultiStatsFunction multiStats() {
return new MultiStatsFunction();
}

public static class MultiStatsFunction {

private final Map<String, Map<String, Map<String, NumberValue>>> result = new HashMap<>();

@UserAggregationUpdate
public void aggregate(
@Name("value") Object value,
@Name(value = "keys") List<String> keys) {
Entity entity = (Entity) value;

// for each prop
keys.forEach(key -> {
if (entity.hasProperty(key)) {
Object property = entity.getProperty(key);

result.compute(key, (ignored, v) -> {
Map<String, Map<String, NumberValue>> map = Objects.requireNonNullElseGet(v, HashMap::new);

map.compute(property.toString(), (propKey, propVal) -> {

return getStringNumberValueMap(property, propVal);
});

return map;
});
}
});
}


@UserAggregationResult
// apoc.agg.multiStats([key1,key2,key3]) -> Map<Key,Map<agg="sum,count,avg", number>>
public Map<String, Map<String, Map<String, NumberValue>>> result() {
return result;
}
}


private static Map<String, NumberValue> getStringNumberValueMap(Object property, Map<String, NumberValue> propVal) {
Map<String, NumberValue> propMap = Objects.requireNonNullElseGet(propVal, HashMap::new);

NumberValue count = propMap.compute("count",
((subKey, subVal) -> (NumberValue) ValueUtils.of(subVal == null ? 1 : subVal.longValue() + 1)) );

AnyValue neo4jValue = ValueUtils.of(property);

if (neo4jValue instanceof NumberValue numberValue) {
NumberValue sum = propMap.compute("sum",
((subKey, subVal) -> subVal == null ? numberValue : ValueMath.overflowSafeAdd(subVal, numberValue)));

propMap.compute("avg",
((subKey, subVal) -> subVal == null ? ValueUtils.asDoubleValue(numberValue.doubleValue()) : sum.dividedBy(count.doubleValue()) ));
}

return propMap;
}
}
Loading
Loading