Added docs
vga91 committed Mar 18, 2024
1 parent 6790b9a commit 5579a3e
Showing 7 changed files with 198 additions and 324 deletions.
2 changes: 1 addition & 1 deletion build.gradle
@@ -131,7 +131,7 @@ subprojects {

ext {
// NB: due to version.json generation by parsing this file, the next line must not have any if/then/else logic
neo4jVersion = "5.19.0"
neo4jVersion = "5.18.0"
// instead we apply the override logic here
neo4jVersionEffective = project.hasProperty("neo4jVersionOverride") ? project.getProperty("neo4jVersionOverride") : neo4jVersion
testContainersVersion = '1.18.3'
@@ -0,0 +1,128 @@

= apoc.agg.multiStats
:description: This section contains reference documentation for the apoc.agg.multiStats function.

label:function[] label:apoc-extended[]

[.emphasis]
apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation

== Signature

[source]
----
apoc.agg.multiStats(value :: NODE | RELATIONSHIP, keys :: LIST OF STRING) :: (MAP?)
----

== Input parameters
[.procedures, opts=header]
|===
| Name | Type | Default
|value|NODE \| RELATIONSHIP|null
|keys|LIST OF STRING|null
|===


[[usage-apoc.agg.multiStats]]
== Usage Examples

Given this dataset:
[source,cypher]
----
CREATE (:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "aaa", another: 548}),
(:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 596, neo4jImportId: "18349390", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349391", wcc: 48, lpa: 598, name: "eee", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349392", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349393", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349394", wcc: 47, lpa: 596, name: "iii", another: 549}),
(:Person { louvain: 597, neo4jImportId: "18349393", wcc: 47, lpa: 596, name: "iii", another: 10}),
(:Person { louvain: 597, neo4jImportId: "18349394", wcc: 47, lpa: 596, name: "iii", another: 10})
----


We can create an optimized multiple aggregation, grouped by the values of each property key,
similar to this plain Cypher query, which needs one subquery per property:
[source,cypher]
----
MATCH (p:Person)
WITH p
CALL {
WITH p
MATCH (n:Person {louvain: p.louvain})
RETURN sum(p.louvain) AS sumLouvain, avg(p.louvain) AS avgLouvain, count(p.louvain) AS countLouvain
}
CALL {
WITH p
MATCH (n:Person {wcc: p.wcc})
RETURN sum(p.wcc) AS sumWcc, avg(p.wcc) AS avgWcc, count(p.wcc) AS countWcc
}
CALL {
WITH p
MATCH (n:Person {another: p.another})
RETURN sum(p.another) AS sumAnother, avg(p.another) AS avgAnother, count(p.another) AS countAnother
}
CALL {
WITH p
MATCH (lpa:Person {lpa: p.lpa})
RETURN sum(p.lpa) AS sumLpa, avg(p.lpa) AS avgLpa, count(p.lpa) AS countLpa
}
RETURN p.name,
sumLouvain, avgLouvain, countLouvain,
sumWcc, avgWcc, countWcc,
sumAnother, avgAnother, countAnother,
sumLpa, avgLpa, countLpa
----


by executing the following query:
[source,cypher]
----
MATCH (p:Person)
RETURN apoc.agg.multiStats(p, ["lpa","wcc","louvain", "another"]) as output
----


.Results
[opts="header"]
|===
| output
a|
[source,json]
----
{
"louvain" :{"596" :{"avg" :596.0, "count" :3, "sum" :1788}, "597" :{"avg" :597.0, "count" :6, "sum" :3582}},
"wcc" :{"47" :{"avg" :47.0, "count" :5, "sum" :235}, "48" :{"avg" :48.0, "count" :4, "sum" :192}},
"another" :{"548" :{"avg" :548.0, "count" :1, "sum" :548}, "549" :{"avg" :549.0, "count" :6, "sum" :3294}, "10" :{"avg" :10.0, "count" :2, "sum" :20}},
"lpa" :{"596" :{"avg" :596.0, "count" :5, "sum" :2980}, "598" :{"avg" :598.0, "count" :4, "sum" :2392}}
}
----
|===
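
To make the shape of the output map concrete, here is a minimal plain-Java sketch of the same grouping, with no Neo4j dependency. The class `MultiStatsSketch`, its `Stats` holder and the `multiStats` helper are hypothetical names used only for illustration, and integer-only properties are assumed; this is not the actual APOC implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the APOC implementation.
class MultiStatsSketch {

    /** Running statistics for one (property key, property value) pair. */
    static final class Stats {
        long count;
        long sum;

        double avg() {
            return (double) sum / count;
        }
    }

    /** Builds key -> stringified property value -> stats, like the output map shown above. */
    static Map<String, Map<String, Stats>> multiStats(List<Map<String, Long>> rows, List<String> keys) {
        Map<String, Map<String, Stats>> result = new HashMap<>();
        for (Map<String, Long> row : rows) {
            for (String key : keys) {
                Long value = row.get(key);
                if (value == null) {
                    continue; // assumption: rows lacking the property are skipped
                }
                Stats stats = result
                        .computeIfAbsent(key, k -> new HashMap<>())
                        .computeIfAbsent(value.toString(), v -> new Stats());
                stats.count++;
                stats.sum += value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // (wcc, another) pairs of the nine Person nodes in the dataset above
        long[][] people = {
            {48, 548}, {48, 549}, {48, 549}, {48, 549}, {47, 549},
            {47, 549}, {47, 549}, {47, 10}, {47, 10}
        };
        List<Map<String, Long>> rows = new ArrayList<>();
        for (long[] p : people) {
            rows.add(Map.of("wcc", p[0], "another", p[1]));
        }
        Map<String, Map<String, Stats>> out = multiStats(rows, List.of("wcc", "another"));
        Stats wcc47 = out.get("wcc").get("47");
        System.out.println(wcc47.count + " " + wcc47.sum + " " + wcc47.avg()); // 5 235 47.0
    }
}
```

The inner keys are strings because each property value is stringified before grouping, which is why lookups on the Cypher side go through `toString(...)`.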

The resulting map can be used, for example, to return a result similar to that of the plain Cypher query above:

[source,cypher]
----
MATCH (p:Person)
WITH apoc.agg.multiStats(p, ["lpa","wcc","louvain", "another"]) as data
MATCH (p:Person)
RETURN p.name,
data.wcc[toString(p.wcc)].avg AS avgWcc,
data.louvain[toString(p.louvain)].avg AS avgLouvain,
data.lpa[toString(p.lpa)].avg AS avgLpa
----


.Results
[opts="header"]
|===
| p.name | avgWcc | avgLouvain | avgLpa
| "aaa" | 48.0 | 596.0 | 598.0
| "eee" | 48.0 | 596.0 | 598.0
| "eee" | 48.0 | 596.0 | 598.0
| "eee" | 48.0 | 597.0 | 598.0
| "iii" | 47.0 | 597.0 | 596.0
| "iii" | 47.0 | 597.0 | 596.0
| "iii" | 47.0 | 597.0 | 596.0
| "iii" | 47.0 | 597.0 | 596.0
| "iii" | 47.0 | 597.0 | 596.0
|===

6 changes: 6 additions & 0 deletions docs/asciidoc/modules/ROOT/pages/overview/apoc.agg/index.adoc
@@ -14,5 +14,11 @@ Returns index of the `element` that match the given `value`

Returns index of the `element` that match the given `predicate`
|label:procedure[]


|xref::overview/apoc.agg/apoc.agg.multiStats.adoc[apoc.agg.multiStats icon:book[]]

apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation
|label:function[]
|===

@@ -19,6 +19,19 @@ Returns index of the `element` that match the given `predicate`
|===


[discrete]
== xref::overview/apoc.agg/index.adoc[]

[.procedures, opts=header, cols='5a,1a']
|===
| Qualified Name | Type
|xref::overview/apoc.agg/apoc.agg.multiStats.adoc[apoc.agg.multiStats icon:book[]]

apoc.agg.multiStats(nodeOrRel, keys) - Return a multi-dimensional aggregation
|label:function[]
|===


[discrete]
== xref::overview/apoc.bolt/index.adoc[]

@@ -5,6 +5,7 @@ This file is generated by DocsTest, so don't change it!
** xref::overview/apoc.agg/index.adoc[]
*** xref::overview/apoc.agg/apoc.agg.row.adoc[]
*** xref::overview/apoc.agg/apoc.agg.position.adoc[]
*** xref::overview/apoc.agg/apoc.agg.multiStats.adoc[]
** xref::overview/apoc.bolt/index.adoc[]
*** xref::overview/apoc.bolt/apoc.bolt.execute.adoc[]
*** xref::overview/apoc.bolt/apoc.bolt.load.adoc[]
67 changes: 16 additions & 51 deletions extended/src/main/java/apoc/agg/MultiStats.java
@@ -19,56 +19,21 @@

@Extended
public class MultiStats {
/*
For a property you want to have more than one statistic:
size, count, avg, median, ...
which is probably already covered by apoc.agg.stats()
but then, how about multi-dimensional aggregation.
e.g.
apoc.agg.multiStats([key1,key2,key3]) -> Map<Key,Map<agg="sum,count,avg", number>>
e.g.
match (p:Person)
with apoc.agg.multiStats(p, ["wcc","lpa","louvain"]) as data
match (p:Person)
return p.name, data[toString(p.wcc)].count as size
see
https://community.neo4j.com/t/listing-the-community-size-of-different-community-detection-algorithms-already-calculated/42895/2?u=michael.hunger
*/

@UserAggregationFunction("apoc.agg.multiStats")
@Description("TODO...")
@Description("Return a multi-dimensional aggregation")
public MultiStatsFunction multiStats() {
return new MultiStatsFunction();
}

// todo --> sum,count,avg

public static class MultiStatsFunction {

// private Histogram values = new Histogram(3);
// private DoubleHistogram doubles;
// private List<Double> percentiles = asList(0.5D, 0.75D, 0.9D, 0.95D, 0.9D, 0.99D);
// private Number minValue;
private final Map<String, Map<String, Map<String, NumberValue>>> result = new HashMap<>();

// --> TODO - sum must be similar to https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-sum

@UserAggregationUpdate
public void aggregate(
@Name("value") Object value,
@Name(value = "keys") List<String> keys,
// todo...
@Name(value = "statistics", defaultValue = "['sum','count','avg']") List<String> statistics
) {
// todo - can be also a map, maybe?
@Name(value = "keys") List<String> keys) {
Entity entity = (Entity) value;

// for each prop
@@ -77,29 +42,29 @@ public void aggregate(
Object property = entity.getProperty(key);

result.compute(key, (ignored, v) -> {
Map<String, Map<String, NumberValue>> map1 = Objects.requireNonNullElseGet(v, HashMap::new);
Map<String, Map<String, NumberValue>> map = Objects.requireNonNullElseGet(v, HashMap::new);

// todo - it can be null?
map1.compute(property.toString(), (propKey, propVal) -> {
map.compute(property.toString(), (propKey, propVal) -> {

Map<String, NumberValue> map = Objects.requireNonNullElseGet(propVal, HashMap::new);
Map<String, NumberValue> propMap = Objects.requireNonNullElseGet(propVal, HashMap::new);

NumberValue count = map.compute("count", ((subKey, subVal) -> (NumberValue) ValueUtils.of(subVal == null ? 1 : subVal.longValue() + 1)) );
NumberValue count = propMap.compute("count",
((subKey, subVal) -> (NumberValue) ValueUtils.of(subVal == null ? 1 : subVal.longValue() + 1)) );

AnyValue of1 = ValueUtils.of(property);
AnyValue neo4jValue = ValueUtils.of(property);

if (of1 instanceof NumberValue of) {
NumberValue sum = map.compute("sum", ((subKey, subVal) -> subVal == null ? of : ValueMath.overflowSafeAdd(subVal, of)));

// NB: avg() return always a double
NumberValue avg = map.compute("avg", ((subKey, subVal) -> subVal == null ? of : sum.dividedBy(count.doubleValue()) ));
if (neo4jValue instanceof NumberValue numberValue) {
NumberValue sum = propMap.compute("sum",
((subKey, subVal) -> subVal == null ? numberValue : ValueMath.overflowSafeAdd(subVal, numberValue)));

propMap.compute("avg",
((subKey, subVal) -> subVal == null ? ValueUtils.asDoubleValue(numberValue.doubleValue()) : sum.dividedBy(count.doubleValue()) ));
}

return map;
return propMap;
});


return map1;
return map;
});
}
});
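
The `avg` update in the hunk above appears to store the first observation as a double rather than as the raw number, in line with the `NB: avg() return always a double` comment. Below is a minimal plain-Java sketch of that update order (count, then sum, then avg). `AvgAsDoubleSketch` and `update` are hypothetical names, `long` inputs are assumed, and `Math.addExact` merely stands in for `ValueMath.overflowSafeAdd`, whose overflow handling is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: plain Java types instead of Neo4j's NumberValue.
class AvgAsDoubleSketch {

    /** Updates count, sum and avg for one observed value, in that order. */
    static void update(Map<String, Number> stats, long value) {
        Number count = stats.compute("count",
                (k, old) -> old == null ? 1L : old.longValue() + 1);

        // Stand-in for overflow-safe addition: addExact throws on overflow.
        Number sum = stats.compute("sum",
                (k, old) -> old == null ? value : Math.addExact(old.longValue(), value));

        stats.compute("avg", (k, old) -> old == null
                ? (double) value                            // first observation: widen explicitly
                : sum.doubleValue() / count.doubleValue()); // later ones: recompute from sum and count
    }

    public static void main(String[] args) {
        Map<String, Number> stats = new HashMap<>();
        update(stats, 548);
        // A group with a single member still reports a double average.
        System.out.println(stats.get("avg")); // 548.0
    }
}
```

Storing the first value as a double keeps the type of `avg` consistent between groups with one member and groups with several.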