Fabian Schmied edited this page Dec 23, 2009 · 1 revision

Published on December 23rd, 2009 at 15:15

Any, All, and Aggregate

For the last few weeks, I’ve been very occupied with re-store, the O/R mapping part of re-motion. Still, there was some time left for three minor re-linq features Steve Strong asked me to implement: support for All, Any, and Aggregate result operators.

Any can be used to check whether a sequence (or query result) contains any elements satisfying a given predicate, or whether it contains any elements at all:

// got any students called "Garcia"?
var result1 = students.Any (s => s.Last == "Garcia");

// got any students at all?
var result2 = students.Any ();

Both forms of Any are represented by re-linq via instances of AnyResultOperator. Remember: result operators are query operators that act on the whole result set of a query, calculating a single value from the result set or transforming it into a completely different set of items. In this case, a boolean value is calculated from the result set.

Like all result operators, instances of AnyResultOperator can be analyzed by checking QueryModel.ResultOperators, or handled in a visitor by implementing IQueryModelVisitor.VisitResultOperator or overriding QueryModelVisitorBase.VisitResultOperator.

You may notice that the predicate that can be passed to Any has the same semantics as a where clause; i.e., the following two queries are semantically equivalent:

students.Any (s => s.Last == "Garcia");
students.Where (s => s.Last == "Garcia").Any ();

This is similar to other result operators taking optional predicates, such as First or Count, and as with those result operators, re-linq represents both queries the same way, using a WhereClause. This means that the AnyResultOperator will never hold a predicate; the respective WhereClause can be found in the BodyClauses of the QueryModel holding the result operator.
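The equivalence can be checked against plain LINQ to Objects. The following is a minimal sketch, not re-linq code; the sample data is made up for illustration and the top-level statements assume C# 9 or later:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory sample data; the names are invented for this sketch.
var students = new[]
{
    (First: "Ana", Last: "Garcia"),
    (First: "Ben", Last: "Miller"),
};

// Predicate passed directly to Any ...
bool direct = students.Any (s => s.Last == "Garcia");

// ... and the same predicate moved into a preceding Where.
bool viaWhere = students.Where (s => s.Last == "Garcia").Any ();

Console.WriteLine (direct == viaWhere); // prints "True"
```

Because both queries yield the same result for any input, a provider only has to deal with the Where form.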

All can be used to check whether all elements in a sequence satisfy a given predicate:

// are all students named "Garcia"?
students.All (s => s.Last == "Garcia");

In this case, the predicate is not optional, and it also doesn’t denote a filter with Where semantics. Therefore, the corresponding AllResultOperator created by re-linq always holds the predicate – in its resolved form. “Resolved” means that the LambdaExpression has been simplified by substituting the parameter representing the incoming items (s in the sample above) with an expression describing those items.

In the example, AllResultOperator.Predicate will hold the following: [s].Last == "Garcia". The [s] part is a QuerySourceReferenceExpression that points to the MainFromClause representing the students query source. re-linq always resolves LambdaExpressions that way, so this is the same expression you would see in a SelectClause or WhereClause.
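To see why the All predicate cannot simply be turned into a Where filter, it helps to compare the results in LINQ to Objects. This is a small sketch with made-up data (top-level statements, so C# 9 or later is assumed); it only illustrates the standard operator semantics, not how re-linq processes the query:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: two matching names and one that does not match.
var lastNames = new[] { "Garcia", "Garcia", "Miller" };
var none = new string[0];

// All demands that every item satisfies the predicate.
bool allGarcia = lastNames.All (n => n == "Garcia");

// Filtering first only tells us whether at least one item matches.
bool someGarcia = lastNames.Where (n => n == "Garcia").Any ();

// All is equivalent to "no item violates the predicate".
bool noCounterexample = !lastNames.Any (n => n != "Garcia");

// On an empty sequence, All is vacuously true.
bool allOnEmpty = none.All (n => n == "Garcia");

Console.WriteLine (allGarcia);        // prints "False"
Console.WriteLine (someGarcia);       // prints "True"
Console.WriteLine (noCounterexample); // prints "False"
Console.WriteLine (allOnEmpty);       // prints "True"
```

A provider that wants to express All through Any therefore has to negate the predicate and the result, rather than reuse the predicate as a filter.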

Aggregate is also a result operator; it can be used to accumulate (or … aggregate) all the incoming items into a single value:

students  
   .Select (s => s.Kids.Count)  
   .Aggregate ((total, kidCount) => total + kidCount);

In this example, the aggregate operator combines all counts by adding them together, much as if I had used Sum instead. But Aggregate is not restricted to additions:

students
  .Aggregate ("", (nameString, s) => nameString + " " + s.Last);

This example concatenates all last names into a single name string. It uses a different overload of Aggregate, one that takes an initial seed value. This overload also allows the aggregated value to be of a different type than the incoming items, so no select clause is needed.

Note that this overload has quite different semantics than the one I used before; compare the following two queries:

students  
  .Select (s => s.Last)  
  .Aggregate ((nameString, last) => nameString + " " + last);

students  
  .Aggregate ("", (nameString, s) => nameString + " " + s.Last);

Apart from the fact that I needed a select clause for the first query, but not for the second one, the result of the second query will include a leading space, whereas the result of the first query won’t. The first result does not hold a leading space because that Aggregate overload uses the first incoming item as the seed value, and the aggregating function is only called for the remaining items. For the second query, the seed is given, so the aggregating function is called for each of the items – including the first one. And this causes one space per item to be included in the second result.
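The difference is easy to reproduce in LINQ to Objects. The sketch below uses a plain string array with invented names instead of student objects (C# 9 or later assumed for the top-level statements). It also shows a second consequence of the different seeding: the overload without a seed throws on an empty sequence, while the seeded one just returns the seed.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the students' last names.
var lastNames = new[] { "Garcia", "Miller", "Chen" };

// No seed: the first item becomes the seed, the function runs for the rest.
string unseeded = lastNames.Aggregate ((nameString, last) => nameString + " " + last);

// Explicit seed: the function runs for every item, including the first.
string seeded = lastNames.Aggregate ("", (nameString, last) => nameString + " " + last);

Console.WriteLine ("[" + unseeded + "]"); // prints "[Garcia Miller Chen]"
Console.WriteLine ("[" + seeded + "]");   // prints "[ Garcia Miller Chen]"

// Empty input: the seeded overload returns the seed ...
var empty = new string[0];
string seededOnEmpty = empty.Aggregate ("", (nameString, last) => nameString + " " + last);

// ... whereas the overload without a seed has nothing to start from.
bool threw = false;
try
{
    empty.Aggregate ((nameString, last) => nameString + " " + last);
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine (seededOnEmpty.Length); // prints "0"
Console.WriteLine (threw);                // prints "True"
```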

Because of this semantic discrepancy, I decided to implement two different result operator classes within re-linq: AggregateResultOperator and AggregateFromSeedResultOperator, the former representing the first overload, the latter the second one. There’s also a third overload, which takes an additional result selector (a LambdaExpression transforming the aggregated value one last time before it is returned). This is semantically identical to the second overload, so it is also represented by the AggregateFromSeedResultOperator (which therefore offers an OptionalResultSelector property).
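For reference, here is what the third overload looks like in LINQ to Objects. This is only an illustrative sketch with invented data (C# 9 or later assumed); it uses the result selector to strip the leading space that the seeded aggregation produces:

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the students' last names.
var lastNames = new[] { "Garcia", "Miller", "Chen" };

string names = lastNames.Aggregate (
    "",                                             // seed
    (nameString, last) => nameString + " " + last,  // aggregating function
    nameString => nameString.TrimStart ());         // result selector, applied once at the end

Console.WriteLine ("[" + names + "]"); // prints "[Garcia Miller Chen]"
```

The aggregation itself behaves exactly like the seeded overload; the selector only post-processes the final value, which is why one operator class can cover both.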

Both AggregateResultOperator and AggregateFromSeedResultOperator hold the aggregating function in resolved form, i.e., for the first Aggregate example above, the result operator would hold the following expression: total => total + [s].Kids.Count, with [s] being the reference expression that points back to the MainFromClause representing the students query source.

Note that most LINQ providers will probably have difficulty supporting Aggregate in its entirety, simply because it’s so flexible. You can use Aggregate to do sums, products, divisions, string concatenations, list building, and much, much more. But for some scenarios, it will still be handy to have support for it in re-linq – and like all result operators, AggregateResultOperator and AggregateFromSeedResultOperator both have an ExecuteInMemory method to run the operator on an in-memory sequence if desired.
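To give an idea of that flexibility, here is a short LINQ to Objects sketch with made-up numbers (C# 9 or later assumed). The same operator computes a sum, a product, and builds a list, depending only on the function passed in:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var numbers = new[] { 2, 3, 4 };

// Sum and product: same operator, different aggregating function.
int sum = numbers.Aggregate ((total, n) => total + n);
int product = numbers.Aggregate ((total, n) => total * n);

// List building: the seed is a list, so the result type differs from the item type.
List<int> doubled = numbers.Aggregate (
    new List<int> (),
    (list, n) => { list.Add (n * 2); return list; });

Console.WriteLine (sum);                        // prints "9"
Console.WriteLine (product);                    // prints "24"
Console.WriteLine (string.Join (",", doubled)); // prints "4,6,8"
```

A provider translating to SQL might recognize the first function as a SUM, but the other two have no obvious counterpart, so falling back to in-memory execution can be a reasonable choice there.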
