Skip to content

Latest commit

 

History

History
130 lines (89 loc) · 4.58 KB

aggregate_awareness_and_the_join.md

File metadata and controls

130 lines (89 loc) · 4.58 KB

Aggregate Awareness and the Join

Motivation

Given the example with the dynamic top n we have identified the need for a join to be calculated dynamically based on user selection of dimensions and measures. There can be many other cases where we wanted to inject some flexibility into a window function or other derived tables.

Such behavior currently does not go well together with aggregate awareness, because aggregate awareness assumes that all joins and dimensions are preaggregatable (or it denies them to be references in aggregate awareness).

That means that any dynamic behavior of lets say SQL derived tables is frozen or the aggregate awareness cannot be applied.

Attempted Workaround

The first attempt to work around this issue was to look for a possible pattern to control on which aggregate awareness is activated.

The below figure should illustrate the setup of starting with an autogenerated view. Then, we add the standard layer defining all the relevant (preaggregatable) measures but skipping the definition of parameters, filters and some measures containing window functions.

The next step was to create an aggregate aware base explore with all the definitions for aggregate awareness. In the next step a native derived table - again will all columns and measures - is defined. In this view derived columns, parameters and filters could hypothetically be defined. Based on this native derived table a new explore is created that now contains the joins that should be applied post aggregate awareness. All queries should be directed to this explore.

img

Unfortunately though the behavior of the NDT in conjunction with aggregate awareness was a little bit unexpected.

While the aggregate awareness is triggered as expected when the aggregate aware base explore is queried as the below screenshot indicates:

img

The same was no longer true for the explore based on the NDT.

img

The key observation we can make is that the NDT is generating a group by query instead of a simple select query of only the required columns and dimensions that are requested.

Possible Hypothetical Solutions

(1) Join Control for Aggregate Awareness

It would be nice to be in control if aggregate awareness is applied before or after the join.

Some hypothetical parameter could be something like aggregate_aware: no indicating that this join is only calculatable after the aggregate awareness.

join: sqldt_dynamic_sum {
  type: left_outer
  relationship: many_to_one
  aggregate_aware: post
  ...
}

While this was my first idea I can already think of some complications.

(2) Native Derived table withouth GroupBy

The second solution I could think of would be be to allow native derived tables to be simple select statements only containing the columns that are required for the current request. This would leave optimization to the underlying explore.

This could be implemented with a new preaggregate parameter.

view: ndt_income {

  derived_table: {
    explore_source: base_income {
      column: id {}
      column: name {}
      column: age {}
      column: country {}
      column: gender {}
      column: total_income {}
    }
    preaggregate: no
  ...
  }

Given the running exaple, the expected output would be

SELECT
    aggregate_aware_base.category AS category,
    aggregate_aware_base.product AS product,
    aggregate_aware_base.sales_date AS sales_date,
    aggregate_aware_base.territory AS territory,
    aggregate_aware_base.sales_value AS total_sales_value
FROM product_sales AS aggregate_aware_base

Instead of the current output

SELECT
    aggregate_aware_base.category  AS category,
    aggregate_aware_base.product  AS product,
        (TIMESTAMP(aggregate_aware_base.sales_date )) AS sales_date,
    aggregate_aware_base.territory  AS territory,
    COALESCE(SUM(aggregate_aware_base.sales_value ), 0) AS total_sales_value
FROM product_sales AS aggregate_aware_base
GROUP BY
    1,
    2,
    3,
    4

(3) SQL Derived table based on explores

Allowing to reference an explore in SQL, inside of a SQL derived table, similar to how it is currently allowed for views: ${explore_name.SQL_TABLE_NAME}.

This is probably the simplest to implement from a Looker perspective while allowing all the flexilibilty.