1.8.0b1: Expanded Materialized View and Streaming Table support #595
-
@benc-db , thanks for releasing this beta. We are experimenting with integrating MV/ST into our DBT project at the moment so the timing is perfect for us. I installed 1.8.0b1 and found a couple of things I wanted to share with you:
-
Here is my terminal output and the logs from that same run (slightly redacted). I think the issue is on line 109. If I run "select * from I tested it with both serverless and SQL Classic.
-
Oh, OK. So the filter list is meant for change detection, not as a filter applied at MV creation. In that case it works as expected.
-
Today I'll be releasing 1.8.0b1 to get feedback on expanded support for materialized views and streaming tables. This post includes my preliminary docs, which I'll gather feedback on before publishing to the dbt doc site when we release 1.8.0 (most likely in April).
This update brings two heavily-requested features:
Configuration Options
Support summarized as follows
Config Details
partition_by
Works the same as for views and tables, i.e. it can be a single column or an array of columns.
description
As with views and tables, adding a description to your schema.yml file will lead to a comment added to your MV or ST.
tblproperties
Works as with views and tables, with one important exception: we maintain a list of keys that Databricks sets when making an MV or ST, and those keys are ignored for the purpose of determining configuration changes. I know this approach is fragile, and it is likely to change in the future. Take a look at tblproperties.py for the current ignore list.
schedule
Use this to set the refresh schedule for the model. If you use the schedule key, a cron key is required in the associated dictionary, but time_zone_value is optional (see the example above). The cron value should be formatted as documented by Databricks. If an MV/ST has a schedule set, and your dbt project does not specify a schedule for it, the refresh schedule will be set to manual when you next run the project. Even if a schedule is set, dbt will ask for the materialization to refresh manually when run.
query
For materialized views, if the compiled query for the model differs from the query in the database, we will take the configured on_configuration_change action. Changes to query are not currently detectable for streaming tables; see the next section for details.
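To make the tblproperties and schedule rules above concrete, here is a minimal Python sketch. It is not the adapter's code: the function names, the entries in IGNORED_KEYS, and the dictionary shapes are all assumptions made for illustration. The real ignore list lives in tblproperties.py.

```python
from typing import Dict, Optional

# Hypothetical placeholder keys standing in for properties that Databricks
# sets on its own. The adapter's actual ignore list is in tblproperties.py.
IGNORED_KEYS = {"example.managed.id", "example.managed.version"}


def tblproperties_changed(configured: Dict[str, str], existing: Dict[str, str]) -> bool:
    """Compare tblproperties while skipping the ignored keys."""

    def strip(props: Dict[str, str]) -> Dict[str, str]:
        return {k: v for k, v in props.items() if k not in IGNORED_KEYS}

    return strip(configured) != strip(existing)


def normalize_schedule(schedule: Optional[dict]) -> Optional[dict]:
    """Validate a schedule config; None stands for a manual refresh."""
    if schedule is None:
        # No schedule in the dbt project: the refresh schedule becomes manual.
        return None
    if "cron" not in schedule:
        raise ValueError("schedule requires a 'cron' key")
    # time_zone_value is optional, so it may come back as None.
    return {
        "cron": schedule["cron"],
        "time_zone_value": schedule.get("time_zone_value"),
    }
```

Under these assumptions, a key that only Databricks sets does not count as a configuration change, and a schedule dictionary without cron is rejected before anything reaches the warehouse.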
What's left?
Currently we do not support specifying column level documentation / types for either materialization. Hopefully we can bring this support later this year.
on_configuration_change
on_configuration_change is supported for materialized views and streaming tables, though the two materializations handle it in different ways.
Materialized Views
Currently, the only change that can be applied without recreating the materialized view in Databricks is an update to the schedule. Some config, e.g. partitioned by, is unlikely ever to be changeable without recreating the view. We can expand support to include updating the description and tblproperties without recreating the view if/when the SQL API for altering those becomes available.
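The materialized view rule above can be sketched in a few lines of Python. This is an illustration only: the function name, the change labels, and the returned strings are hypothetical, and it assumes the configured on_configuration_change action is to apply the changes.

```python
from typing import Set


def mv_apply_strategy(changed: Set[str]) -> str:
    """Pick how detected changes to a materialized view would be applied.

    `changed` holds hypothetical labels for the config entries that differ,
    e.g. {"schedule"} or {"query", "tblproperties"}.
    """
    if not changed:
        return "none"
    if changed == {"schedule"}:
        # The only change that can be made in place today.
        return "alter"
    # Anything else (partition_by, description, tblproperties, query)
    # means recreating the view.
    return "recreate"
```

For example, mv_apply_strategy({"schedule", "description"}) returns "recreate", because the description cannot yet be altered in place.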
Streaming Tables
For streaming tables, only changes to the partitioning currently require a drop and create. For any other config changes, we will use create or refresh (+ an alter statement for changes to the schedule) to make the changes. There is currently no mechanism for the dbt adapter to detect if the streaming table query has changed, so in this case, regardless of the behavior requested by on_configuration_change, we will use a create or refresh statement (assuming partitioned by hasn't changed), which will cause the query applied to future rows to change, without rerunning on any previously processed rows. If your source data is still available, running with '--full-refresh' should reprocess the available data with the current query.