Add persist_docs for MV/ST #793

Open
Amin-Siddique opened this issue Sep 16, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

Amin-Siddique commented Sep 16, 2024

Describe the feature

This feature adds the ability to automatically generate SQL column comments for Databricks Delta Live Tables (DLT). Specifically, it updates the SQL generated for CREATE STREAMING TABLE so that each column carries a COMMENT clause populated from the column description in the dbt metadata.

This allows users to define meaningful metadata for their columns directly in dbt models and have those automatically applied as SQL column comments when creating streaming tables in Databricks.

Describe alternatives you've considered

  1. Manual Column Commenting: One alternative is to add the comments by hand in SQL when defining the table schema. This approach is prone to inconsistencies and duplicates effort when the comments are already defined in the dbt models.
  2. Post-Creation ALTER Statements: Not supported

Additional context

Here's an example of the newly implemented code:

{%- macro get_create_column_comment(model) -%}
  {% set node = model %}
  {% if node.columns %}
    {% set column_definitions = [] %}
    {% for column_name, column in node.columns.items() %}
      {# Fall back to STRING when the column has no declared data_type. #}
      {% set data_type = column.data_type or 'STRING' %}
      {% set column_line = column_name ~ ' ' ~ data_type %}
      {% if column.description %}
        {# Escape single quotes so a description containing an apostrophe cannot break the SQL string literal. #}
        {% set escaped_description = column.description | replace("'", "\\'") %}
        {% set column_line = column_line ~ " COMMENT '" ~ escaped_description ~ "'" %}
      {% endif %}
      {% do column_definitions.append(column_line) %}
    {% endfor %}
    {{ column_definitions | join(',\n    ') }}
  {% endif %}
{%- endmacro -%}

{% macro databricks__get_create_streaming_table_as_sql(relation, sql) -%}
  CREATE STREAMING TABLE {{ relation }} 
    ( {{ get_create_column_comment(config.model) }} )
    -- Additional clauses like partitioning and properties
    AS {{ sql }}
{% endmacro %}

The code leverages dbt metadata (node.columns) to automatically generate column comments during table creation. This ensures consistency and reduces manual overhead for managing schema documentation.
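To make the generated output concrete, here is a rough Python sketch that mirrors the macro's logic outside of Jinja. It is for illustration only and is not part of the adapter: the function name `render_column_definitions` and the sample columns are hypothetical, and the backslash escaping of single quotes is an assumption about how descriptions should be quoted.

```python
from typing import Mapping, Optional


def render_column_definitions(
    columns: Mapping[str, Mapping[str, Optional[str]]]
) -> str:
    """Hypothetical helper: build the column list the macro would emit."""
    lines = []
    for name, column in columns.items():
        # Same fallback as the macro: STRING when no data_type is declared.
        data_type = column.get("data_type") or "STRING"
        line = f"{name} {data_type}"
        description = column.get("description")
        if description:
            # Assumption: single quotes are escaped with a backslash.
            escaped = description.replace("'", "\\'")
            line += f" COMMENT '{escaped}'"
        lines.append(line)
    return ",\n    ".join(lines)


# Made-up columns standing in for node.columns.
columns = {
    "order_id": {"data_type": "BIGINT", "description": "Primary key of the order"},
    "status": {"data_type": None, "description": "Customer's order status"},
    "loaded_at": {"data_type": "TIMESTAMP", "description": ""},
}
print(render_column_definitions(columns))
```

The `status` column shows the STRING fallback and the quote escaping; `loaded_at` has no description, so it gets no COMMENT clause. Whether the STRING fallback is safe when the query returns a different type for that column is not addressed in the issue.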

Who will this benefit?

This feature will benefit:

  • Data engineers who use Databricks and dbt for building streaming tables, allowing them to include meaningful column descriptions directly from dbt models.
  • Teams working with large data sets where maintaining schema documentation is critical. This ensures schema changes and documentation are kept in sync automatically.

Example Use Case: A team might have a dbt model that defines columns with descriptions. When they deploy this model to a Databricks Delta Live Table, the column comments would automatically be included, helping others understand the purpose and meaning of each column.

We can also pull column constraints from the dbt model.

Are you interested in contributing this feature?

Yes, I'm interested in contributing to this feature. Please let me know if additional steps are required to prepare it for contribution or if any changes are needed. I am happy to discuss the implementation further.

Amin-Siddique added the enhancement label Sep 16, 2024

benc-db (Collaborator) commented Sep 16, 2024

This was initially not implemented due to issues I was having with materialized views and column comments; maybe this is no longer the case, and either way, I think it should work fine with STs. If you are interested in adding this, we would need to follow the pattern we use for other components of MV/ST so that we can detect when the dbt project changes relative to what is stored in Databricks.
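The pattern used for other MV/ST components is not spelled out in this thread, so the following is only a minimal sketch of the comparison step it implies: given the column comments declared in the dbt project and those currently stored in Databricks, work out which columns differ. The function name `changed_comments` and the sample data are hypothetical, and how the stored comments are fetched is deliberately left out.

```python
from typing import Dict, Mapping, Optional


def changed_comments(
    desired: Mapping[str, Optional[str]],
    existing: Mapping[str, Optional[str]],
) -> Dict[str, str]:
    """Hypothetical helper: columns whose desired comment differs from the stored one."""
    changes = {}
    for name, comment in desired.items():
        # Treat a missing comment, None, and an empty string as equivalent.
        wanted = (comment or "").strip()
        stored = (existing.get(name) or "").strip()
        if wanted != stored:
            changes[name] = wanted
    return changes


# Made-up data: what the dbt project declares vs. what the table holds.
desired = {
    "order_id": "Primary key of the order",
    "status": "Order status",
    "loaded_at": None,
}
existing = {
    "order_id": "Primary key of the order",
    "status": None,
    "loaded_at": "",
}
print(changed_comments(desired, existing))  # {'status': 'Order status'}
```

An empty result would mean the stored comments already match the project, so no change needs to be applied for this component.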

benc-db changed the title from "Add support for auto-generating column comments in Delta Live Tables from dbt model descriptions" to "Add persist_docs for MV/ST" on Jan 8, 2025