
[ADAP-852] [Feature] Data type conversions on schema change when unsupported by alter statement in Snowflake #755

kbrock91 opened this issue Aug 28, 2023 · 5 comments

@kbrock91

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

In Snowflake, when a model is set to an incremental materialization with on_schema_change = 'sync_all_columns' and the detected change is a column data type change (e.g. FLOAT to NUMBER), dbt recognizes the schema change and executes an alter statement to try to change the data type. For Snowflake we're attempting to just use alter table ... alter column ... set data type, which works for some type changes but not for others. For unsupported type conversions (Snowflake's documentation lists which conversions are supported and unsupported), e.g. FLOAT to NUMBER, Snowflake fails on the alter statement and the incremental run is unsuccessful. Today, we have to either manually recreate the column with the right data type or run a full refresh on our incremental model to be able to load the data.
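
For reference, the statement dbt issues looks roughly like this (the schema and table names below are placeholders, not taken from an actual run):

alter table my_schema.my_test_incremental_dbt_model
    alter column my_test_column set data type number(38,0);

-- fails with:
-- 002108 (22000): SQL compilation error:
--   cannot change column MY_TEST_COLUMN from type FLOAT to NUMBER(38,0)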

Expected Behavior

When dbt detects a schema change with an unsupported data type conversion, it should change its behavior to account for this limitation. We could either have a flag to run a full refresh (not ideal) or, more creatively, dbt could execute a series of commands to fully recreate the column with the new type, copy the values over, and delete the old column.
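
As a rough sketch only (table and column names here are hypothetical, and dbt does not do this today), the "recreate the column" idea could look like this sequence of Snowflake statements:

alter table my_schema.my_table add column my_test_column__tmp number(38,0);

-- copy existing values across with an explicit cast;
-- this is where a lossy or failing conversion would surface
update my_schema.my_table set my_test_column__tmp = my_test_column::number(38,0);

alter table my_schema.my_table drop column my_test_column;
alter table my_schema.my_table rename column my_test_column__tmp to my_test_column;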

Steps To Reproduce

  1. In the dbt Cloud IDE with the Snowflake adapter, create an incremental model with on_schema_change = 'sync_all_columns':
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='sync_all_columns'
    )
}}

with data as (

    with source_data as (

        select
            1 as id,
            10.0::FLOAT as my_test_column

        union

        select
            2 as id,
            5.0::FLOAT as my_test_column

    )

    select * from source_data

    {% if is_incremental() %}
        where id >= (select max(id) from {{ this }})
    {% endif %}

)

select * from data
  2. Update the incremental logic to force a column data type change to an unsupported type conversion (FLOAT to NUMBER):
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='sync_all_columns'
    )
}}

with data as (

    with source_data as (

        select
            1 as id,
            to_number(10.0) as my_test_column

        union

        select
            2 as id,
            to_number(5.0) as my_test_column

    )

    select * from source_data

    {% if is_incremental() %}
        where id >= (select max(id) from {{ this }})
    {% endif %}

)

select * from data
  3. Receive the error:
16:40:50 Database Error in model my_test_incremental_dbt_model (models/example/my_test_incremental_dbt_model.sql)
  002108 (22000): SQL compilation error: cannot change column MY_TEST_COLUMN from type FLOAT to NUMBER(38,0)

Relevant log output

No response

Environment

- OS:
- Python:
- dbt-core: 1.6.0
- dbt-snowflake: 1.6.0

on dbt Cloud

Additional Context

No response

@kbrock91 added the type:bug and triage:product labels Aug 28, 2023
@github-actions bot changed the title from [Issue] Data type conversions on schema change when unsupported by alter statement in Snowflake to [ADAP-852] [Issue] Data type conversions on schema change when unsupported by alter statement in Snowflake Aug 28, 2023
@dbeatty10 self-assigned this Aug 28, 2023
@dbeatty10
Contributor

Thanks for raising this @kbrock91! 🧠

Crux

It seems like the crux of the issue is that some data types (plus their precision/scale/length) can't always be converted to others without some kind of loss.

For example, suppose we have a VARCHAR(4) column that contains the value blue. We can't convert it to a NUMBER or a VARCHAR(2) without some data loss.

So we'd need someone (or something) to make a decision of how to do those types of conversions.

Proposed ideas

> We could either have a flag to run a full-refresh (not ideal) or more creatively, dbt could execute a series of commands to fully recreate the column with the new type, copy the values over, delete the old column.

You proposed a couple ideas of how to proceed:

We could either

  1. have a flag to run a full-refresh (not ideal)

  2. or more creatively, dbt could execute a series of commands to fully recreate the column with the new type, copy the values over, delete the old column.

Idea 2

I don't think dbt could safely do idea 2 without possible data loss due to the main crux of the issue listed above. An off-the-wall idea would be to utilize Snowflake's try_cast / dbt's safe_cast, but it would result in NULL values anytime the cast didn't succeed.
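
For illustration (the literal values are made up), try_cast only accepts string inputs and returns NULL instead of raising an error when the conversion fails:

select
    try_cast('42' as number(38,0)) as convertible_value,     -- 42
    try_cast('blue' as number(38,0)) as unconvertible_value; -- NULL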

Idea 1

It sounds like you are suggesting a new flag / config like --full-refresh-on-incremental-fail (but of course not that wordy)?

One way a user could do something similar would be to re-run the failed model with the --full-refresh flag. But this has the downside that they'd need to do this manually rather than having a flag/config they could "set and forget".

Of course always using the --full-refresh for all models would achieve the net effect you are after, but would defeat the purpose of incremental models 😅.

Next steps

Overall, there is a solid reason that Snowflake doesn't allow alter statements in these types of scenarios. It seems pretty reasonable for dbt to be hands-off in cases like this and hand it over to a human to put their eyes on it.

I'd also expect these cases to be relatively infrequent 🤞

Do you have any other ideas how to empower dbt users when there is an incompatible change of data types? Maybe we could output an error message with more detail on potential steps to resolve the issue?

Would adding contracts to upstream relations alleviate the risk of surprising changes in data types?

This seems like it is generalizable beyond just dbt-snowflake, so we might end up transferring this issue to dbt-core at some point.

Either way, dbt-snowflake is behaving the way we expect currently, so I'm going to re-label this as a feature request.

@dbeatty10 added the type:enhancement and awaiting_response labels and removed the type:bug and triage:product labels Aug 28, 2023
@dbeatty10 removed their assignment Aug 28, 2023
@dbeatty10 changed the title from [ADAP-852] [Issue] Data type conversions on schema change when unsupported by alter statement in Snowflake to [ADAP-852] [Feature] Data type conversions on schema change when unsupported by alter statement in Snowflake Aug 28, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions bot added the Stale label Nov 27, 2023

github-actions bot commented Dec 4, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.


@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 4, 2023
@Tonayya reopened this Sep 20, 2024
@Tonayya

Tonayya commented Sep 20, 2024

Hi team, just re-opening this to ask a few questions on behalf of a user regarding this issue:

Issue Description: We recently introduced materialized='incremental' and on_schema_change='sync_all_columns' in all our dbt models to manage schema drift in the base models. However, during the build process, two models encountered errors:

  • SQL compilation error: cannot change column COLUMN_NAME from type VARCHAR(1) to NUMBER(1,0).
  • SQL compilation error: cannot change column COLUMN_NAME from type NUMBER(38,0) to NUMBER(38,2) because changing the scale of a number is not supported.
From the debug logs, it appears that dbt is attempting to execute an ALTER query to change the data types, which is not supported by Snowflake.
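
For context, the statement dbt generates in the second case is roughly along these lines (the table and column names are placeholders), and Snowflake rejects it because a NUMBER column's scale cannot be changed in place:

alter table my_schema.my_model alter column column_name set data type number(38,2);
-- SQL compilation error: cannot change column COLUMN_NAME from type NUMBER(38,0) to NUMBER(38,2)
-- because changing the scale of a number is not supported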
We would like to understand the following:

  1. Why is dbt attempting to alter columns in a way that is technically not supported by Snowflake?
  2. Is there a roadmap item addressing this issue?
