Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3112] [Feature] Pass model configuration to execute() #226

Closed
3 tasks done
SoumayaMauthoorMOJ opened this issue Sep 12, 2023 · 7 comments
Closed
3 tasks done
Labels
enhancement New feature or request Stale

Comments

@SoumayaMauthoorMOJ
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Have the ability to pass in model, snapshot, seed and other configuration to the cursor() method, in addition to credentials saved to profiles.yml.

Describe alternatives you've considered

No response

Who will this benefit?

This has been raised by at least two adapters:

  1. dbt-athena: Feature: set work_group in dbt_project.yml dbt-athena#377
  2. dbt-trino: draft: enhancement-322-Add-retry-mechanism-for-the-given-error starburstdata/dbt-trino#323 (comment)

Are you interested in contributing this feature?

No response

Anything else?

No response

@SoumayaMauthoorMOJ SoumayaMauthoorMOJ added enhancement New feature or request triage labels Sep 12, 2023
@github-actions github-actions bot changed the title [Feature] Pass model configuration to execute() in dbt adapters [CT-3112] [Feature] Pass model configuration to execute() in dbt adapters Sep 12, 2023
@SoumayaMauthoorMOJ SoumayaMauthoorMOJ changed the title [CT-3112] [Feature] Pass model configuration to execute() in dbt adapters [CT-3112] [Feature] Pass model configuration to execute() Sep 12, 2023
@jtcohen6
Copy link
Contributor

Thanks for opening @SoumayaMauthoorMOJ!

We've gotten this ask before, in order to support things like:

I don't think this is something we'd be able to prioritize immediately, but I want to leave some notes below, in case it's helpful in thinking through what a better pattern here might look like.

There are a few things that make this tricky. I think it's slightly trickier if the relevant config needs to be supplied when creating the connection, versus configs that can be supplied as additional keyword arguments to an existing connection/cursor. Support for the latter pattern seems to vary quite a bit, and those keyword arguments are not standardized.

If the config is relevant to connection creation:

  • We sorta do this today, in order to support query comments/headers containing node-specific info, but that code is quite gnarly.
  • On certain data warehouses, it's possible to reuse connections. We'd either need dbt to be smarter about the config with which it opened a given connection (more state management), or simply say that these capabilities are mutually exclusive.

If the config can be passed to the execute method of an existing connection/cursor:

  • In your example linked above, it seems like like this would be the case for work_group in dbt-athena
  • Adapter-specific implementations of execute would then need to accept arbitrary additional keyword arguments (as some already do). Then, dbt-core would need to fetch and pass in those arguments any time it calls adapter.execute — such as in the Jinja context here, while calling a statement. I could imagine some sort of new macro / entry-point for adapter-specific "extensions" — pull specific kwargs out of config, and pass them into adapter.execute — and I think it would be possible to prototype this without a prerequisite change to dbt-core.

@jtcohen6 jtcohen6 removed the triage label Sep 19, 2023
@SoumayaMauthoorMOJ
Copy link
Author

Hello @jtcohen6 thanks for getting back to me. To clarify the requirement is about "configs that can be supplied as additional keyword arguments to the execute method of an existing connection/cursor." This is because we can already pass the config during connection creation by adding to the profiles.yml.

I should also clarify that it is somewhat possible to pass in configs by creating additional functions in impl.py see run_query_with_partitions_limit_catching() as an example:
https://github.com/dbt-athena/dbt-athena/blob/6ef6c970134abbbf52afb58110545825fde3a36d/dbt/adapters/athena/impl.py#L988

However this requires modifying all macros to explicitly refer to this function, which means a lot of additional code to modify/maintain.

@jtcohen6
Copy link
Contributor

To clarify the requirement is about "configs that can be supplied as additional keyword arguments to the execute method of an existing connection/cursor." This is because we can already pass the config during connection creation by adding to the profiles.yml.

Okay. My sense is that, for some data warehouse clients, there are configs which can only be set at initial connection creation, and cannot be subsequently overridden when submitting/executing a specific query. To support those warehouses/configs, we'd need the trickier pattern of (a) fetching that config when the connection is being created; (b) not reusing connections, but always closing & reopening.

I understand that this is different from the config of interest that prompted you to open this issue.

However this requires modifying all macros to explicitly refer to this function, which means a lot of additional code to modify/maintain.

In theory, something like this might be possible:

{%- macro statement(name=None, fetch_result=False, auto_begin=True, language='sql') -%}
    ...
    {%- if language == 'sql'-%}
      {%- set work_group = config.get("work_group") -%}
      {%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result, work_group=work_group) -%}
    ...

I haven't thought enough about whether that's the right thread to pull on — but I think it would have coverage for almost all queries submitted from the Jinja context, which tend to pass through run_query or statement. If we were to do something like that, we would want an additional adapter macro (?) to return the set of relevant kwargs dynamically. Something like:

      {%- set adapter_specific_kwargs = get_adapter_execute_kwargs_from_context() -%}
      {%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result, **kwargs) -%}
      -- or
      {%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result, adapter_kwargs=adapter_specific_kwargs) -%}      

@SoumayaMauthoorMOJ
Copy link
Author

(b) not reusing connections, but always closing & reopening.
I understand that this is different from the config of interest that prompted you to open this issue.

So just to clarify, you understand that in the case of Athena passing in additional configs such as work_group does not require closing and reopening the connection since I can pass that to execute():

from pyathena import connect

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2").cursor()
cursor.execute("SELECT * FROM one_row",
               work_group="YOUR_WORK_GROUP")

Why not limit to configs that don't need you to close and reopen the connection? Closing and reopening the connection seems a bit drastic and could impact performance.

@jtcohen6
Copy link
Contributor

Transferring to dbt-adapters as the relevant place for ongoing discussion. This isn't a near-term priority for us, but I can appreciate the benefits in adapter-specific flexibility by either

  • exposing the node config context to adapter.execute, similar to what we already support in query comments
  • allowing each adapter to define a specific set of **kwargs that are plumbed through

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core May 23, 2024
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 20, 2024
Copy link

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

2 participants