Add execution context API #455
Conversation
6c429d4#diff-72d81b5e87625d187fbc145e395e1981ef31119974e19e2b34dfa3ab0a9b983bR32-R51 Command/Context API support was added along with Tunnels PoC
I'd also suggest moving .devcontainer into a separate PR.
Force-pushed from c6649d5 to d3cb6fb.
I moved the dev container code out into #450 and made sure the linter is happy.
Force-pushed from 4940954 to 1eaf336.
This is the still-supported functionality from the REST API 1.2: https://docs.databricks.com/dev-tools/api/1.2/index.html
Force-pushed from 1eaf336 to 3cc62f6.
Left detailed review on proposed changes
.devcontainer/devcontainer.json (outdated)
// Update 'VARIANT' to pick a Python version: 3, 3.10, 3.9, 3.8, 3.7, 3.6
// Append -bullseye or -buster to pin to an OS version.
// Use -bullseye variants on local on arm64/Apple Silicon.
"VARIANT": "3.6-bullseye",
We should be pinning Python 3.8, I think; at least the minimum Python version in our LTS runtime.
tox is still using 3.6 and will fail with any other version. I think bumping the Python version should be a separate task.
.devcontainer/Dockerfile (outdated)
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.231.3/containers/python-3/.devcontainer/base.Dockerfile

# [Choice] Python version (use -bullseye variants on local arm64/Apple Silicon): 3, 3.10, 3.9, 3.8, 3.7, 3.6, 3-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3.7-bullseye, 3.6-bullseye, 3-buster, 3.10-buster, 3.9-buster, 3.8-buster, 3.7-buster, 3.6-buster
ARG VARIANT="3.10-bullseye"
We should be pinning Python 3.8, I think; at least the minimum Python version in our LTS runtime.
self.id = None


class ExecutionContextApi(object):
@profile_option
@eat_exceptions
@provide_api_client
def create_context_cli(api_client, cluster_id, language):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective. What makes more sense practically is "run this Python code on the cluster with this name and return its results".
It does make sense when you want to run multiple commands within the same context. What you describe is possible with command-execute-once.
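To illustrate the point about context reuse, here is a hedged sketch of the request payloads involved: two commands issued against the same contextId share interpreter state on the cluster, which single-shot execution cannot do. The endpoint path and the clusterId/contextId/language/command field names follow the REST API 1.2 documentation; the builder function itself is hypothetical and not part of this PR.

```python
# Sketch (assumes API 1.2 field names): build the request specs for running
# several commands inside one already-created execution context.
def build_session_requests(cluster_id, context_id, commands, language="python"):
    requests = []
    for command in commands:
        requests.append((
            "/api/1.2/commands/execute",
            {
                "clusterId": cluster_id,
                "contextId": context_id,  # same context -> shared state
                "language": language,
                "command": command,
            },
        ))
    return requests


# Both commands target the same context, so `x` defined by the first
# command is visible to the second.
reqs = build_session_requests("0123-456789-abc123", "ctx-1",
                              ["x = 1", "print(x)"])
```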
It's just not clear to me which end-user journey it simplifies.
@profile_option
@eat_exceptions
@provide_api_client
def get_context_status_cli(api_client, cluster_id, context_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
@profile_option
@eat_exceptions
@provide_api_client
def cancel_command_cli(api_client, cluster_id, context_id, command_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
It does make sense when you start a command async.
execute_command_cli can fit into a "workspace initialization" journey with the implemented additions, as I see requests for something like that from time to time. The current status quo is to run-submit a notebook job, which is also cancellable.
My concern is keeping the new command surface to a minimum.
@profile_option
@eat_exceptions
@provide_api_client
def get_command_status_cli(api_client, cluster_id, context_id, command_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
It does make sense when you start a command async.
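The async journey being debated here boils down to start-then-poll. A minimal sketch of the polling half, with the status getter injected so it works against any client; the status names (Queued, Running, Cancelling) are taken from the API 1.2 docs, while the helper itself is hypothetical:

```python
import time

def wait_for_command(get_status, poll_interval=1.0, timeout=600.0):
    """Poll get_status() (e.g. a wrapper around GET /api/1.2/commands/status)
    until the command reaches a terminal state, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("Queued", "Running", "Cancelling"):
            return status  # Finished, Cancelled, or Error
        time.sleep(poll_interval)
    raise TimeoutError("command did not finish in time")
```

With a helper like this, cancel_command_cli becomes the escape hatch for a command that the poll loop shows stuck in Running.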
Why do people need to start an individual notebook command async if they have run-submit jobs, which also keep a snapshot of the notebook being executed?
pass


execution_context_group.add_command(create_context_cli, name='create')
I think that logically this belongs in the cluster command group instead.
I could be convinced about that.
Because commands are possible only in the scope of a running interactive cluster.
@@ -105,6 +105,12 @@ def __init__(self, user=None, password=None, host=None, token=None,
        self.api_version = api_version
        self.jobs_api_version = jobs_api_version

    def get_v1_client(self):
This will make the OAuth (and AAD, and OIDC) implementation fragile for long-running commands, where the token might have to be refreshed a couple of times to keep working. Could we have another way of propagating the /api/1.2 path prefix? A kwarg in the perform_query method, for example.
I copied this code from dbx (https://github.com/databrickslabs/dbx/blob/main/dbx/utils/v1_client.py#L12). There might be a better way though.
I'll change it to use kwargs; that's indeed cleaner.
@@ -1114,3 +1117,80 @@ def delete_repo(self, id, headers=None):
        _data = {}

        return self.client.perform_query('DELETE', '/repos/{id}'.format(id=id), data=_data, headers=headers)


class ExecutionContextService(object):
@nfx what do you think about the following changes:
@fjakobs it's just that I've literally never heard of any customer asking for interactive notebook individual commands to be exposed in the CLI in any workflow. It's quite the opposite for the SDK for command execution: yes, we need to expose a Python context manager for the command context in the SDK; it has plenty of real-world scenarios. Some of the applications are https://github.com/databrickslabs/jupyterlab-integration and https://github.com/databrickslabs/migrate, and of course there were high-level workflow requests, like #122 (10 votes since 2018) / #341 (back then there were too many issues, so I followed the broken-window theory and created more).
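A minimal sketch of the context manager mentioned above: create an execution context on entry and always destroy it on exit, even when a command inside the block raises. The client here is assumed to be any object exposing create_context/destroy_context wrappers around POST /api/1.2/contexts/create and /api/1.2/contexts/destroy; those method names are hypothetical, not this PR's API.

```python
from contextlib import contextmanager

@contextmanager
def execution_context(client, cluster_id, language="python"):
    """Yield a context id bound to the cluster; guarantee cleanup on exit."""
    context_id = client.create_context(cluster_id, language)
    try:
        yield context_id
    finally:
        # Destroy the context even if a command in the block raised.
        client.destroy_context(cluster_id, context_id)
```

Usage would then read naturally in scripts:

```python
# with execution_context(client, "0123-456789-abc123") as ctx:
#     client.execute_command(ctx, "print('hello')")
```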
This PR adds the still-supported functionality from the REST API 1.2.
It allows users to create execution contexts and run Python, SQL, and R code on a cluster.
Note that this PR is still missing tests and can't be merged as is.