Add execution context API #455
Conversation
6c429d4#diff-72d81b5e87625d187fbc145e395e1981ef31119974e19e2b34dfa3ab0a9b983bR32-R51 Command/Context API support was added along with Tunnels PoC
I'd also suggest moving .devcontainer into a separate PR.
Force-pushed from c6649d5 to d3cb6fb.
I moved the dev container code out into #450 and made sure the linter is happy.
Force-pushed from 4940954 to 1eaf336.
This is the still-supported functionality from the REST API 1.2: https://docs.databricks.com/dev-tools/api/1.2/index.html
Force-pushed from 1eaf336 to 3cc62f6.
Left detailed review on proposed changes
.devcontainer/devcontainer.json (outdated)
// Update 'VARIANT' to pick a Python version: 3, 3.10, 3.9, 3.8, 3.7, 3.6
// Append -bullseye or -buster to pin to an OS version.
// Use -bullseye variants on local on arm64/Apple Silicon.
"VARIANT": "3.6-bullseye",
We should be pinning Python 3.8, I think; at least the minimum Python version in our LTS runtime.
tox is still using 3.6 and will fail with any other version. I think bumping the Python version should be a separate task.
.devcontainer/Dockerfile (outdated)
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.231.3/containers/python-3/.devcontainer/base.Dockerfile

# [Choice] Python version (use -bullseye variants on local arm64/Apple Silicon): 3, 3.10, 3.9, 3.8, 3.7, 3.6, 3-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3.7-bullseye, 3.6-bullseye, 3-buster, 3.10-buster, 3.9-buster, 3.8-buster, 3.7-buster, 3.6-buster
ARG VARIANT="3.10-bullseye"
We should be pinning Python 3.8, I think; at least the minimum Python version in our LTS runtime.
self.id = None


class ExecutionContextApi(object):
@profile_option
@eat_exceptions
@provide_api_client
def create_context_cli(api_client, cluster_id, language):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective. What makes more sense practically is "run this Python code on the cluster with this name and return its results".
It does make sense when you want to run multiple commands within the same context. What you describe is possible with command-execute-once.
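To illustrate the point about context reuse, here is a hedged sketch of the request payloads involved: two commands issued against the same contextId share interpreter state on the cluster, which single-shot execution cannot do. The endpoint path and the clusterId/contextId/language/command field names follow the REST API 1.2 documentation; the builder function itself is hypothetical and not part of this PR.

```python
# Sketch (assumes API 1.2 field names): build the request specs for running
# several commands inside one already-created execution context.
def build_session_requests(cluster_id, context_id, commands, language="python"):
    requests = []
    for command in commands:
        requests.append((
            "/api/1.2/commands/execute",
            {
                "clusterId": cluster_id,
                "contextId": context_id,  # same context -> shared state
                "language": language,
                "command": command,
            },
        ))
    return requests


# Both commands target the same context, so `x` defined by the first
# command is visible to the second.
reqs = build_session_requests("0123-456789-abc123", "ctx-1",
                              ["x = 1", "print(x)"])
```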
It's just not clear to me which end-user journey it simplifies.
@profile_option
@eat_exceptions
@provide_api_client
def get_context_status_cli(api_client, cluster_id, context_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
@profile_option
@eat_exceptions
@provide_api_client
def cancel_command_cli(api_client, cluster_id, context_id, command_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
It does make sense when you start a command async.
execute_command_cli can fit into a "workspace initialization" journey with the implemented additions, as I see requests for something like that from time to time. The current status quo is to run-submit a notebook job, which is also cancellable.
My concern is keeping the new command surface to a minimum.
@profile_option
@eat_exceptions
@provide_api_client
def get_command_status_cli(api_client, cluster_id, context_id, command_id):  # NOQA
This command makes no sense in isolation within the CLI from the end-user perspective.
It does make sense when you start a command async.
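The async journey being debated here boils down to start-then-poll. A minimal sketch of the polling half, with the status getter injected so it works against any client; the status names (Queued, Running, Cancelling) are taken from the API 1.2 docs, while the helper itself is hypothetical:

```python
import time

def wait_for_command(get_status, poll_interval=1.0, timeout=600.0):
    """Poll get_status() (e.g. a wrapper around GET /api/1.2/commands/status)
    until the command reaches a terminal state, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("Queued", "Running", "Cancelling"):
            return status  # Finished, Cancelled, or Error
        time.sleep(poll_interval)
    raise TimeoutError("command did not finish in time")
```

With a helper like this, cancel_command_cli becomes the escape hatch for a command that the poll loop shows stuck in Running.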
Why do people need to start an individual notebook command async if they have run-submit jobs, which also keep a snapshot of the notebook being executed?
pass


execution_context_group.add_command(create_context_cli, name='create')
I think that logically this belongs in the cluster command group instead.
I could be convinced about that.
Because commands are possible only in the scope of a running interactive cluster.
@@ -105,6 +105,12 @@ def __init__(self, user=None, password=None, host=None, token=None,
        self.api_version = api_version
        self.jobs_api_version = jobs_api_version

    def get_v1_client(self):
This will make the OAuth (and AAD, and OIDC) implementation fragile for long-running commands, where the token might have to be refreshed a couple of times to keep working. Could we have another way of propagating the /api/1.2 path prefix? A kwarg in the perform_query method, for example.
I copied this code from dbx (https://github.com/databrickslabs/dbx/blob/main/dbx/utils/v1_client.py#L12). There might be a better way though.
I'll change it to use kwargs; that's indeed cleaner.
@@ -1114,3 +1117,80 @@ def delete_repo(self, id, headers=None):
        _data = {}

        return self.client.perform_query('DELETE', '/repos/{id}'.format(id=id), data=_data, headers=headers)


class ExecutionContextService(object):
@nfx what do you think about the following changes:
@fjakobs it's just that I've literally never heard of any customer asking for interactive notebook individual commands to be exposed in the CLI in any workflow. It's quite the opposite for the SDK for command execution: yes, we need to expose a Python context manager for the command context in the SDK; it has plenty of real-world scenarios. Some of the applications are https://github.com/databrickslabs/jupyterlab-integration and https://github.com/databrickslabs/migrate, and of course there were high-level workflow requests, like #122 (10 votes since 2018) / #341 (back then there were too many issues, so I followed the broken-window theory and created more).
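A minimal sketch of the context manager mentioned above: create an execution context on entry and always destroy it on exit, even when a command inside the block raises. The client here is assumed to be any object exposing create_context/destroy_context wrappers around POST /api/1.2/contexts/create and /api/1.2/contexts/destroy; those method names are hypothetical, not this PR's API.

```python
from contextlib import contextmanager

@contextmanager
def execution_context(client, cluster_id, language="python"):
    """Yield a context id bound to the cluster; guarantee cleanup on exit."""
    context_id = client.create_context(cluster_id, language)
    try:
        yield context_id
    finally:
        # Destroy the context even if a command in the block raised.
        client.destroy_context(cluster_id, context_id)
```

Usage would then read naturally in scripts:

```python
# with execution_context(client, "0123-456789-abc123") as ctx:
#     client.execute_command(ctx, "print('hello')")
```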
This PR adds the still-supported functionality from the REST API 1.2.
It allows users to create execution contexts and run Python, SQL, and R code on a cluster.
Note that this PR is still missing tests and can't be merged as is.