In Databricks, running code on an all-purpose cluster costs over 3x as much as running the same code on a job cluster. All Purpose Bypass provides a convenient way to quickly convert your notebook into a job, saving you time and money.
It is perfect for when you have a long-running command block or plan to leave a notebook running overnight.
Databricks
- This tool is meant to be used in Databricks workspaces
pip install in your Databricks Notebook
```
%pip install all_purpose_bypass
```
```python
from all_purpose_bypass import Bypass

# Databricks API Token (Found in User Settings)
api_token = '###############################'

bypass = Bypass(api_token)

# Create (or update) the job for the current notebook, then run it
job_id = bypass.create_job()
bypass.run_job(job_id)
```

```
>>> Job located at: https://my-workspace.cloud.databricks.com/?#job/571474934623337
>>> Job Running: run_id is 1535015
```
By default, Bypass creates a job named after the current notebook and assigns ownership to the current user. The job cluster it creates is a clone of the attached active all-purpose cluster. To make jobs easier to discover, a tag of all-purpose-bypass is assigned to every job. If the job already exists, its parameters/options are updated instead.
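Because every job it creates carries that tag, you can list them through the Databricks Jobs API. Below is a minimal sketch using requests against the standard jobs/list endpoint (the workspace URL is a placeholder, the helper name is hypothetical, and this is not part of all_purpose_bypass):

```python
import requests

# Hypothetical helper, not part of all_purpose_bypass: list jobs that
# carry the all-purpose-bypass tag. Pagination is omitted for brevity.
def list_bypass_jobs(host, api_token):
    resp = requests.get(
        f"{host}/api/2.1/jobs/list",
        headers={"Authorization": f"Bearer {api_token}"},
    )
    resp.raise_for_status()
    jobs = resp.json().get("jobs", [])
    # Keep only jobs tagged by All Purpose Bypass
    return [
        job for job in jobs
        if "all-purpose-bypass" in job.get("settings", {}).get("tags", {})
    ]

for job in list_bypass_jobs("https://my-workspace.cloud.databricks.com", api_token):
    print(job["job_id"], job["settings"]["name"])
```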
Note: It is possible to create cluster compatibility issues. Please check the Databricks create-cluster page to make sure your chosen options are compatible with each other.
There are a number of arguments you can pass to Bypass to modify the default behavior.
Parameters:
- new_cluster: pass in your own JSON-like dictionary with cluster configurations (see the sketch after this list). For example:

  ```json
  {
    "cluster_name": "autoscaling-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 50},
    "aws_attributes": {"availability": "SPOT", "zone_id": "us-west-2a"}
  }
  ```

  More examples: https://docs.databricks.com/dev-tools/api/latest/clusters.html#examples
- spark_version: override the spark_version inherited from the currently attached all-purpose cluster
- node_type_id: override the node_type_id inherited from the currently attached all-purpose cluster
- aws_attributes: override the aws_attributes inherited from the currently attached all-purpose cluster
- autoscale: override the autoscale settings inherited from the currently attached all-purpose cluster
  - if this parameter is set, do not use num_workers
- num_workers: override the num_workers inherited from the currently attached all-purpose cluster
  - if this parameter is set, do not use autoscale
- libraries: override the libraries inherited from the currently attached all-purpose cluster
- clusterId: clone a different existing all-purpose cluster instead of the currently attached one
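For instance, here is a sketch that passes a full new_cluster spec; the configuration values are the illustrative ones from above, not recommendations:

```python
from all_purpose_bypass import Bypass

# Databricks API Token (Found in User Settings)
api_token = '###############################'

# Illustrative cluster spec; tune these values for your own workload
custom_cluster = {
    "cluster_name": "autoscaling-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 50},
    "aws_attributes": {"availability": "SPOT", "zone_id": "us-west-2a"},
}

# The custom spec replaces the cloned all-purpose cluster configuration
bypass = Bypass(api_token, new_cluster=custom_cluster)
job_id = bypass.create_job()
bypass.run_job(job_id)
```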
Example:
```python
from all_purpose_bypass import Bypass

# Databricks API Token (Found in User Settings)
api_token = '###############################'

bypass = Bypass(api_token, node_type_id="i3.4xlarge", clusterId="1095-225741-yhdswzetj")
job_id = bypass.create_job()
bypass.run_job(job_id)
```

```
>>> Job located at: https://my-workspace.cloud.databricks.com/?#job/571474934623337
>>> Job Running: run_id is 1535015
```
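Once run_job prints a run_id, you can poll the run's status through the Databricks Runs API. A minimal sketch using plain requests (this is standard REST polling, not a feature of all_purpose_bypass; the workspace URL is a placeholder and the run_id is the one printed above):

```python
import requests

HOST = "https://my-workspace.cloud.databricks.com"  # placeholder workspace URL

# Fetch the current state of a run, e.g. run_id 1535015 from the output above
resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {api_token}"},
    params={"run_id": 1535015},
)
resp.raise_for_status()
state = resp.json()["state"]
# life_cycle_state: PENDING/RUNNING/TERMINATED; result_state appears once finished
print(state.get("life_cycle_state"), state.get("result_state"))
```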
Distributed under the MIT License. See LICENSE.txt for more information.