Shell scripts to deploy a Scala project to a Databricks cluster.
- Docker must be installed.
- You must be logged in to your Docker registry (docker login).
- Python must be installed.
- The Databricks CLI must be installed (pip install databricks-cli).
- The Databricks CLI must be configured (databricks configure --token, optionally with --profile <profile> so that the profile name matches databricks_profile in config.yaml).
- If you use STS assume roles to access AWS S3 buckets from Databricks, you need the ARN of the instance profile role and the ARN of the role to assume (needed once per profile).
- Add the following tasks to your build.sbt:
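// Copies the jars of all non-test dependencies into lib/jars
val copyJarsTask = taskKey[Unit]("copy-jars")
copyJarsTask := {
  val folder = baseDirectory.value / "lib" / "jars"
  IO.createDirectory(folder)
  // Find the artifact names of the relevant (non-test) dependencies
  val requiredLibs = libraryDependencies.value
    .filterNot(_.configurations.exists(_.contains("test")))
    .map(_.name)
  // Copy every managed classpath jar that belongs to one of those dependencies
  (Compile / managedClasspath).value.map(_.data).foreach { f =>
    if (requiredLibs.exists(lib => f.getName.startsWith(lib)))
      IO.copyFile(f, folder / f.getName)
  }
}

// Removes the copied jars again
val deleteJarsTask = taskKey[Unit]("delete-jars")
deleteJarsTask := {
  IO.delete(baseDirectory.value / "lib" / "jars")
}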
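To make the selection rule of copyJarsTask easier to check outside sbt, here is a small sbt-free Scala model of it. It is an illustration only: Dep and JarSelection are hypothetical names, and the rule it encodes (drop test-scoped dependencies, then keep every classpath jar whose file name starts with a remaining artifact name) is an assumption about the intended behaviour of the task.

```scala
// Hypothetical, sbt-free model of the jar selection in copyJarsTask.
// Dep stands in for sbt's ModuleID (artifact name + optional configuration).
final case class Dep(name: String, configuration: Option[String])

object JarSelection {
  // Artifact names of all dependencies that are not test-scoped.
  def requiredLibs(deps: Seq[Dep]): Seq[String] =
    deps.filterNot(_.configuration.exists(_.contains("test"))).map(_.name)

  // Classpath jar file names that start with one of those artifact names.
  def jarsToCopy(deps: Seq[Dep], classpathJars: Seq[String]): Seq[String] = {
    val libs = requiredLibs(deps)
    classpathJars.filter(jar => libs.exists(lib => jar.startsWith(lib)))
  }

  def main(args: Array[String]): Unit = {
    val deps = Seq(
      Dep("scopt", None),            // cross-built: the jar carries a _2.12 suffix
      Dep("scalatest", Some("test")) // test-only, excluded
    )
    val jars = Seq("scopt_2.12-4.1.0.jar", "scalatest_2.12-3.2.15.jar", "scala-library-2.12.17.jar")
    println(jarsToCopy(deps, jars)) // List(scopt_2.12-4.1.0.jar)
  }
}
```

In this model, prefix matching is what lets a cross-built artifact such as scopt match scopt_2.12-4.1.0.jar, but it can also pick up an unrelated jar whose name merely starts with the same text, and transitive dependencies are skipped unless their names share a prefix with a direct dependency.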
- Put all your settings in a config.yaml file like the one below (used only by the shell script):
ORGANIZATION: <organization Docker Hub name>
REPOSITORY_NAME: <repository name>
VERSION: <version of repository>
clustername: "<enter cluster name>"
databricks_profile: "<enter profile>"
roles:
  instance_profile_arn: "<enter role ARN>"
  assume_role_arn: "<enter role ARN>"
zone_id: "<enter region>"
min_workers: "<min workers>"
max_workers: "<max workers>"
spark_version: "<spark version in cluster>"
node_type_id: "<node type>"
autotermination_minutes: "<auto-termination timeout in minutes>"
driver_node_type_id: "<driver node type>"
docker_image_url: "<docker image path>"
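These settings map naturally onto a Databricks Clusters API create request. The Scala sketch below shows one possible mapping; it is an assumption, not necessarily what the shell script does. ClusterConfig, parse and clusterJson are hypothetical names, the parser only understands flat key: value lines (it is not a general YAML parser), and the JSON field names (cluster_name, autoscale, aws_attributes, docker_image, and so on) are those of the Databricks Clusters API. This README does not say how the script uses assume_role_arn, so the sketch leaves it out.

```scala
// Hypothetical sketch: map config.yaml onto a Databricks Clusters API "create" body.
// Assumes flat `key: value` lines; nested `roles` entries are simply flattened.
// This is NOT a general YAML parser.
object ClusterConfig {
  private def unquote(v: String): String = v.stripPrefix("\"").stripSuffix("\"")

  // Collects key/value pairs, skipping comments, blank lines, braces and bare block keys.
  def parse(yaml: String): Map[String, String] =
    yaml.linesIterator
      .map(_.trim)
      .filterNot(line => line.isEmpty || line.startsWith("#"))
      .map(_.split(":", 2)) // limit 2 keeps colons inside ARNs and image tags
      .collect { case Array(key, raw) => key.trim -> unquote(raw.trim) }
      .filterNot { case (_, value) => value.isEmpty || value == "{" }
      .toMap

  // Minimal JSON string escaping for quotes and backslashes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Renders already-encoded values as a JSON object.
  private def obj(fields: (String, String)*): String =
    fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")

  // Throws if a key is missing or a numeric setting is not an integer.
  def clusterJson(c: Map[String, String]): String = {
    def text(key: String): String = quote(c(key))
    def num(key: String): String = c(key).trim.toInt.toString
    obj(
      "cluster_name" -> text("clustername"),
      "spark_version" -> text("spark_version"),
      "node_type_id" -> text("node_type_id"),
      "driver_node_type_id" -> text("driver_node_type_id"),
      "autotermination_minutes" -> num("autotermination_minutes"),
      "autoscale" -> obj("min_workers" -> num("min_workers"), "max_workers" -> num("max_workers")),
      "aws_attributes" -> obj(
        "zone_id" -> text("zone_id"),
        "instance_profile_arn" -> text("instance_profile_arn")
      ),
      "docker_image" -> obj("url" -> text("docker_image_url"))
    )
  }
}
```

The resulting JSON could then be written to a file and passed to the legacy CLI, for example databricks clusters create --json-file cluster.json --profile <profile>. Numeric settings (min_workers, max_workers, autotermination_minutes) are emitted as JSON numbers, so placeholder text left in them makes clusterJson throw a NumberFormatException instead of producing an invalid request.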
- Go to the root folder of your project.
- Run the shell script and follow the instructions.
- The script builds a Docker image containing all required dependencies and creates the Databricks cluster for you.
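The image step can be pictured as follows. This is a hypothetical Scala sketch, not the script itself: it assumes the image is tagged ORGANIZATION/REPOSITORY_NAME:VERSION from config.yaml (the usual Docker Hub naming) and that a Dockerfile sits in the project root. ImageStep and its methods are illustrative names.

```scala
import scala.sys.process._

// Hypothetical sketch of the image step: derive the Docker Hub-style tag
// ORGANIZATION/REPOSITORY_NAME:VERSION from config.yaml values and list the
// docker commands. Assumes a Dockerfile in the project root.
object ImageStep {
  def imageTag(cfg: Map[String, String]): String =
    s"${cfg("ORGANIZATION")}/${cfg("REPOSITORY_NAME")}:${cfg("VERSION")}"

  def commands(cfg: Map[String, String]): Seq[Seq[String]] = {
    val tag = imageTag(cfg)
    Seq(
      Seq("docker", "build", "-t", tag, "."), // build from ./Dockerfile
      Seq("docker", "push", tag)              // requires a prior docker login
    )
  }

  // Runs the commands in order and stops at the first non-zero exit code.
  def run(cfg: Map[String, String]): Int =
    commands(cfg).foldLeft(0) { (status, cmd) =>
      if (status != 0) status else Process(cmd).!
    }
}
```

Keeping the commands as plain data makes them easy to print or check before anything runs; run executes them in order and stops at the first failure. Under this assumption, docker_image_url in config.yaml would point at the pushed tag.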