Add example showing a job running an sbt-built Scala JAR #34

Open · wants to merge 2 commits into main
4 changes: 4 additions & 0 deletions knowledge_base/sbt-example/.gitignore
@@ -0,0 +1,4 @@
/.bsp/
target/
.databricks
.vscode
33 changes: 33 additions & 0 deletions knowledge_base/sbt-example/README.md
@@ -0,0 +1,33 @@
# sbt example

This example demonstrates how to build a Scala JAR with [sbt](https://www.scala-sbt.org/) and use it from a job.

## Prerequisites

* Databricks CLI v0.226.0 or above
* [sbt](https://www.scala-sbt.org/) v1.10.1 or above

## Usage

Update the `host` field under `workspace` in `databricks.yml` to the Databricks workspace you wish to deploy to.

Update the `artifact_path` field under `workspace` in `databricks.yml` to the Unity Catalog Volume path where the built JAR should be uploaded.

Run `databricks bundle deploy` to deploy the job.

Run `databricks bundle run example_job` to run the job.

Example output:

```
% databricks bundle run example_job
Run URL: https://...

2024-08-09 15:49:17 "Example running a Scala JAR built with sbt" TERMINATED SUCCESS
+-----+
| word|
+-----+
|Hello|
|World|
+-----+
```
10 changes: 10 additions & 0 deletions knowledge_base/sbt-example/build.sbt
@@ -0,0 +1,10 @@
name := "sbt-example"

version := "0.1.0-SNAPSHOT"

scalaVersion := "2.12.19"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.0",
  "org.apache.spark" %% "spark-sql" % "3.5.0"
)

Contributor commented on `build.sbt`:

> You might want to specify the `% "provided"` option. The dependencies might already be installed on a Databricks cluster, and this could help reduce the JAR size.
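A minimal sketch of the reviewer's suggestion (not part of this PR's diff), assuming the cluster's Databricks Runtime already supplies these Spark artifacts:

```scala
// Mark the Spark modules as "provided": they stay on the compile
// classpath but are excluded from the packaged JAR, since the cluster's
// own Spark installation supplies them at runtime. This shrinks the JAR
// and avoids shipping classes that would shadow the runtime's versions.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "3.5.0" % "provided"
)
```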
28 changes: 28 additions & 0 deletions knowledge_base/sbt-example/databricks.yml
@@ -0,0 +1,28 @@
bundle:
  name: sbt_example

include:
  - ./resources/job.yml

workspace:
  host: https://myworkspace.cloud.databricks.com

  # JARs must be stored in a Unity Catalog Volume.
  # Uncomment the line below and replace the path with the path to your Unity Catalog Volume.
  #
  # artifact_path: /Volumes/my_catalog/my_schema/my_volume/some_path

artifacts:
  sbt_example:
    type: jar
    build: sbt package
    files:
      # `sbt package` writes the JAR to ./target/scala-2.12/ as
      # sbt-example_2.12-0.1.0-SNAPSHOT.jar, which this glob matches.
      - source: ./target/scala-2.12/sbt-example*.jar

permissions:
  - group_name: users
    level: CAN_VIEW

targets:
  dev:
    default: true
1 change: 1 addition & 0 deletions knowledge_base/sbt-example/project/build.properties
@@ -0,0 +1 @@
sbt.version=1.10.1
27 changes: 27 additions & 0 deletions knowledge_base/sbt-example/resources/job.yml
@@ -0,0 +1,27 @@
resources:
  jobs:
    example_job:
      name: "Example running a Scala JAR built with sbt"

      tasks:
        - task_key: task

          spark_jar_task:
            main_class_name: SparkApp

          libraries:
            - jar: ../target/scala-2.12/sbt-example*.jar

          new_cluster:
            node_type_id: i3.xlarge
            spark_version: 15.4.x-scala2.12
            num_workers: 0
            spark_conf:
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "ResourceClass": "SingleNode"

            # The cluster must run in single user isolation mode.
            # This means it is compatible with Unity Catalog and can access Unity Catalog Volumes.
            data_security_mode: SINGLE_USER
12 changes: 12 additions & 0 deletions knowledge_base/sbt-example/src/main/scala/example/SparkApp.scala
@@ -0,0 +1,12 @@
import org.apache.spark.sql.SparkSession

object SparkApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    import spark.implicits._

    val data = Seq("Hello", "World").toDF("word")
    data.show()
  }
}
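A `spark_jar_task` can also pass arguments to `main` through its `parameters` field. Below is a minimal sketch of how this app could consume them; the `ParameterizedSparkApp` name and the fallback behavior are hypothetical and not part of this PR:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical variant: the words to display arrive as job parameters
// (the spark_jar_task `parameters` field) instead of being hardcoded.
object ParameterizedSparkApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Fall back to the original demo data when no parameters are given.
    val words = if (args.nonEmpty) args.toSeq else Seq("Hello", "World")
    words.toDF("word").show()
  }
}
```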