This guide provides a Helm chart to deploy Livy on Kubernetes without relying on cloud services such as AWS, GCP, or Azure. This setup can save development time and cost, and it allows debugging from an IDE. To debug Livy on Kubernetes as a standalone setup, both Apache Spark and Apache Livy must be deployed in Kubernetes.
- Install Docker Desktop.
- Enable Kubernetes in Docker Desktop settings.
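  To confirm that the local cluster is reachable before continuing, a quick check along these lines can help (it assumes the default Docker Desktop context name, `docker-desktop`):

      # switch to the Docker Desktop cluster and list its nodes
      kubectl config use-context docker-desktop
      kubectl get nodes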
- Install Helm.
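  The later commands use Helm 3 syntax; the installed version can be verified with:

      helm version --short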
- Add the required Helm chart repositories:

      helm repo add cert-manager https://charts.jetstack.io
      helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
      helm repo update
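  To confirm that the repositories were added and their indexes are available:

      helm repo list
      helm search repo ingress-nginx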
- Add an entry to the `/etc/hosts` file:

      127.0.0.1 my-cluster.example.com
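  On macOS or Linux, the entry can be checked with a quick lookup:

      # should resolve to 127.0.0.1
      ping -c 1 my-cluster.example.com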
- Install cert-manager and its CustomResourceDefinition resources:

      kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml
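  The manifest installs cert-manager into its own `cert-manager` namespace; the pods should be running before you continue:

      kubectl get pods -n cert-manager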
- Build the Helm chart dependencies using the following command (run it from the chart directory):

      helm dependency build
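  The resolved dependencies and their status can be listed with:

      helm dependency list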
- Create a Kubernetes namespace for the Livy deployment:

      kubectl create namespace <namespace-name>
- Install the Livy cluster using the Helm chart:

      helm -n <namespace-name> install livycluster .
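  The release status and pods can then be inspected before testing; the exact pod names depend on the chart, so wait until everything reports Running:

      helm -n <namespace-name> status livycluster
      kubectl -n <namespace-name> get pods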
- Create an interactive session:

      curl -k -X POST -H "Content-Type: application/json" --data '{"kind": "spark"}' https://my-cluster.example.com/livy/sessions | jq

  Note: the `curl` and `jq` utilities must be installed on your local machine for testing.
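  The session starts in the `starting` state and should reach `idle` before statements are submitted. Assuming this is the first session created on the server (id 0), its state can be polled with:

      # session id 0 assumes no other sessions were created earlier
      curl -k https://my-cluster.example.com/livy/sessions/0/state | jq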
- Create a statement:

      curl -k -X POST -d '{ "kind": "spark", "code": "sc.parallelize(1 to 10).count()" }' -H "Content-Type: application/json" \
        https://my-cluster.example.com/livy/sessions/0/statements | jq
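  Statement execution is asynchronous; assuming this is the first statement in session 0, its output can be fetched once it completes:

      curl -k https://my-cluster.example.com/livy/sessions/0/statements/0 | jq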
- Create a batch job:

      curl -s -k -H "Content-Type: application/json" \
        -X POST \
        -d '{ "name": "testbatch1", "className": "org.apache.spark.examples.SparkPi", "numExecutors": 2, "file": "local:///opt/spark/examples/jars/spark-examples_2.12-3.2.3.jar", "args": ["10000"] }' \
        "https://my-cluster.example.com/livy/batches" | jq
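  The batch also runs asynchronously; assuming it was assigned id 0, its state and driver log can be checked with:

      curl -k https://my-cluster.example.com/livy/batches/0/state | jq
      curl -k https://my-cluster.example.com/livy/batches/0/log | jq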
Steps to create the Docker images for Spark and Livy are documented in Docker.md.