Flyte can execute Spark jobs natively on a Kubernetes Cluster, which manages a virtual cluster’s lifecycle, spin-up, and tear down. It leverages the open-sourced Spark On K8s Operator and can be enabled without signing up for any service. This is like running a transient spark cluster — a type of cluster spun up for a specific Spark job and torn down after completion.
To install the plugin, run the following command:
pip install flytekitplugins-spark
To configure Spark in the Flyte deployment's backend, follow Step 1, 2.
All examples showcasing execution of Spark jobs using the plugin can be found in the documentation.