
Commit

DOC-648 flytesnacks edits needed for Neptune and W&B flytekit plugins documentation (#1748)

* update refs

Signed-off-by: nikki everett <[email protected]>

* clean up integrations information architecture

Signed-off-by: nikki everett <[email protected]>

* fix databricks agent title

Signed-off-by: nikki everett <[email protected]>

* move k8s pod plugin to deprecated integrations section and add deprecation notice

Signed-off-by: nikki everett <[email protected]>

---------

Signed-off-by: nikki everett <[email protected]>
neverett authored Oct 14, 2024
1 parent fd33533 commit 1be1081
Showing 6 changed files with 150 additions and 112 deletions.
10 changes: 0 additions & 10 deletions docs/integrations/deprecated_integrations/index.md

This file was deleted.

238 changes: 141 additions & 97 deletions docs/integrations/index.md
Expand Up @@ -8,83 +8,79 @@ Flyte is designed to be highly extensible and can be customized in multiple ways
Want to contribute an example? Check out the {ref}`Documentation contribution guide <contribute_docs>`.
```

## Flytekit Plugins
## Flytekit plugins

Flytekit plugins are simple plugins that can be implemented purely in python, unit tested locally and allow extending
Flytekit functionality. These plugins can be anything and for comparison can be thought of like
[Airflow Operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html).
Flytekit plugins can be implemented purely in Python, unit tested locally, and allow extending
Flytekit functionality. For comparison, these plugins can be thought of like
[Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html).

```{list-table}
:header-rows: 0
:widths: 20 30
* - {doc}`SQL </auto_examples/sql_plugin/index>`
- Execute SQL queries as tasks.
* - {doc}`Great Expectations </auto_examples/greatexpectations_plugin/index>`
- Validate data with `great_expectations`.
* - {doc}`Papermill </auto_examples/papermill_plugin/index>`
- Execute Jupyter Notebooks with `papermill`.
* - {doc}`Pandera </auto_examples/pandera_plugin/index>`
- Validate pandas dataframes with `pandera`.
* - {doc}`Modin </auto_examples/modin_plugin/index>`
- Scale pandas workflows with `modin`.
* - {doc}`Dolt </auto_examples/dolt_plugin/index>`
- Version your SQL database with `dolt`.
* - {doc}`DBT </auto_examples/dbt_plugin/index>`
- Run and test your `dbt` pipelines in Flyte.
* - {doc}`WhyLogs </auto_examples/whylogs_plugin/index>`
- `whylogs`: the open standard for data logging.
* - {doc}`MLFlow </auto_examples/mlflow_plugin/index>`
- `mlflow`: the open standard for model tracking.
* - {doc}`ONNX </auto_examples/onnx_plugin/index>`
- Convert ML models to ONNX models seamlessly.
* - {doc}`Dolt </auto_examples/dolt_plugin/index>`
- Version your SQL database with `dolt`.
* - {doc}`DuckDB </auto_examples/duckdb_plugin/index>`
- Run analytical queries using DuckDB.
* - {doc}`Weights and Biases </auto_examples/wandb_plugin/index>`
- `wandb`: Machine learning platform to build better models faster.
* - {doc}`Great Expectations </auto_examples/greatexpectations_plugin/index>`
- Validate data with `great_expectations`.
* - {doc}`MLFlow </auto_examples/mlflow_plugin/index>`
- `mlflow`: the open standard for model tracking.
* - {doc}`Modin </auto_examples/modin_plugin/index>`
- Scale pandas workflows with `modin`.
* - {doc}`Neptune </auto_examples/neptune_plugin/index>`
- `neptune`: Neptune is the MLOps stack component for experiment tracking.
* - {doc}`NIM </auto_examples/nim_plugin/index>`
- Serve optimized model containers with NIM.
* - {doc}`Ollama </auto_examples/ollama_plugin/index>`
- Serve fine-tuned LLMs with Ollama in a Flyte workflow.
* - {doc}`ONNX </auto_examples/onnx_plugin/index>`
- Convert ML models to ONNX models seamlessly.
* - {doc}`Pandera </auto_examples/pandera_plugin/index>`
- Validate pandas dataframes with `pandera`.
* - {doc}`Papermill </auto_examples/papermill_plugin/index>`
- Execute Jupyter Notebooks with `papermill`.
* - {doc}`SQL </auto_examples/sql_plugin/index>`
- Execute SQL queries as tasks.
* - {doc}`Weights and Biases </auto_examples/wandb_plugin/index>`
- `wandb`: Machine learning platform to build better models faster.
* - {doc}`WhyLogs </auto_examples/whylogs_plugin/index>`
- `whylogs`: the open standard for data logging.
```

:::{dropdown} {fa}`info-circle` Using flytekit plugins
:::{dropdown} {fa}`info-circle` Using Flytekit plugins
:animate: fade-in-slide-down

Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the
{py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit.
Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the {py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit.

Flytekit Plugins are lazily loaded and can be released independently like libraries. We follow a convention to name the
plugin like `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example
`flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/).
Flytekit plugins are lazily loaded and can be released independently like libraries. The naming convention is `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example, `flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/).
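
The naming convention can be checked against a local Python environment. The following is a minimal sketch that assumes only the `flytekitplugins-` prefix described above; the helper names `is_flytekit_plugin` and `installed_flytekit_plugins` are hypothetical and are not part of Flytekit.

```python
from importlib import metadata

# Prefix taken from the naming convention described above.
PLUGIN_PREFIX = "flytekitplugins-"


def is_flytekit_plugin(distribution_name: str) -> bool:
    """Return True if a distribution name follows the flytekitplugins-* convention.

    Hypothetical helper for illustration; not part of Flytekit.
    """
    # Treat underscores and hyphens as equivalent, and ignore case.
    name = distribution_name.lower().replace("_", "-")
    return name.startswith(PLUGIN_PREFIX) and len(name) > len(PLUGIN_PREFIX)


def installed_flytekit_plugins() -> list[str]:
    """List installed distributions whose names follow the convention."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and is_flytekit_plugin(name):
            names.add(name.lower().replace("_", "-"))
    return sorted(names)


if __name__ == "__main__":
    for plugin in installed_flytekit_plugins():
        print(plugin)
```

Running the script after `pip install flytekitplugins-papermill` should list that package, assuming it was installed into the same environment.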

You can find the plugins maintained by the core Flyte team [here](https://github.com/flyteorg/flytekit/tree/master/plugins).
:::

## Native Backend Plugins
## Native backend plugins

Native Backend Plugins are the plugins that can be executed without any external service dependencies because the compute is
orchestrated by Flyte itself, within its provisioned Kubernetes clusters.
Native backend plugins can be executed without any external service dependencies because the compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters.

```{list-table}
:header-rows: 0
:widths: 20 30
* - {doc}`K8s Pods </auto_examples/k8s_pod_plugin/index>`
- Execute K8s pods for arbitrary workloads.
* - {doc}`K8s Cluster Dask Jobs </auto_examples/k8s_dask_plugin/index>`
- Run Dask jobs on a K8s Cluster.
* - {doc}`K8s Cluster Spark Jobs </auto_examples/k8s_spark_plugin/index>`
- Run Spark jobs on a K8s Cluster.
* - {doc}`Kubeflow PyTorch </auto_examples/kfpytorch_plugin/index>`
- Run distributed PyTorch training jobs using `Kubeflow`.
* - {doc}`Kubeflow TensorFlow </auto_examples/kftensorflow_plugin/index>`
- Run distributed TensorFlow training jobs using `Kubeflow`.
* - {doc}`Kubernetes pods </auto_examples/k8s_pod_plugin/index>`
- Execute Kubernetes pods for arbitrary workloads.
* - {doc}`Kubernetes cluster Dask jobs </auto_examples/k8s_dask_plugin/index>`
- Run Dask jobs on a Kubernetes cluster.
* - {doc}`Kubernetes cluster Spark jobs </auto_examples/k8s_spark_plugin/index>`
- Run Spark jobs on a Kubernetes cluster.
* - {doc}`MPI Operator </auto_examples/kfmpi_plugin/index>`
- Run distributed deep learning training jobs using Horovod and MPI.
* - {doc}`Ray Task </auto_examples/ray_plugin/index>`
* - {doc}`Ray </auto_examples/ray_plugin/index>`
- Run Ray jobs on a Kubernetes cluster.
```

Expand All @@ -98,54 +94,53 @@ orchestrated by Flyte itself, within its provisioned Kubernetes clusters.
:header-rows: 0
:widths: 20 30
* - {doc}`AWS SageMaker Inference agent </auto_examples/sagemaker_inference_agent/index>`
- Deploy models, and create and trigger inference endpoints on AWS SageMaker.
* - {doc}`Airflow agent </auto_examples/airflow_agent/index>`
- Run Airflow jobs in your workflows with the Airflow agent.
* - {doc}`BigQuery agent </auto_examples/bigquery_agent/index>`
- Run BigQuery jobs in your workflows with the BigQuery agent.
* - {doc}`ChatGPT agent </auto_examples/chatgpt_agent/index>`
- Run ChatGPT jobs in your workflows with the ChatGPT agent.
* - {doc}`Databricks </auto_examples/databricks_agent/index>`
* - {doc}`Databricks agent </auto_examples/databricks_agent/index>`
- Run Databricks jobs in your workflows with the Databricks agent.
* - {doc}`Memory Machine Cloud </auto_examples/mmcloud_agent/index>`
* - {doc}`Memory Machine Cloud agent </auto_examples/mmcloud_agent/index>`
- Execute tasks using the MemVerge Memory Machine Cloud agent.
* - {doc}`OpenAI Batch </auto_examples/openai_batch_agent/index>`
- Submit requests for asynchronous batch processing on OpenAI.
* - {doc}`SageMaker Inference </auto_examples/sagemaker_inference_agent/index>`
- Deploy models and create, as well as trigger inference endpoints on SageMaker.
* - {doc}`Sensor </auto_examples/sensor/index>`
* - {doc}`Sensor agent </auto_examples/sensor/index>`
- Run sensor jobs in your workflows with the sensor agent.
* - {doc}`Snowflake </auto_examples/snowflake_agent/index>`
* - {doc}`Snowflake agent </auto_examples/snowflake_agent/index>`
- Run Snowflake jobs in your workflows with the Snowflake agent.
```

(external_service_backend_plugins)=

## External Service Backend Plugins
## External service backend plugins

As the term suggests, external service backend plugins rely on external services like
[Hive](https://docs.qubole.com/en/latest/user-guide/engines/hive/index.html) for handling the workload defined in the Flyte task that uses the respective plugin.
As the term suggests, these plugins rely on external services to handle the workload defined in the Flyte task that uses the plugin.

```{list-table}
:header-rows: 0
:widths: 20 30
* - {doc}`AWS Athena plugin </auto_examples/athena_plugin/index>`
* - {doc}`AWS Athena </auto_examples/athena_plugin/index>`
- Execute queries using AWS Athena.
* - {doc}`AWS Batch plugin </auto_examples/aws_batch_plugin/index>`
* - {doc}`AWS Batch </auto_examples/aws_batch_plugin/index>`
- Run tasks and workflows on the AWS Batch service.
* - {doc}`Flyte Interactive </auto_examples/flyteinteractive_plugin/index>`
- Execute tasks using Flyte Interactive to debug.
* - {doc}`Hive plugin </auto_examples/hive_plugin/index>`
* - {doc}`Hive </auto_examples/hive_plugin/index>`
- Run Hive jobs in your workflows.
```

(enable-backend-plugins)=

::::{dropdown} {fa}`info-circle` Enabling Backend Plugins
::::{dropdown} {fa}`info-circle` Enabling backend plugins
:animate: fade-in-slide-down

To enable a backend plugin you have to add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` is available under the `tasks > task-plugins` section of FlytePropeller's configuration.
The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/[email protected]/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows,
To enable a backend plugin, you must add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` list is available under the `tasks > task-plugins` section of FlytePropeller's configuration.
The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/[email protected]/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows:

```yaml
tasks:
Expand All @@ -160,15 +155,15 @@ tasks:
container_array: k8s-array
```
**Finding the `ID` of the Backend Plugin**
**Finding the `ID` of the backend plugin**

This is a little tricky since you have to look at the source code of the plugin to figure out the `ID`. In the case of Spark, for example, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424) here, defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41).
To find the `ID` of the backend plugin, look at the source code of the plugin. For example, in the case of Spark, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424), defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41).
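
Once the `ID` is known, adding it to the enabled plugins list amounts to appending to a nested list. The following sketch models the configuration as a plain Python dictionary that mirrors the `tasks > task-plugins > enabled-plugins` path. The `enable_plugin` helper and the starting configuration are hypothetical and for illustration only; in a real deployment you would edit FlytePropeller's YAML configuration instead.

```python
from copy import deepcopy


def enable_plugin(config: dict, plugin_id: str) -> dict:
    """Return a copy of config with plugin_id under tasks > task-plugins > enabled-plugins.

    Hypothetical helper for illustration; not part of Flyte.
    """
    updated = deepcopy(config)
    # Create the nested sections if they are missing.
    task_plugins = updated.setdefault("tasks", {}).setdefault("task-plugins", {})
    enabled = task_plugins.setdefault("enabled-plugins", [])
    # Avoid listing the same plugin ID twice.
    if plugin_id not in enabled:
        enabled.append(plugin_id)
    return updated


# Illustrative starting point: only the k8s-array plugin is enabled.
config = {"tasks": {"task-plugins": {"enabled-plugins": ["k8s-array"]}}}

updated = enable_plugin(config, "spark")
print(updated["tasks"]["task-plugins"]["enabled-plugins"])  # ['k8s-array', 'spark']
```

The helper copies the input rather than mutating it, so the original configuration stays available for comparison.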

::::

## SDKs for Writing Tasks and Workflows
## SDKs for writing tasks and workflows

The {ref}`community <community>` would love to help you with your own ideas of building a new SDK. Currently the available SDKs are:
The {ref}`community <community>` would love to help you build new SDKs. Currently, the available SDKs are:

```{list-table}
:header-rows: 0
Expand All @@ -180,7 +175,7 @@ The {ref}`community <community>` would love to help you with your own ideas of b
- The Java/Scala SDK for Flyte.
```

## Flyte Operators
## Flyte operators

Flyte can be integrated with other orchestrators to help you leverage Flyte's
constructs natively within other orchestration tools.
Expand All @@ -196,42 +191,91 @@ constructs natively within other orchestration tools.
```{toctree}
:maxdepth: -1
:hidden:
:caption: Flytekit plugins

DBT </auto_examples/dbt_plugin/index>
Dolt </auto_examples/dolt_plugin/index>
DuckDB </auto_examples/duckdb_plugin/index>
Great Expectations </auto_examples/greatexpectations_plugin/index>
MLFlow </auto_examples/mlflow_plugin/index>
Modin </auto_examples/modin_plugin/index>
Neptune </auto_examples/neptune_plugin/index>
NIM </auto_examples/nim_plugin/index>
Ollama </auto_examples/ollama_plugin/index>
ONNX </auto_examples/onnx_plugin/index>
Pandera </auto_examples/pandera_plugin/index>
Papermill </auto_examples/papermill_plugin/index>
SQL </auto_examples/sql_plugin/index>
Weights & Biases </auto_examples/wandb_plugin/index>
WhyLogs </auto_examples/whylogs_plugin/index>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Native backend plugins
Kubeflow PyTorch </auto_examples/kfpytorch_plugin/index>
Kubeflow TensorFlow </auto_examples/kftensorflow_plugin/index>
Kubernetes cluster Dask jobs </auto_examples/k8s_dask_plugin/index>
Kubernetes cluster Spark jobs </auto_examples/k8s_spark_plugin/index>
MPI Operator </auto_examples/kfmpi_plugin/index>
Ray </auto_examples/ray_plugin/index>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Flyte agents
Airflow agent </auto_examples/airflow_agent/index>
AWS SageMaker Inference agent </auto_examples/sagemaker_inference_agent/index>
BigQuery agent </auto_examples/bigquery_agent/index>
ChatGPT agent </auto_examples/chatgpt_agent/index>
Databricks agent </auto_examples/databricks_agent/index>
Memory Machine Cloud agent </auto_examples/mmcloud_agent/index>
OpenAI batch agent </auto_examples/openai_batch_agent/index>
Sensor agent </auto_examples/sensor/index>
Snowflake agent </auto_examples/snowflake_agent/index>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: External service backend plugins
AWS Athena </auto_examples/athena_plugin/index>
AWS Batch </auto_examples/aws_batch_plugin/index>
Flyte Interactive </auto_examples/flyteinteractive_plugin/index>
Hive </auto_examples/hive_plugin/index>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: SDKs for writing tasks and workflows
flytekit <https://flytekit.readthedocs.io/>
flytekit-java <https://github.com/spotify/flytekit-java>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Flyte operators
Airflow </auto_examples/airflow_plugin/index>
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Deprecated integrations
/auto_examples/airflow_agent/index
/auto_examples/airflow_plugin/index
/auto_examples/athena_plugin/index
/auto_examples/aws_batch_plugin/index
/auto_examples/bigquery_agent/index
/auto_examples/chatgpt_agent/index
/auto_examples/k8s_dask_plugin/index
/auto_examples/databricks_agent/index
/auto_examples/dbt_plugin/index
/auto_examples/dolt_plugin/index
/auto_examples/duckdb_plugin/index
/auto_examples/flyteinteractive_plugin/index
/auto_examples/greatexpectations_plugin/index
/auto_examples/hive_plugin/index
/auto_examples/k8s_pod_plugin/index
/auto_examples/mlflow_plugin/index
/auto_examples/mmcloud_agent/index
/auto_examples/modin_plugin/index
/auto_examples/kfmpi_plugin/index
/auto_examples/neptune_plugin/index
/auto_examples/nim_plugin/index
/auto_examples/ollama_plugin/index
/auto_examples/onnx_plugin/index
/auto_examples/openai_batch_agent/index
/auto_examples/papermill_plugin/index
/auto_examples/pandera_plugin/index
/auto_examples/kfpytorch_plugin/index
/auto_examples/ray_plugin/index
/auto_examples/sagemaker_inference_agent/index
/auto_examples/sensor/index
/auto_examples/snowflake_agent/index
/auto_examples/k8s_spark_plugin/index
/auto_examples/sql_plugin/index
/auto_examples/kftensorflow_plugin/index
/auto_examples/wandb_plugin/index
/auto_examples/whylogs_plugin/index
Deprecated integrations <deprecated_integrations/index>
BigQuery plugin </auto_examples/bigquery_plugin/index>
Databricks plugin </auto_examples/databricks_plugin/index>
Kubernetes pods </auto_examples/k8s_pod_plugin/index>
Snowflake plugin </auto_examples/snowflake_plugin/index>
```
2 changes: 1 addition & 1 deletion examples/databricks_agent/README.md
@@ -1,6 +1,6 @@
(databricks_agent)=

# Databricks agent example
# Databricks agent

```{eval-rst}
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
4 changes: 4 additions & 0 deletions examples/k8s_pod_plugin/README.md
Expand Up @@ -4,6 +4,10 @@
.. tags:: Integration, Kubernetes, Advanced
```

```{important}
This plugin is no longer needed and is here only for backwards compatibility. No new versions will be published after v1.13.x. Please use the `pod_template` and `pod_template_name` arguments to `@task` as described in the {ref}`Kubernetes task pod configuration guide <deployment-configuration-general>` instead.
```

Flyte tasks, represented by the {py:func}`@task <flytekit.task>` decorator, are essentially single functions that run in one container.
However, there may be situations where you need to run a job with more than one container or require additional capabilities, such as:

6 changes: 3 additions & 3 deletions examples/neptune_plugin/README.md
@@ -1,6 +1,6 @@
(neptune)=
(neptune_plugin)=

# Neptune
# Neptune plugin

```{eval-rst}
.. tags:: Integration, Data, Metrics, Intermediate
Expand All @@ -10,7 +10,7 @@

## Installation

To install the Flyte Neptune plugin, , run the following command:
To install the Flyte Neptune plugin, run the following command:

```bash
pip install flytekitplugins-neptune