[INLONG-1085][Doc] Add documents for Airflow offline scheduler (#1089)
Showing 72 changed files with 318 additions and 47 deletions.
125 changes: 125 additions & 0 deletions
docs/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md
@@ -0,0 +1,125 @@
---
title: Example of Airflow Offline Synchronization
sidebar_position: 3
---
This walkthrough uses a complete example to show how to create Airflow scheduling tasks with Apache InLong and run an offline data synchronization from Pulsar to MySQL.

## Deployment
### Install InLong

Before you begin, install InLong. Two methods are available:
- [Docker Deployment](deployment/docker.md) (Recommended)
- [Bare Metal Deployment](deployment/bare_metal.md)

### Add Connectors

Download the [connectors](https://inlong.apache.org/downloads/) that match your Flink version, extract the archive, and place `sort-connector-jdbc-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.
> Currently, Apache InLong's offline data synchronization capability only supports Flink-1.18, so please download the 1.18 version of the connectors.

## Create Clusters And Data Target

### Create Cluster Label
![airflow_create_cluster_labels](img/pulsar_mysql/airflow/airflow_create_cluster_labels.png)

### Register Pulsar Cluster

![airflow_create_pulsar_cluster](img/pulsar_mysql/airflow/airflow_create_pulsar_cluster.png)

### Create Data Target

![airflow_create_data_target](img/pulsar_mysql/airflow/airflow_create_data_target.png)

Execute the following SQL statement:

```mysql
CREATE TABLE sink_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## Airflow Initialization

### Get Initial DAG

`dag_creator` and `dag_cleaner` can be obtained from [InLong](https://github.com/apache/inlong).

![airflow_get_DAGs](img/pulsar_mysql/airflow/airflow_get_DAGs.jpg)

> Airflow does not provide an API for DAG creation, so two original DAGs are required. `dag_creator` is used to create offline tasks, and `dag_cleaner` is used to clean up offline tasks regularly.

### Create Initial DAG

Place the DAG files in the Airflow default DAG directory and wait for a while. The Airflow scheduler will scan the directory and load the DAGs:
![airflow_original_DAG](img/pulsar_mysql/airflow/airflow_original_DAG.png)

### Airflow REST API

By default, Airflow rejects all REST API requests. Refer to the [Airflow official documentation](https://airflow.apache.org/docs/apache-airflow-providers-fab/stable/auth-manager/api-authentication.html) to configure API authentication.

### InLong Manager Configuration

Update the InLong Manager configuration file with the following properties, then restart InLong Manager.
```properties
# InLong Manager URL accessible by the scheduler
schedule.engine.inlong.manager.url=http://inlongManagerIp:inlongManagerPort
# Management URL for Airflow
schedule.engine.airflow.baseUrl=http://airflowIP:airflowPort
# Username and password for Airflow REST API authentication
schedule.engine.airflow.username=airflow
schedule.engine.airflow.password=airflow
# Connection used to save InLong Manager authentication information
schedule.engine.airflow.connection.id=inlong_connection
# The ids of the two original DAGs
schedule.engine.airflow.cleaner.id=dag_cleaner
schedule.engine.airflow.creator.id=dag_creator
```
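
To make the role of `baseUrl`, `username`, `password`, and `creator.id` more concrete, here is a minimal sketch of how a client could build a request that triggers a DAG run. It is not InLong Manager's actual code. It assumes the Airflow 2.x stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`) with HTTP Basic authentication enabled; the class name, method names, and the `http://localhost:8080` address are hypothetical placeholders. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of InLong or Airflow.
class AirflowTriggerSketch {

    // Builds the HTTP Basic "Authorization" header value from a username and password.
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds (but does not send) a request that triggers a run of the given DAG.
    // Assumption: Airflow 2.x stable REST API, POST /api/v1/dags/{dag_id}/dagRuns.
    static HttpRequest triggerDagRun(String baseUrl, String dagId,
                                     String username, String password) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/dags/" + dagId + "/dagRuns"))
                .header("Authorization", basicAuthHeader(username, password))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"conf\": {}}"))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder address; use the value of schedule.engine.airflow.baseUrl instead.
        HttpRequest request =
                triggerDagRun("http://localhost:8080", "dag_creator", "airflow", "airflow");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending such a request with `java.net.http.HttpClient` against a running Airflow instance is a quick way to check that the credentials and API authentication settings are accepted before restarting InLong Manager.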

## Offline Synchronization Task Creation
### Create Synchronization Task

![airflow_create_synchronization_task](img/pulsar_mysql/airflow/airflow_create_synchronization_task.png)
### Create Data Stream Group
![airflow_data_stream_group](img/pulsar_mysql/airflow/airflow_data_stream_group.png)

For the remaining steps, refer to [Use Quartz built-in scheduling engine](./quartz_example.md).

### Create Airflow Offline Task

After approval and configuration, InLong Manager triggers `dag_creator` through the Airflow API to create the offline task DAG:

![airflow_create_task_DAG.png](img/pulsar_mysql/airflow/airflow_create_task_DAG.png)

![airflow_create_task_DAG_result.png](img/pulsar_mysql/airflow/airflow_create_task_DAG_result.png)

> The offline task DAG may not be scheduled immediately: Airflow scans DAG files periodically and only then adds them to the schedule, so it may take some time.

The offline task execution results are as follows:

![airflow_DAG_task_result.png](img/pulsar_mysql/airflow/airflow_DAG_task_result.png)

> Airflow periodically calls the interface provided by InLong Manager to submit Flink tasks according to the configuration in the `Create Data Stream Group` section. This is why the authentication information of InLong Manager needs to be saved in the `InLong Manager Configuration` section.

## Test Data
### Send Data

The following example uses the Pulsar SDK to write test data to a Pulsar topic:
```java
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PulsarSendExample {
    public static void main(String[] args) throws Exception {
        // Create the Pulsar client and producer; both are closed automatically
        try (PulsarClient pulsarClient = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build();
             Producer<byte[]> producer = pulsarClient.newProducer().topic("public/default/test").create()) {
            // Send 10000 messages
            for (int i = 0; i < 10000; i++) {
                // The field separator is |
                String msgStr = i + "|msg-" + i;
                MessageId msgId = producer.send(msgStr.getBytes(StandardCharsets.UTF_8));
                System.out.println("Send msg : " + msgStr + " with msgId: " + msgId);
            }
        }
    }
}
```
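
Each message produced above carries two fields separated by `|`. As a rough illustration of how one message breaks down into fields, the sketch below splits a message into an integer and a string. The mapping to the `id` and `name` columns of `sink_table` is an assumption, since the actual field mapping is configured in the InLong dashboard and appears here only in screenshots. The class name, the `Row` type, and the choice to split on the first `|` only are hypothetical and do not describe InLong's own parsing.

```java
// Hypothetical illustration of the "id|name" message layout; not InLong code.
class PipeRecordSketch {

    // Holds the two fields carried by one test message.
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Splits "id|name" on the first '|' only, so a name containing '|' stays intact.
    static Row parse(String message) {
        String[] parts = message.split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 fields separated by '|': " + message);
        }
        return new Row(Integer.parseInt(parts[0]), parts[1]);
    }

    public static void main(String[] args) {
        Row row = parse("42|msg-42");
        System.out.println("id=" + row.id + ", name=" + row.name);
    }
}
```

A message without a separator is rejected with an exception rather than silently producing a row with a missing field.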

### Data Verification

Log in to MySQL and query the target table. The data should have been synchronized to MySQL.

![airflow_synchronization_result](img/pulsar_mysql/airflow/airflow_synchronization_result.png)
Binary file added (+48.2 KB): ...ck_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_DAG_task_result.png
Binary file added (+65.5 KB): ...rt/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_cluster_labels.png
Binary file added (+38.5 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_data_flow_group.png
Binary file added (+72.5 KB): ...start/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_data_target.png
Binary file added (+102 KB): ...rt/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_pulsar_cluster.png
Binary file added (+57.5 KB): ...line_data_sync/img/pulsar_mysql/airflow/airflow_create_synchronization_task.png
Binary file added (+64.5 KB): .../offline_data_sync/img/pulsar_mysql/airflow/airflow_create_synchronize_task.png
Binary file added (+29.9 KB): ...ck_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_task_DAG.png
Binary file added (+55.1 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_task_DAG_result.png
Binary file added (+37.3 KB): ..._start/offline_data_sync/img/pulsar_mysql/airflow/airflow_data_stream_group.png
Binary file added (+19.3 KB): docs/quick_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_get_DAGs.jpg
Binary file added (+44.7 KB): ...quick_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_original_DAG.png
Binary file added (+70 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_synchronization_result.png
130 changes: 130 additions & 0 deletions
...tent-docs/current/quick_start/offline_data_sync/airflow_pulsar_mysql_example.md
@@ -0,0 +1,130 @@
---
title: Example of Airflow Offline Synchronization
sidebar_position: 3
---
The following content uses a complete example to show how to create an Airflow scheduling task with Apache InLong and complete an offline data synchronization from Pulsar to MySQL.

## Deployment
### Install InLong

Before you begin, install all InLong components. Two methods are available:
- [Docker Deployment](deployment/docker.md) (Recommended)
- [Bare Metal Deployment](deployment/bare_metal.md)

### Add Connectors

Download the [connectors](https://inlong.apache.org/zh-CN/downloads) that match your Flink version, extract the archive, and place `sort-connector-jdbc-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.
> Currently, Apache InLong's offline data synchronization capability only supports Flink-1.18, so please download the 1.18 version of the connectors.

## Create Clusters And Data Target

### Create Cluster Label
![airflow_create_cluster_labels](img/pulsar_mysql/airflow/airflow_create_cluster_labels.png)

### Register Pulsar Cluster

![airflow_create_pulsar_cluster](img/pulsar_mysql/airflow/airflow_create_pulsar_cluster.png)

### Create Data Target

![airflow_create_data_target](img/pulsar_mysql/airflow/airflow_create_data_target.png)

Execute the following SQL statement:

```mysql
CREATE TABLE sink_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

## Airflow Initialization

### Get Initial DAG

`dag_creator` and `dag_cleaner` can be obtained from [InLong](https://github.com/apache/inlong).

![airflow_get_DAGs](img/pulsar_mysql/airflow/airflow_get_DAGs.jpg)

> Airflow does not provide an API for DAG creation, so two original DAGs are required. `dag_creator` is used to create offline tasks, and `dag_cleaner` is used to clean up offline tasks on a schedule.

### Create Initial DAG

First place the DAG files in the Airflow default DAG directory and wait for a while. The Airflow scheduler will scan the directory and load the DAGs:

![airflow_original_DAG](img/pulsar_mysql/airflow/airflow_original_DAG.png)

### Airflow REST API

By default, Airflow rejects all REST API requests. Refer to the [Airflow official documentation](https://airflow.apache.org/docs/apache-airflow-providers-fab/stable/auth-manager/api-authentication.html) to configure API authentication.

### InLong Manager Configuration

Modify the configuration file as required and restart InLong Manager.

```properties
# URL of InLong Manager that Airflow can reach
schedule.engine.inlong.manager.url=http://inlongManagerIp:inlongManagerPort
# URL of the Airflow management page
schedule.engine.airflow.baseUrl=http://airflowIP:airflowPort
# Username and password for Airflow REST API authentication
schedule.engine.airflow.username=airflow
schedule.engine.airflow.password=airflow
# Connection used to save InLong Manager authentication information
schedule.engine.airflow.connection.id=inlong_connection
# The ids of the two original DAGs
schedule.engine.airflow.cleaner.id=dag_cleaner
schedule.engine.airflow.creator.id=dag_creator
```

## Offline Synchronization Task Creation

### Create Synchronization Task

![airflow_create_ynchronization_task](img/pulsar_mysql/airflow/airflow_create_ynchronization_task.png)

### Create Data Stream Group
![airflow_data_stream_group](img/pulsar_mysql/airflow/airflow_data_stream_group.png)

For the subsequent steps, refer to [Use the built-in Quartz scheduling engine](./quartz_example.md).

### Create Airflow Offline Task

After approval and successful configuration, InLong Manager triggers `dag_creator` through the Airflow API to create the offline task DAG:

![airflow_create_task_DAG.png](img/pulsar_mysql/airflow/airflow_create_task_DAG.png)

![airflow_create_task_DAG_result.png](img/pulsar_mysql/airflow/airflow_create_task_DAG_result.png)

> The offline task DAG may not be scheduled immediately: Airflow scans DAG files periodically and only then adds them to the schedule, so it may take some time.

The offline task execution results are as follows:

![airflow_DAG_task_result.png](img/pulsar_mysql/airflow/airflow_DAG_task_result.png)

> Airflow periodically calls the interface provided by InLong Manager to submit Flink tasks according to the configuration in the `Create Data Stream Group` section. This is why the authentication information of InLong Manager needs to be saved in the `InLong Manager Configuration` section.

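## Test Data
### Send Data

Use the Pulsar SDK to produce data and write it to the Pulsar topic, as in the following example:
```java
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PulsarSendExample {
    public static void main(String[] args) throws Exception {
        // Create the Pulsar client and producer; both are closed automatically
        try (PulsarClient pulsarClient = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build();
             Producer<byte[]> producer = pulsarClient.newProducer().topic("public/default/test").create()) {
            // Send messages
            for (int i = 0; i < 10000; i++) {
                // The field separator is |
                String msgStr = i + "|msg-" + i;
                MessageId msgId = producer.send(msgStr.getBytes(StandardCharsets.UTF_8));
                System.out.println("Send msg : " + msgStr + " with msgId: " + msgId);
            }
        }
    }
}
```
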
### Data Verification

Then log in to MySQL and check the table data. You can see that the data has been synchronized to MySQL.

![airflow_synchronization_result](img/pulsar_mysql/airflow/airflow_synchronization_result.png)
File renamed without changes.
Binary file added (+48.2 KB): ...ck_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_DAG_task_result.png
Binary file added (+78.2 KB): ...rt/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_cluster_labels.png
Binary file added (+38.5 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_data_flow_group.png
Binary file added (+73.5 KB): ...start/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_data_target.png
Binary file added (+106 KB): ...rt/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_pulsar_cluster.png
Binary file added (+64.5 KB): .../offline_data_sync/img/pulsar_mysql/airflow/airflow_create_synchronize_task.png
Binary file added (+29.9 KB): ...ck_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_task_DAG.png
Binary file added (+55.1 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_create_task_DAG_result.png
Binary file added (+64.5 KB): ...fline_data_sync/img/pulsar_mysql/airflow/airflow_create_ynchronization_task.png
Binary file added (+38.5 KB): ..._start/offline_data_sync/img/pulsar_mysql/airflow/airflow_data_stream_group.png
Binary file added (+19.3 KB): ...ent/quick_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_get_DAGs.jpg
Binary file added (+44.7 KB): ...quick_start/offline_data_sync/img/pulsar_mysql/airflow/airflow_original_DAG.png
Binary file added (+70 KB): ...t/offline_data_sync/img/pulsar_mysql/airflow/airflow_synchronization_result.png
Binary file added (+29.2 KB): ...n-2.0.0/quick_start/offline_data_sync/img/pulsar_mysql/schedule_engine_type.png