[INLONG-906][DOC] Add MySQL to StarRocks example document for data sync #907

Merged 7 commits on Dec 20, 2023
122 changes: 122 additions & 0 deletions docs/quick_start/data_sync/mysql_starrocks_example.md
@@ -0,0 +1,122 @@
---
title: MySQL to StarRocks Example
sidebar_position: 2
---

This example shows how to use Apache InLong to create a `MySQL -> StarRocks` data synchronization task.

## Deployment
### Install InLong

Before we begin, install InLong using one of the following two methods:
- [Docker Deployment](deployment/docker.md) (Recommended)
- [Bare Metal Deployment](deployment/bare_metal.md)

### Add Connectors

Download the [connectors](https://inlong.apache.org/downloads/) corresponding to Flink 1.13, decompress the package, and place `sort-connector-starrocks-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.
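
The copy step can be scripted. The following is a minimal sketch, assuming the package has already been decompressed; the function name `install_connector` and the source directory in the usage comment are hypothetical, and only the jar name pattern and the `/inlong-sort/connectors/` destination come from the text above.

```bash
#!/bin/bash

# Copy the StarRocks sort connector jar(s) from an extracted download
# directory into the connectors directory.
# Hypothetical helper: $1 = extracted directory, $2 = destination directory.
install_connector() {
  local src_dir="$1" dest_dir="$2"
  local jars=("$src_dir"/sort-connector-starrocks-*.jar)

  # Without nullglob, an unmatched glob stays literal, so test the first entry.
  if [[ ! -e "${jars[0]}" ]]; then
    echo "no StarRocks connector jar found in $src_dir" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  cp "${jars[@]}" "$dest_dir"/
}

# Example usage (the source path is an assumption):
#   install_connector ./extracted-connectors /inlong-sort/connectors
```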

### Install StarRocks
Please refer to the [Installation Tutorial](https://docs.starrocks.io/docs/quick_start/) on the official StarRocks website.

## Cluster Initialization
Once all containers have started successfully, open the InLong dashboard at http://localhost and log in with the following default account:
```
User: admin
Password: inlong
```

### Create Cluster Tag
Click [Clusters] -> [ClusterTags] -> [Create] on the page and specify the cluster tag name and the person responsible.
![Create Cluster Tag](img/mysql_starrocks/create_cluster_tag.png)

:::caution
`default_cluster` is the default cluster tag reported by each component. If you decide to use a different name, make sure to update the corresponding tag configuration accordingly.
:::

### Register Pulsar Cluster
Click [Clusters] -> [Cluster] -> [Create] on the page to register Pulsar Cluster.
![Create Pulsar Cluster](img/mysql_starrocks/create_pulsar_cluster.png)


:::note
For ClusterTags, select the newly created `default_cluster`. For the Pulsar cluster deployed by Docker:

Service URL is `pulsar://pulsar:6650`, Admin URL is `http://pulsar:8080`.
:::

### Register StarRocks DataNodes
Click [DataNodes] -> [Create] on the page to register StarRocks DataNodes.
![Create StarRocks DataNode](img/mysql_starrocks/create_starrocks_datanode.png)

:::note
- Please do not fill in `http://` for LOAD URL, just fill in `IP:PORT`.
:::
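
If the StarRocks address is at hand as a full URL, the scheme can be stripped before pasting it into the LOAD URL field. This is a small sketch only; the helper name `to_load_url` and the address in the usage comment are made up for illustration.

```bash
#!/bin/bash

# Reduce a URL such as http://HOST:PORT/ to the bare HOST:PORT form
# expected by the LOAD URL field. Hypothetical helper.
to_load_url() {
  local url="$1"
  url="${url#http://}"    # drop a leading http://
  url="${url#https://}"   # drop a leading https://
  url="${url%/}"          # drop one trailing slash, if present
  printf '%s\n' "$url"
}

# Example usage (address is an assumption):
#   to_load_url "http://192.168.0.10:8030/"
```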

## Create Task
### Create Data Streams Group
Click [Synchronization] → [Create] on the page, enter the Group ID and Stream ID, and choose whether to enable full database migration:
![Create Group Stream](img/mysql_starrocks/create_group_stream.png)

### Create Data Source
In the data source, click [New] → [MySQL] to configure the source name, address, and database and table information.
![Create Stream_Source](img/mysql_starrocks/create_source.png)

:::note
- With the `Full + Incremental` read mode, the existing data in the table is also collected; with the `Incremental` mode, only new changes are collected.
- The table white list format is `<dbName>.<tableName>` and supports regular expressions.
:::
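
To get a feel for how a whitelist pattern selects tables, a pattern can be tried locally against candidate names. The sketch below is an approximation under stated assumptions: it uses bash's POSIX extended regular expressions and matches the whole `<dbName>.<tableName>` string, while the matcher InLong actually uses may differ in dialect and anchoring. The function name `matches_whitelist` is hypothetical.

```bash
#!/bin/bash

# Return success if the fully qualified table name matches the whitelist
# pattern as a whole. Assumption: whole-string (anchored) matching.
matches_whitelist() {
  local pattern="$1" name="$2"
  local re="^(${pattern})$"
  [[ "$name" =~ $re ]]
}

# Example usage: an unescaped "." matches any character, so escape the
# separator between database and table when it should be a literal dot.
#   matches_whitelist 'test\.source_.*' 'test.source_table' && echo "included"
```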

### Create Data Sink
In the data sink, click [New] → [StarRocks] to configure the sink name, database name, table name and created StarRocks data node.

![Create data object](img/mysql_starrocks/create_sink.png)

### Approve Data Stream
Click [Approval] -> [MyApproval] -> [Approval] -> [Ok] to approve the data stream.

![Approve](img/mysql_starrocks/approve.png)

Return to the [Synchronization] page and wait for the status to show [success].

![Success](img/mysql_starrocks/success.png)

## Test Data
### Send Data
```
#!/bin/bash

# MySQL connection info
DB_HOST="mysql"
DB_USER="root"
DB_PASS="inlong"
DB_NAME="test"
DB_TABLE="source_table"

# Insert 1000 rows, one statement per iteration
for ((i = 1; i <= 1000; i++)); do
  # Generate data
  id=$i
  name="name_$i"

  # Build the INSERT statement
  query="INSERT INTO $DB_TABLE (id, name) VALUES ($id, '$name');"

  # Execute the INSERT statement (variables quoted to avoid word splitting)
  mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" -e "$query"
done
```

Modify the variables in the script to match your environment, then run it to add a total of 1000 rows to `source_table`:

![Result Source](img/mysql_starrocks/result_source.png)
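
The script above opens a new MySQL connection for every row. As an alternative sketch, not part of the original example, the rows can be combined into one multi-row `INSERT` and sent in a single call. The helper name `build_batch_insert` is hypothetical; the table and column names are the ones used in the script above.

```bash
#!/bin/bash

# Build one multi-row INSERT for ids start..end (inclusive).
# Hypothetical helper: $1 = table, $2 = first id, $3 = last id.
build_batch_insert() {
  local table="$1" start="$2" end="$3"
  local values="" i
  for ((i = start; i <= end; i++)); do
    values+="($i, 'name_$i')"
    # Separate rows with a comma, except after the last one.
    if ((i < end)); then
      values+=", "
    fi
  done
  printf 'INSERT INTO %s (id, name) VALUES %s;\n' "$table" "$values"
}

# Example usage, reusing the variables from the script above:
#   mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" \
#     -e "$(build_batch_insert "$DB_TABLE" 1 1000)"
```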

### Verify Data
Connect to StarRocks and check the data in `sink_table`.

![Result Sink](img/mysql_starrocks/result_sink.png)
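
Because StarRocks speaks the MySQL protocol, the same check can be made from a shell with the `mysql` client. This is a minimal sketch under assumptions: the host name, the `root` user without a password, port `9030` (the usual FE query port), and the `test` database are placeholders for your environment, and `count_rows` is a hypothetical helper.

```bash
#!/bin/bash

# Print the row count of a table over the MySQL protocol.
# Hypothetical helper: $1 = host, $2 = port, $3 = user, $4 = db, $5 = table.
count_rows() {
  local host="$1" port="$2" user="$3" db="$4" table="$5"
  # -N omits the column header, -s prints the bare value.
  mysql -h "$host" -P "$port" -u "$user" -N -s \
    -e "SELECT COUNT(*) FROM ${db}.${table};"
}

# Example usage (all connection values are assumptions):
#   sink_count="$(count_rows starrocks 9030 root test sink_table)"
#   if [[ "$sink_count" == "1000" ]]; then echo "all rows synchronized"; fi
```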

You can also view audit data on the page:

![Audit Data](img/mysql_starrocks/audit_starrocks.png)
@@ -0,0 +1,116 @@
---
title: MySQL to StarRocks Example
sidebar_position: 2
---

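The following walks through a complete example of using Apache InLong to create `MySQL -> StarRocks` data synchronization.

## Deployment
### Install InLong

Before we begin, install all InLong components using one of the following two methods:
- [Docker Deployment](deployment/docker.md) (Recommended)
- [Bare Metal Deployment](deployment/bare_metal.md)

### Add Connectors

Download the [connectors](https://inlong.apache.org/zh-CN/downloads) corresponding to Flink 1.13, decompress the package, and place `sort-connector-starrocks-[version]-SNAPSHOT.jar` in the `/inlong-sort/connectors/` directory.

### Install StarRocks
Please refer to the [Installation Tutorial](https://docs.starrocks.io/docs/quick_start/) on the official StarRocks website.

## Cluster Initialization
Once the containers have started successfully, open the InLong Dashboard at http://localhost and log in with the following default account: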
```
User: admin
Password: inlong
```

### Create Cluster Tag
Click [Clusters] -> [ClusterTags] -> [Create] on the page and specify the cluster tag name and the person responsible:
![Create Cluster Tag](img/mysql_starrocks/create_cluster_tag.png)

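**Note: default_cluster is the default cluster tag reported by each component. If you use a different name, make sure the corresponding tag configuration has been updated.**

### Register Pulsar Cluster
Click [Clusters] -> [Cluster] -> [Create] on the page to register the Pulsar cluster: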
![Create Pulsar Cluster](img/mysql_starrocks/create_pulsar_cluster.png)

:::note
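For the cluster tag, select the newly created `default_cluster`, then configure the Pulsar cluster deployed by Docker:

Service URL is `pulsar://pulsar:6650`, Admin URL is `http://pulsar:8080`.
:::

### Register StarRocks DataNodes
Click [DataNodes] -> [Create] on the page to add a StarRocks data node.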
![Create StarRocks DataNode](img/mysql_starrocks/create_starrocks_datanode.png)

:::note
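- Do not include `http://` in the LOAD URL; enter only `IP:PORT`.
:::

## Create Task
### Create Data Streams Group
Click [Synchronization] → [Create] on the page, enter the Group ID and Stream ID, and choose whether to enable full database migration: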
![Create Group Stream](img/mysql_starrocks/create_group_stream.png)

### Create Data Source
In the data source, click [New] → [MySQL] to configure the source name, address, and database and table information.
![Create Stream_Source](img/mysql_starrocks/create_source.png)

:::note
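- With the `Full + Incremental` read mode, the existing data in the table is also collected; with the `Incremental` mode, it is not.
- The table whitelist format is `<dbName>.<tableName>` and supports regular expressions.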
:::

### Create Data Sink
In the data sink, click [New] → [StarRocks], set the sink name, select the StarRocks data node created earlier, and enter the database and table names.
![Create data object](img/mysql_starrocks/create_sink.png)

### Approve Data Stream
Click [Approval] -> [MyApproval] -> [Approval] -> [Ok].
![Approve](img/mysql_starrocks/approve.png)

Return to the [Synchronization] page and wait for the task configuration to succeed:
![Success](img/mysql_starrocks/success.png)

## Test Data
### Send Data
```
#!/bin/bash

# MySQL connection info
DB_HOST="mysql"
DB_USER="root"
DB_PASS="inlong"
DB_NAME="test"
DB_TABLE="source_table"

# Insert 1000 rows, one statement per iteration
for ((i = 1; i <= 1000; i++)); do
  # Generate data
  id=$i
  name="name_$i"

  # Build the INSERT statement
  query="INSERT INTO $DB_TABLE (id, name) VALUES ($id, '$name');"

  # Execute the INSERT statement (variables quoted to avoid word splitting)
  mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" -e "$query"
done
```

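Modify the variables in the script to match your environment, then run the script to add a total of 1000 rows to the `source_table` table:

![Result Source](img/mysql_starrocks/result_source.png)

### Verify Data
Connect to StarRocks and check the data in the `sink_table` table.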

![Result Sink](img/mysql_starrocks/result_sink.png)

You can also view the audit data on the page:

![Audit Data](img/mysql_starrocks/audit_starrocks.png)