[INLONG-1096][Release] Add blog for the 2.1.0 release #1099

Merged: 6 commits, Jan 17, 2025
167 changes: 167 additions & 0 deletions blog/2025-01-03-release-2.1.0.md
@@ -0,0 +1,167 @@
---
title: Release 2.1.0
author: justinwwhuang
author_url: https://github.com/justinwwhuang
author_image_url: https://avatars.githubusercontent.com/justinwwhuang
tags: [Apache InLong, Version]
---

Apache InLong has recently released version 2.1.0, which closed over 120 issues, including more than 4 major features and over 110 optimizations.
The main accomplishments are: Dashboard supports batch operation of nodes, Manager supports multiple scheduling engines, Agent supports COS data sources,
and Sort supports archiving dirty data through the InLong SDK. The release also improves the operations and maintenance experience of Apache InLong and completes a large number of other features.
<!--truncate-->

## About Apache InLong
As the industry's first one-stop, all-scenario massive data integration framework, Apache InLong provides automated, secure, reliable,
and high-performance data transmission capabilities, enabling businesses to quickly build stream-based data analysis, modeling, and applications.
Currently, InLong is widely used in various industries including advertising, payment, social networking, gaming, and artificial intelligence,
serving thousands of businesses, with high-performance scenarios processing over hundreds of billions of records per day and highly reliable scenarios
handling over tens of trillions of records per day.

The core keywords for InLong's project positioning are "one-stop," "all-scenario," and "massive data." For "one-stop,"
we aim to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box usability;
for "all-scenario," we aim to offer comprehensive solutions covering common data integration scenarios in the big data field;
for "massive data," we hope to leverage architectural advantages such as layered data links, fully extensible components,
and built-in multi-cluster management to stably support even larger data volumes beyond the current hundreds of billions of records per day.

## 2.1.0 Overview

Apache InLong has recently released version 2.1.0, which closed over 120 issues, including more than 4 major features and over 110 optimizations.
The main accomplishments include:
- Dashboard supports batch operation of nodes
- Manager supports multiple scheduling engines
- Agent supports COS data sources
- Sort supports archiving dirty data through the InLong SDK

The release also improves the operations and maintenance experience of Apache InLong and completes a large number of other features.

### Dashboard Module
- Support COS data source
- Support batch operation of agents: restart, upgrade
- Support exporting audit data as CSV files
- Support sorting of audit data and comparison of differences
- Support queries for all types of indicators
- Support data preview field segmentation
### Manager Module
- Support COS data source
- Support managing multiple scheduling engines: AirFlow and DolphinScheduler
- Support dirty data management and querying
- Support querying heartbeat information based on IP
- Limit one IP to only belong to one cluster
- Provide an API for querying dirty data archives
### Agent Module
- Support COS data source
- Support quick startup and shutdown
- Support starting multiple instances
- Support data supplementation in chronological order
- Optimize the logic of the Installer process guardian for Agent
- Support supplementary recording based on local data time
### Sort Module
- Add an Elasticsearch connector based on Flink 1.18
- Support KV separation on Kafka Sink
- Support audit data reporting
- Tube Connector source supports dirty data archiving
### SDK Module
- Transform SDK adds 7 new functions
- Add Dirty Data Archiving SDK
### Audit Module
- Audit Proxy adds metric reporting
- Audit Store adds metric reporting
- Audit Service adds metric reporting
- Add asynchronous flush audit data interface
### TubeMQ Module
- Write the consumption offset information to a local file
- Optimize the load balancing logic of the Go version SDK
### Others
- Pipeline supports parallel build
- Support Manager to configure volumes
## 2.1.0 Feature Introduction
### Dashboard supports batch operation of agents
This feature is mainly used to operate InLong Agents, primarily to upgrade and restart them:
- After finding the cluster in cluster management, select multiple nodes to operate on and click on batch operation.

![2.1.0-dashboard-select.png](img/2.1.0/2.1.0-dashboard-select.png)

- Select the operation type and fill in the required parameters for the corresponding operation, then click OK.

![2.1.0-dashboard-operate.png](img/2.1.0/2.1.0-dashboard-operate.png)

This feature improves the InLong operations and maintenance experience. Operating through the interface removes the need to modify the DB directly and makes InLong more cohesive:
- Agent versions can be upgraded visually, in batches and on a schedule, to control upgrade risk.
- During Agent fault recovery, this feature can be used to restart Agents quickly.
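
The idea of upgrading in batches on a schedule can be sketched as follows. This is only an illustration of the rollout pattern, not the Dashboard's implementation: the `plan_batches` helper, the node IPs, the batch size, and the 30-minute interval are all hypothetical.

```python
from datetime import datetime, timedelta


def plan_batches(nodes, batch_size, start, interval):
    """Split nodes into fixed-size batches and give each batch a start time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan = []
    for index in range(0, len(nodes), batch_size):
        batch_number = index // batch_size
        # Each later batch starts one interval after the previous one.
        plan.append((start + batch_number * interval, nodes[index:index + batch_size]))
    return plan


# Hypothetical Agent node IPs; in practice these are the nodes selected in the Dashboard.
agent_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
rollout = plan_batches(
    agent_nodes,
    batch_size=2,
    start=datetime(2025, 1, 3, 10, 0),
    interval=timedelta(minutes=30),
)
for when, batch in rollout:
    print(when.isoformat(), batch)
```

Starting with a small first batch and leaving time between batches gives operators a chance to stop the rollout if the new version misbehaves.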

Thanks to @[wohainilaodou](https://github.com/wohainilaodou) for their contributions to this feature. For more details, please refer to [INLONG-11187](https://github.com/apache/inlong/issues/11187)
### Manager supports multiple scheduling engines
Previously, InLong supported only the Quartz scheduling engine for offline data synchronization. This version adds two third-party engines: DolphinScheduler and AirFlow.
#### AirFlow engine
- To ease future maintenance and extension of AirFlow interface support, the AirflowApi interface and the BaseAirflowApi abstract class were designed; later extensions only need to build on them.
- A unified request class, AirflowServerClient, is implemented for the interfaces.
- Two interceptors are added to OkHttpClient: AirflowAuthInterceptor for unified authorization of the interfaces, and LoggingInterceptor for logging.
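
The interceptor arrangement described above can be illustrated with a small, language-neutral sketch. The code below is Python rather than the Manager's Java, and it does not reproduce AirflowAuthInterceptor or LoggingInterceptor: the function names, the use of HTTP Basic authentication, the example URL, and the fake transport are all assumptions made for illustration.

```python
import base64
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Request:
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


# A handler takes a request and returns a response (a plain string here).
Handler = Callable[[Request], str]
# An interceptor receives the request plus the next handler in the chain.
Interceptor = Callable[[Request, Handler], str]


def auth_interceptor(username: str, password: str) -> Interceptor:
    """Attach an HTTP Basic Authorization header to every request (assumed scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()

    def intercept(request: Request, proceed: Handler) -> str:
        request.headers["Authorization"] = f"Basic {token}"
        return proceed(request)

    return intercept


def logging_interceptor(log: List[str]) -> Interceptor:
    """Record the URL before the call and the response after it."""

    def intercept(request: Request, proceed: Handler) -> str:
        log.append(f"--> {request.url}")
        response = proceed(request)
        log.append(f"<-- {response}")
        return response

    return intercept


def build_client(interceptors: List[Interceptor], transport: Handler) -> Handler:
    """Wrap the transport so interceptors run in the order they are listed."""
    handler = transport
    for interceptor in reversed(interceptors):
        handler = (lambda i, nxt: lambda req: i(req, nxt))(interceptor, handler)
    return handler


def fake_transport(request: Request) -> str:
    # Stand-in for the real HTTP call; reports whether the auth header arrived.
    return "200 OK" if "Authorization" in request.headers else "401 Unauthorized"


log: List[str] = []
client = build_client(
    [auth_interceptor("user", "pass"), logging_interceptor(log)], fake_transport
)
result = client(Request(url="http://airflow.example/api/v1/dags"))
print(result, log)
```

Keeping authorization and logging in interceptors means individual API calls do not need to repeat either concern, which is the motivation the bullets above describe.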

Thanks to @[Zkplo](https://github.com/Zkplo) for their contributions to this feature. For more details, please refer to [INLONG-11400](https://github.com/apache/inlong/issues/11400)
#### DolphinScheduler engine
- Add the DolphinScheduler package to org.apache.inlong.manager.schedule
- Add a client and an engine for DolphinScheduler (DS), as well as a util for calling the DS open APIs
- Add POJO classes for DS interaction

Thanks to @[emptyOVO](https://github.com/emptyOVO) for their contributions to this feature. For more details, please refer to [INLONG-11401](https://github.com/apache/inlong/issues/11401)
### Agent supports COS data source
- Create a new COS type node and fill in the corresponding bucket name, credential ID, credential key, and region.

![2.1.0-agent-node.png](img/2.1.0/2.1.0-agent-node.png)

- Create a new COS type data source, select the corresponding node, IP, and file path.

![2.1.0-agent-type.png](img/2.1.0/2.1.0-agent-type.png)

![2.1.0-agent-param.png](img/2.1.0/2.1.0-agent-param.png)

This feature collects data directly from COS storage, so businesses no longer need to download COS files locally before collecting them.
Thanks to @[justinwwhuang](https://github.com/justinwwhuang) for their contributions to this feature. For more details, please refer to [INLONG-11187](https://github.com/apache/inlong/issues/11187)

### Sort supports archiving dirty data through the InLong SDK
This version adds the ability to report dirty data to a specified GroupId and StreamId through the InLong SDK. Users can then export the dirty data or consume it themselves from Pulsar.

![2.1.0-sort-dirty.png](img/2.1.0/2.1.0-sort-dirty.png)

The following configuration needs to be added to the Connector:
```
'dirty.side-output.inlong-sdk.inlong-auth-key' = 'your auth key',
'dirty.side-output.inlong-sdk.inlong-auth-id' = 'your auth id',
'dirty.side-output.enable' = 'true',
'dirty.side-output.inlong-sdk.inlong-group-id' = 'target_inlong_group_id',
'dirty.side-output.inlong-sdk.inlong-stream-id' = 'target_inlong_stream_id',
'dirty.side-output.labels' = 'groupId=xx&streamId=xx&serverType=tube&dataflowId=xx',
'dirty.side-output.inlong-sdk.inlong-manager-addr' = 'xxx',
'dirty.side-output.connector' = 'inlong-sdk',
'dirty.ignore' = 'true',
```
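
When these options are generated by a script rather than typed by hand, a small helper can render them and sanity-check the labels string. The sketch below is a hypothetical convenience, not part of InLong: `render_options`, `missing_labels`, and `EXPECTED_LABELS` are invented names, and treating all four label keys from the example as expected is an assumption.

```python
from urllib.parse import parse_qs

# Label keys that appear in the example above; whether all are mandatory is an assumption.
EXPECTED_LABELS = ("groupId", "streamId", "serverType", "dataflowId")


def render_options(options):
    """Render a dict as comma-separated 'key' = 'value' lines."""
    lines = []
    for key, value in options.items():
        escaped = str(value).replace("'", "''")  # double single quotes inside SQL strings
        lines.append(f"'{key}' = '{escaped}'")
    return ",\n".join(lines)


def missing_labels(labels):
    """Return expected label keys that are absent or empty in a k=v&k=v string."""
    parsed = parse_qs(labels)  # blank values are dropped by default
    return [name for name in EXPECTED_LABELS if name not in parsed]


dirty_options = {
    "dirty.ignore": "true",
    "dirty.side-output.enable": "true",
    "dirty.side-output.connector": "inlong-sdk",
    "dirty.side-output.inlong-sdk.inlong-manager-addr": "xxx",
    "dirty.side-output.inlong-sdk.inlong-auth-id": "your auth id",
    "dirty.side-output.inlong-sdk.inlong-auth-key": "your auth key",
    "dirty.side-output.inlong-sdk.inlong-group-id": "target_inlong_group_id",
    "dirty.side-output.inlong-sdk.inlong-stream-id": "target_inlong_stream_id",
    "dirty.side-output.labels": "groupId=xx&streamId=xx&serverType=tube&dataflowId=xx",
}
fragment = render_options(dirty_options)
print(fragment)
print(missing_labels(dirty_options["dirty.side-output.labels"]))
```

The rendered fragment has no trailing comma, so it can be placed last in an option list or followed by a comma and further options.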
Thanks to @[vernedeng](https://github.com/vernedeng) and @[fuweng11](https://github.com/fuweng11) for their contributions to this feature. For more details,
please refer to [INLONG-11481](https://github.com/apache/inlong/issues/11481) and [INLONG-11508](https://github.com/apache/inlong/issues/11508)


## Future Plans
In version 2.1.0, we enriched and improved InLong's operations and maintenance capabilities. Everyone is welcome to try it. If you have more scenarios and requirements,
or run into any problems during use, please feel free to open issues and PRs. In future versions, the InLong community will continue to:

- Support collection from more data sources
- Enrich the Flink 1.15 and 1.18 connectors
- Enhance Transform capabilities
- Provide real-time synchronization support for more data sources and targets
- Optimize SDK capabilities and user experience
- Optimize the Dashboard experience

We also look forward to more developers interested in InLong contributing and helping drive the project's development!
Binary file added blog/img/2.1.0/2.1.0-agent-node.png
Binary file added blog/img/2.1.0/2.1.0-agent-param.png
Binary file added blog/img/2.1.0/2.1.0-agent-type.png
Binary file added blog/img/2.1.0/2.1.0-dashboard-operate.png
Binary file added blog/img/2.1.0/2.1.0-dashboard-select.png
Binary file added blog/img/2.1.0/2.1.0-sort-dirty.png
151 changes: 151 additions & 0 deletions i18n/zh-CN/docusaurus-plugin-content-blog/2025-01-03-release-2.1.0.md
@@ -0,0 +1,151 @@
---
title: Release 2.1.0
author: justinwwhuang
author_url: https://github.com/justinwwhuang
author_image_url: https://avatars.githubusercontent.com/justinwwhuang
tags: [Apache InLong, Version]
---

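Apache InLong (应龙) has recently released version 2.1.0, which closed over 120 issues, including more than 4 major features and over 110 optimizations. The main accomplishments are: Dashboard supports batch operation of nodes, Manager supports multiple scheduling engines, Agent supports COS data sources, and Sort supports archiving dirty data through the InLong SDK. The release also improves the operations and maintenance experience of Apache InLong and completes a large number of other features.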
<!--truncate-->

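## About Apache InLong
As the industry's first one-stop, all-scenario massive data integration framework, Apache InLong (应龙) provides automated, secure, reliable, and high-performance data transmission capabilities, enabling businesses to quickly build stream-based data analysis, modeling, and applications. Currently, InLong is widely used in industries including advertising, payment, social networking, gaming, and artificial intelligence, serving thousands of businesses, with high-performance scenarios processing over one hundred trillion records per day and highly reliable scenarios handling over ten trillion records per day.

The core keywords for InLong's project positioning are "one-stop," "all-scenario," and "massive data." For "one-stop," we aim to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box usability; for "all-scenario," we aim to offer comprehensive solutions covering common data integration scenarios in the big data field; for "massive data," we hope to leverage architectural advantages such as layered data links, fully extensible components, and built-in multi-cluster management to stably support even larger data volumes on top of one hundred trillion records per day.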
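## 2.1.0 Overview

Apache InLong (应龙) has recently released version 2.1.0, which closed over 120 issues, including more than 4 major features and over 110 optimizations. The main accomplishments include:
- Dashboard supports batch operation of nodes
- Manager supports multiple scheduling engines
- Agent supports COS data sources
- Sort supports archiving dirty data through the InLong SDK

The release also improves the operations and maintenance experience of Apache InLong and completes a large number of other features.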
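### Dashboard Module
- Support COS data source
- Support batch operation of Agents: restart, upgrade
- Support exporting audit data as CSV files
- Support sorting of audit data and comparison of differences
- Support queries for all types of indicators
- Support data preview field segmentation
### Manager Module
- Support COS data source
- Support managing multiple scheduling engines: AirFlow and DolphinScheduler
- Support dirty data management and querying
- Support querying heartbeat information based on IP
- Limit one IP to only belong to one cluster
- Provide an API for querying dirty data archives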
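### Agent Module
- Support COS data source
- Support quick startup and shutdown
- Support starting multiple instances
- Support data supplementation in chronological order
- Optimize the logic of the Installer process guardian for Agent
- Support supplementary recording based on local data time
### Sort Module
- Add an Elasticsearch connector based on Flink 1.18
- Support KV separation on Kafka Sink
- Support audit data reporting
- Tube Connector source supports dirty data archiving
### SDK Module
- Transform SDK adds 7 new functions
- Add Dirty Data Archiving SDK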
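### Audit Module
- Audit Proxy adds metric reporting
- Audit Store adds metric reporting
- Audit Service adds metric reporting
- Add asynchronous flush audit data interface
### TubeMQ Module
- Write the consumption offset information to a local file
- Optimize the load balancing logic of the Go version SDK
### Others
- Pipeline supports parallel build
- Support configuring volumes for Manager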
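## 2.1.0 Feature Introduction
### Dashboard supports batch operation of nodes
This feature is mainly used to operate InLong Agents, primarily to upgrade and restart them:
- After finding the cluster in cluster management, select the nodes to operate on and click batch operation.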

![2.1.0-dashboard-select.png](img/2.1.0/2.1.0-dashboard-select.png)

- Select the operation type and fill in the parameters required for that operation, then click OK.

![2.1.0-dashboard-operate.png](img/2.1.0/2.1.0-dashboard-operate.png)

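This feature improves the InLong operations and maintenance experience. Operating through the interface removes the need to modify the DB directly and makes InLong more cohesive:
- Agent versions can be upgraded visually, in batches and on a schedule, to control upgrade risk.
- During Agent fault recovery, this feature can be used to restart Agents quickly.

Thanks to @[wohainilaodou](https://github.com/wohainilaodou) for their contributions to this feature. For more details, please refer to [INLONG-11187](https://github.com/apache/inlong/issues/11187)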
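### Manager supports multiple scheduling engines
Previously, InLong supported only the Quartz scheduling engine for offline data synchronization. This version adds two third-party engines: DolphinScheduler and AirFlow.
#### AirFlow engine
- To ease future maintenance and extension of AirFlow interface support, the AirflowApi interface and the BaseAirflowApi abstract class were designed; later extensions only need to build on them.
- A unified request class, AirflowServerClient, is implemented for the interfaces.
- Two interceptors are added to OkHttpClient: AirflowAuthInterceptor for unified authorization of the interfaces, and LoggingInterceptor for logging.

Thanks to @[Zkplo](https://github.com/Zkplo) for their contributions to this feature. For more details, please refer to [INLONG-11400](https://github.com/apache/inlong/issues/11400)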
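#### DolphinScheduler engine
- Add the DolphinScheduler package to org.apache.inlong.manager.schedule
- Add a client and an engine for DolphinScheduler (DS), as well as a util for calling the DS open APIs
- Add POJO classes for DS interaction

Thanks to @[emptyOVO](https://github.com/emptyOVO) for their contributions to this feature. For more details, please refer to [INLONG-11401](https://github.com/apache/inlong/issues/11401)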
### Agent supports COS data source
- Create a new COS type node and fill in the corresponding bucket name, credential ID, credential key, and region.

![2.1.0-agent-node.png](img/2.1.0/2.1.0-agent-node.png)

- Create a new COS type data source and select the corresponding node, IP, and file path.

![2.1.0-agent-type.png](img/2.1.0/2.1.0-agent-type.png)

![2.1.0-agent-param.png](img/2.1.0/2.1.0-agent-param.png)

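This feature collects data directly from COS storage, so businesses no longer need to download COS files locally before collecting them.
Thanks to @[justinwwhuang](https://github.com/justinwwhuang) for their contributions to this feature. For more details, please refer to [INLONG-11187](https://github.com/apache/inlong/issues/11187)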

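### Sort supports archiving dirty data through the InLong SDK
This version adds the ability to report dirty data to a specified GroupId and StreamId through the InLong SDK. Users can then export the dirty data or consume it themselves from Pulsar.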

![2.1.0-sort-dirty.png](img/2.1.0/2.1.0-sort-dirty.png)

The following configuration needs to be added to the Connector:
```
'dirty.side-output.inlong-sdk.inlong-auth-key' = 'your auth key',
'dirty.side-output.inlong-sdk.inlong-auth-id' = 'your auth id',
'dirty.side-output.enable' = 'true',
'dirty.side-output.inlong-sdk.inlong-group-id' = 'target_inlong_group_id',
'dirty.side-output.inlong-sdk.inlong-stream-id' = 'target_inlong_stream_id',
'dirty.side-output.labels' = 'groupId=xx&streamId=xx&serverType=tube&dataflowId=xx',
'dirty.side-output.inlong-sdk.inlong-manager-addr' = 'xxx',
'dirty.side-output.connector' = 'inlong-sdk',
'dirty.ignore' = 'true',
```
Thanks to @[vernedeng](https://github.com/vernedeng) and @[fuweng11](https://github.com/fuweng11) for their contributions to this feature. For more details,
please refer to [INLONG-11481](https://github.com/apache/inlong/issues/11481) and [INLONG-11508](https://github.com/apache/inlong/issues/11508)


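## Future Plans
In version 2.1.0, we enriched and improved InLong's operations and maintenance capabilities. Everyone is welcome to try it. If you have more scenarios and requirements, or run into any problems during use, please feel free to open issues and PRs. In future versions, the InLong community will continue to:

- Support collection from more data sources
- Enrich the Flink 1.15 and 1.18 connectors
- Enrich Transform capabilities and integrate them into each InLong module
- Support more data sources and targets for real-time synchronization
- Optimize SDK capabilities and user experience
- Optimize the Dashboard experience

We also look forward to more developers interested in InLong joining in and contributing.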