Releases: ververica/ForSt
v0.1.2-beta
Version 0.1.2-beta of ForSt has been released, featuring integration with the 2.0-preview version of Apache Flink. With this integration, ForSt can read and write SST files in remote storage, making it an embedded but disaggregated key-value store and a significant step towards a cloud-native architecture.
Features
- Built on RocksDB 8.10.0, incorporating tailored changes for Flink based on FRocksDB.
- New file read and write functionality built on Flink's file system abstraction and implementation.
- Multi-platform JNI release.
Performance
ForSt leverages the asynchronous execution model in Flink 2.0 to address the performance challenges of reading and writing disaggregated remote storage. To evaluate performance, two benchmarks compared Flink running on ForSt against Flink running on FRocksDB (the RocksDBStateBackend in Flink):
Word Count
This benchmark counts word occurrences in a data stream, which represents a typical Read-Modify-Write state access pattern. Throughput was read from metrics once the state size had stabilized, after 12 hours.
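The Read-Modify-Write pattern mentioned above can be illustrated with a minimal sketch. This is not the benchmark job or Flink's state API: the function name `count_words` and the use of a plain in-memory dictionary as the keyed state are assumptions made only for illustration.

```python
from typing import Dict, Iterable


def count_words(stream: Iterable[str], state: Dict[str, int]) -> Dict[str, int]:
    """Apply one Read-Modify-Write cycle per incoming word.

    `state` is a hypothetical stand-in for keyed state; in the benchmark
    this role is played by the state backend (ForSt or FRocksDB).
    """
    for word in stream:
        current = state.get(word, 0)  # read: fetch the current count (0 if unseen)
        updated = current + 1         # modify: increment the count in memory
        state[word] = updated         # write: store the new count back
    return state


counts = count_words(["flink", "forst", "flink"], {})
print(counts)
```

Every record triggers one state read and one state write for its key, which is why this workload is sensitive to state access latency once the state no longer fits in memory.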
Setup
- Unique word count: 300M
- Each word length: 16 bytes
- Task Manager & Parallelism: 1
- Managed memory per TM: 512M
- Machine: Alibaba Cloud ECS ecs.g6a.16xlarge (64 vCPU 256 GB)
- Local Disk (for FRocksDB): ESSD 6800 IOPS
- HDFS (for ForSt): On LAN, built on ESSD 6800 IOPS x 4, 2 replicas
Throughput
ForSt on purely remote storage achieved 83% of the throughput of RocksDBStateBackend in Flink 2.0.
StateBackend | RocksDB | ForSt |
---|---|---|
Throughput(/s) | 19.2K | 16.0K |
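The relative figure can be rechecked from the two table values. The helper below is a minimal sketch; the name `relative_throughput` is hypothetical, and the inputs are the rounded numbers from the table rather than raw measurements.

```python
def relative_throughput(candidate: float, baseline: float) -> float:
    """Return the candidate's throughput as a fraction of the baseline's."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate / baseline


# Rounded values from the Word Count table: ForSt 16.0K/s, RocksDB 19.2K/s.
ratio = relative_throughput(16.0e3, 19.2e3)
print(f"ForSt reaches {ratio:.0%} of the RocksDB throughput")
```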
Nexmark Query 20
The Nexmark Query 20 benchmark involves a filter join, which is a common case in stream processing. Throughput was calculated as event num / time.
Setup
- Nexmark events: 200M
- Task Manager & Parallelism: 8
- Managed memory per TM: 512M
- Machine: Alibaba Cloud ECS ecs.g6a.16xlarge (64 vCPU 256 GB) x 4
- Local Disk (for FRocksDB): ESSD 6800 IOPS
- HDFS (for ForSt): On LAN, built on ESSD 6800 IOPS x 4, 2 replicas
Throughput
ForSt on purely remote storage achieved 64% of the throughput of RocksDBStateBackend in Flink 2.0.
StateBackend | RocksDB | ForSt |
---|---|---|
Throughput (events/s) | 319K | 206K |
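Because throughput here is event num / time, the published numbers can be inverted to estimate how long each run took. The sketch below assumes that the 200M events are the total processed by the job and that the table values are aggregate rates over the whole run; neither is stated explicitly in these notes, and the function names are hypothetical.

```python
def throughput(event_count: float, elapsed_seconds: float) -> float:
    """Throughput as defined for this benchmark: event num / time."""
    return event_count / elapsed_seconds


def implied_elapsed_seconds(event_count: float, events_per_second: float) -> float:
    """Invert the throughput formula to estimate the run duration."""
    return event_count / events_per_second


TOTAL_EVENTS = 200e6  # assumption: 200M is the total event count of the run

# Rounded rates from the Nexmark Query 20 table.
for backend, rate in (("RocksDB", 319e3), ("ForSt", 206e3)):
    seconds = implied_elapsed_seconds(TOTAL_EVENTS, rate)
    print(f"{backend}: about {seconds:.0f} s for {TOTAL_EVENTS:.0f} events")
```

Under these assumptions each run lasts on the order of ten to sixteen minutes, far shorter than the 12-hour Word Count run.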
Note: The state size grows and the throughput keeps dropping throughout this benchmark. In the first part of the run, almost all of the state is held in the block cache or memtable, so the result cannot accurately reflect the performance of disaggregated state access in stream processing. A new running mode of Nexmark will be proposed to address this.