Skip to content

Releases: ververica/ForSt

v0.1.2-beta

23 Oct 07:53
Compare
Choose a tag to compare

The latest version 0.1.2-beta of ForSt has been released, featuring integration with the 2.0-preview version of Apache Flink. With this integration, ForSt is capable of reading and writing SST files in remote storages, making it an embedded but disaggregated key-value store and a significant step towards cloud-native architecture.

Features

  • Built on RocksDB 8.10.0, incorporating tailored changes for Flink based on FRocksDB.
  • Introducing new file read and write functionalities using Flink's file system abstraction & implementation.
  • Multi-platform JNI release.

Performance

ForSt leverages the asynchronous execution model in Flink 2.0, addressing the performance challenges associated with reading and writing disaggregated remote storage. To evaluate performance, two benchmarks were conducted comparing Flink with ForSt to FRocksDB (the RocksDBStateBackend in Flink):

Word Count

This benchmark involves counting word occurrences in a data stream, representing a typical Read-Modify-Write state access pattern. The throughput was monitored via metrics when the state size stabilized after 12 hours.

Setup

  • Unique word count: 300M
  • Each word length: 16 bytes
  • Task Manager & Parallelism: 1
  • Managed memory per TM: 512M
  • Machine: Alibaba Cloud ECS ecs.g6a.16xlarge (64 vCPU 256 GB)
  • Local Disk (for FRocksDB): ESSD 6800 IOPS
  • HDFS (for ForSt): On LAN, built on ESSD 6800 IOPS x 4, 2 replica

Throughput

ForSt on pure remote storage could achieve 83% of the throughput of RocksDBStateBackend in Flink 2.0.

StateBackend RocksDB ForSt
Throughput(/s) 19.2K 16.0K

Nexmark Query 20

The Nexmark Query 20 benchmark involves a filter join, which is a common case in stream processing. The throughput was calculated as event num / time.

Setup

  • Nexmark events: 200M
  • Task Manager & Parallelism: 8
  • Managed memory per TM: 512M
  • Machine: Alibaba Cloud ECS ecs.g6a.16xlarge (64 vCPU 256 GB) x 4
  • Local Disk (for FRocksDB): ESSD 6800 IOPS
  • HDFS (for ForSt): On LAN, built on ESSD 6800 IOPS x 4, 2 replica

Throughput

ForSt on pure remote storage could achieve 64% of the throughput of RocksDBStateBackend in Flink 2.0.

StateBackend RocksDB ForSt
Throughput(/s) 319K 206K

Note: The state size grows and the throughput keeps going down during this benchmark. In the first part of the processing the state is almost stored in block cache or memtable, so it cannot accurately reflect the performance of disaggregated state access in stream processing. A new running mode of Nexmark will be proposed to address this.