Skip to content

Python Batch Runner: Beam Portability API Integration

Past due by over 2 years 50% complete

This milestone tracks all work required to run batch pipelines on Ray through integration with the Beam portability API: https://beam.apache.org/roadmap/portability/.

The Beam Portability API provides several advantages vs. the current BundleBasedDirectRunner implementation, including:

  1. Improved cross-language portability posture due to the use of languag…

This milestone tracks all work required to run batch pipelines on Ray through integration with the Beam portability API: https://beam.apache.org/roadmap/portability/.

The Beam Portability API provides several advantages vs. the current BundleBasedDirectRunner implementation, including:

  1. Improved cross-language portability posture due to the use of language-agnostic constructs between the SDK and runner.
  2. Improved performance via built-in pipeline execution graph optimizers.
  3. Reduced maintainability burden via more future proof APIs and reusable components like execution graph schedulers and state managers.

Successful completion of this milestone will allow users to:

  1. Define and run any valid Python Beam batch pipeline either locally or on a remote cluster using Ray.
  2. Integrate Beam batch pipelines with existing machine learning and data science applications on Ray (e.g. via TFX, Beam Pandas DataFrames, etc.).
Loading