The Ray Runner currently works by topologically sorting the pipeline graph, and executing stage by stage until the whole pipeline has been executed. This means that it only supports batch mode, and it can't execute multiple stages in parallel.
By implementing watermark-based scheduling, and by executing any bundle that is ready for execution, we can start gaining parallelism, and move towards streaming support.
This work is somewhat involved because it requires changing the whole execution logic for the pipeline. However, it should increase our parallelism, which will be great. For reference, see https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py#L420-L487
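To make the difference from stage-by-stage execution concrete, here is a minimal sketch of the "execute anything that is ready" idea. It is not the Ray Runner's or Beam's code: `run_ready_stages`, `deps`, and `run_stage` are hypothetical names, a thread pool stands in for Ray tasks, and readiness is simplified to "all upstream stages have finished", which is roughly what a watermark check reduces to in batch mode. A real watermark-based scheduler would track watermarks per PCollection and schedule individual bundles.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_ready_stages(deps, run_stage, max_workers=4):
    """Run each stage as soon as all of its upstream stages have finished.

    deps: dict mapping stage name -> set of upstream stage names (hypothetical).
    run_stage: callable taking a stage name; a stand-in for executing a bundle.
    Returns the stage names in completion order.
    """
    remaining = {stage: set(upstream) for stage, upstream in deps.items()}
    completed = []
    in_flight = {}  # future -> stage name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or in_flight:
            # Submit every stage that no longer waits on anything upstream.
            ready = [stage for stage, upstream in remaining.items() if not upstream]
            for stage in ready:
                del remaining[stage]
                in_flight[pool.submit(run_stage, stage)] = stage

            if not in_flight:
                raise ValueError("cycle or unknown dependency in stage graph")

            # Block until at least one running stage finishes.
            finished, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in finished:
                stage = in_flight.pop(future)
                future.result()  # re-raise any failure from the stage
                completed.append(stage)
                # Unblock downstream stages that were waiting on this one.
                for upstream in remaining.values():
                    upstream.discard(stage)

    return completed
```

With a diamond-shaped graph such as `{"read": set(), "left": {"read"}, "right": {"read"}, "join": {"left", "right"}}`, `left` and `right` are submitted together once `read` finishes, whereas a strict topological walk would run them one after the other.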