You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm proposing that support running rayfed job in single-controller mode.
I'd like to propose 2 options on how we startup the single-controller cluster and how we connect to the cluster and run our jobs.
option 1
Add a new cli toolkit to start the cluster, it just wrapper the ray cli toolkit, for example:
A. running single-controller mode
> rayfed start --head --mode=single-controller --party=ALICE # node1, listening on 1.2.3.4:5555> rayfed start --address="1.2.3.4:5555" --party=ALICE # node2, connecting to the node1> rayfed start --address="1.2.3.4:5555" --party=BOB # node3, connecting to the node1> rayfed start --address="1.2.3.4:5555" --party=BOB # node4, connecting to the node1
And then, the job could be run in single controller mode automatically:
# main.pyfed.init(address="1.2.3.4:5555", xxx)
# Nothing need to be changed in this job script.
B. running multiple-controller mode
> rayfed start --head --mode=multiple-controller --party=ALICE # node1, listening on 1.2.3.4:5555> rayfed start --address="1.2.3.4:5555" --party=ALICE # node2, connecting to the node1> rayfed start --head --mode=multiple-controller --party=BOB # node3, listening on 5.6.7.8:6666> rayfed start --address="5.6.7.8:6666" --party=BOB # node4, connecting to the node3
And then, you run the following script in 2 clusters:
# main.pyfed.init(address="1.2.3.4:5555", xxx)
# nothing need to be changed in this job script.
# in node2> python main.py --party=ALICE
# in node3> python main.py --party=BOB
option 2
No need to add a new toolkit, but we should tell users that add some extra arguments when starting up the Ray cluster.
For example,
A. running single-controller mode
> ray start --head --resources={"_PARTY_ALICE", 9999} # node1, listening on 1.2.3.4:5555> ray start --address="1.2.3.4:5555" --resources={"_PARTY_ALICE", 9999} # node2, connecting to the node1> ray start --address="1.2.3.4:5555" --resources={"_PARTY_BOB", 9999} # node3, connecting to the node1> ray start --address="1.2.3.4:5555" --resources={"_PARTY_BOB", 9999} # node4, connecting to the node1
And then, add the extra mode info when fed.init():
# main.pyfed.init(address="1.2.3.4:5555", mode="single-controller", xxx)
# Nothing need to be changed in this job script.
A. running multiple-controller mode
> ray start --head # node1, listening on 1.2.3.4:5555> ray start --address="1.2.3.4:5555"# node2, connecting to the node1> ray start --head # node3, listening on 5.6.7.8:6666> ray start --address="5.6.7.8:6666"# node4, connecting to the node3
And then, add the extra mode info when fed.init()(And we could ignore it if we provide a default value):
# main.pyfed.init(address="1.2.3.4:5555", mode="multiple-controller", xxx)
# Nothing need to be changed in this job script.
The text was updated successfully, but these errors were encountered:
jovany-wang
changed the title
Support running rayfed in single controller mode.
Support running rayfed job in single-controller mode.
Feb 21, 2023
jovany-wang
changed the title
Support running rayfed job in single-controller mode.
[RFC] Support running rayfed job in single-controller mode.
Feb 24, 2023
jovany-wang
changed the title
[RFC] Support running rayfed job in single-controller mode.
[RFC] Support running in single-controller mode.
Feb 24, 2023
For SIMULATION mode, we might run actors in different parties in one process(one Ray worker process).
For SINGLE_CONTROLLER mode, actors in different parties should be run in different Ray nodes.
For MULTI_CONTROLLER mode, actors in different parties should be run in different Ray clusters.
I'm proposing that support running rayfed job in single-controller mode.
I'd like to propose 2 options on how we startup the single-controller cluster and how we connect to the cluster and run our jobs.
option 1
Add a new cli toolkit to start the cluster, it just wrapper the ray cli toolkit, for example:
A. running single-controller mode
And then, the job could be run in single controller mode automatically:
B. running multiple-controller mode
And then, you run the following script in 2 clusters:
option 2
No need to add a new toolkit, but we should tell users that add some extra arguments when starting up the Ray cluster.
For example,
A. running single-controller mode
And then, add the extra mode info when
fed.init()
:A. running multiple-controller mode
And then, add the extra mode info when
fed.init()
(And we could ignore it if we provide a default value):The text was updated successfully, but these errors were encountered: