[RFC] Support running in different modes. #29

Open
jovany-wang opened this issue Feb 1, 2023 · 3 comments
Labels: enhancement (New feature or request), p2


jovany-wang commented Feb 1, 2023

I propose that we support running RayFed jobs in single-controller mode.

Below are 2 options for how we start up the single-controller cluster, connect to it, and run our jobs.

Option 1

Add a new CLI toolkit to start the cluster. It simply wraps the Ray CLI toolkit, for example:

A. Running in single-controller mode

> rayfed start --head --mode=single-controller --party=ALICE  # node1, listening on 1.2.3.4:5555
> rayfed start --address="1.2.3.4:5555" --party=ALICE  # node2, connecting to the node1
> rayfed start --address="1.2.3.4:5555" --party=BOB  # node3, connecting to the node1
> rayfed start --address="1.2.3.4:5555" --party=BOB  # node4, connecting to the node1

The job can then run in single-controller mode automatically:

# main.py
fed.init(address="1.2.3.4:5555", xxx)
# Nothing needs to be changed in this job script.
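
As a rough illustration of what "wrapping the Ray CLI" could look like, the sketch below translates `rayfed start` style arguments into a `ray start` argument list. The function name `build_ray_start_command`, the `party_capacity` parameter, and the choice to map `--party` onto the `_PARTY_<NAME>` custom resource from option 2 are all assumptions made for illustration; the proposal does not specify how the wrapper works, and the `--mode` flag is left out because Ray itself has no such option.

```python
import json
import shlex


def build_ray_start_command(party, head=False, address=None, party_capacity=9999):
    """Translate hypothetical `rayfed start` arguments into a `ray start` argv list."""
    # A node is either the head or a worker that connects to a head, never both.
    if head == (address is not None):
        raise ValueError("pass exactly one of head=True or address=...")
    argv = ["ray", "start"]
    if head:
        argv.append("--head")
    else:
        argv.append(f"--address={address}")
    # Assumed naming scheme: tag the node with a custom resource named after its party.
    argv.append("--resources=" + json.dumps({f"_PARTY_{party}": party_capacity}))
    return argv


if __name__ == "__main__":
    # Print shell-quoted commands instead of executing them.
    print(shlex.join(build_ray_start_command("ALICE", head=True)))
    print(shlex.join(build_ray_start_command("BOB", address="1.2.3.4:5555")))
```

Building an argument list rather than a shell string keeps the JSON resource value intact without manual quoting; `shlex.join` is only used here for display.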

B. Running in multiple-controller mode

> rayfed start --head --mode=multiple-controller --party=ALICE  # node1, listening on 1.2.3.4:5555
> rayfed start --address="1.2.3.4:5555" --party=ALICE  # node2, connecting to the node1
> rayfed start --head --mode=multiple-controller --party=BOB  # node3, listening on 5.6.7.8:6666
> rayfed start --address="5.6.7.8:6666" --party=BOB  # node4, connecting to the node3

Then run the following script in both clusters:

# main.py
fed.init(address="1.2.3.4:5555", xxx)
# Nothing needs to be changed in this job script.
# on node2
> python main.py --party=ALICE
# on node3
> python main.py --party=BOB

Option 2

No new toolkit is needed, but we should tell users to add some extra arguments when starting up the Ray cluster.
For example,

A. Running in single-controller mode

> ray start --head --resources='{"_PARTY_ALICE": 9999}'  # node1, listening on 1.2.3.4:5555
> ray start --address="1.2.3.4:5555" --resources='{"_PARTY_ALICE": 9999}'  # node2, connecting to node1
> ray start --address="1.2.3.4:5555" --resources='{"_PARTY_BOB": 9999}'  # node3, connecting to node1
> ray start --address="1.2.3.4:5555" --resources='{"_PARTY_BOB": 9999}'  # node4, connecting to node1

Then pass the extra mode info when calling fed.init():

# main.py
fed.init(address="1.2.3.4:5555", mode="single-controller", xxx)
# Nothing else needs to be changed in this job script.
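
To make the role of the `_PARTY_<NAME>` resources more concrete: in single-controller mode, one plausible use is to request a small amount of the party's resource when creating an actor, so that Ray only schedules it on nodes started for that party. The helpers below are hypothetical and only build the options dictionary; how rayfed would actually place actors is not stated in this proposal.

```python
def party_resource_name(party):
    """Custom resource name for a party, following the `_PARTY_<NAME>` pattern above."""
    return f"_PARTY_{party}"


def actor_options_for_party(party, amount=1):
    """Build scheduling options that restrict an actor to one party's nodes.

    The result is meant to be passed to Ray as `SomeActor.options(**opts).remote()`.
    Requesting `amount` units of the party resource means only nodes started with
    that resource can host the actor.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"resources": {party_resource_name(party): amount}}


if __name__ == "__main__":
    print(actor_options_for_party("ALICE"))
    print(actor_options_for_party("BOB"))
```

With each node advertising 9999 units, requesting 1 unit per actor would allow up to 9999 actors per node, which effectively makes the resource a placement tag rather than a real capacity limit.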

B. Running in multiple-controller mode

> ray start --head # node1, listening on 1.2.3.4:5555
> ray start --address="1.2.3.4:5555" # node2, connecting to the node1
> ray start --head # node3, listening on 5.6.7.8:6666
> ray start --address="5.6.7.8:6666" # node4, connecting to the node3

Then pass the extra mode info when calling fed.init() (it could be omitted if we provide a default value):

# main.py
fed.init(address="1.2.3.4:5555", mode="multiple-controller", xxx)
# Nothing else needs to be changed in this job script.
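
A minimal sketch of how the optional `mode` argument could be resolved, assuming "multiple-controller" becomes the default. Both the helper name `resolve_mode` and the choice of default are assumptions; the proposal only says a default value might be provided.

```python
VALID_MODES = ("single-controller", "multiple-controller")
DEFAULT_MODE = "multiple-controller"  # assumption: the proposal does not fix the default


def resolve_mode(mode=None):
    """Return the mode to run in, falling back to the default when none is given."""
    if mode is None:
        return DEFAULT_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


if __name__ == "__main__":
    print(resolve_mode())                     # falls back to DEFAULT_MODE
    print(resolve_mode("single-controller"))  # explicit value is kept
```

Rejecting unknown strings early keeps a typo in `mode` from silently selecting the default.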
@jovany-wang jovany-wang added the enhancement New feature or request label Feb 1, 2023
@jovany-wang jovany-wang added the p1 label Feb 14, 2023
@jovany-wang jovany-wang self-assigned this Feb 14, 2023
@jovany-wang jovany-wang added p0 and removed p1 labels Feb 21, 2023
@jovany-wang jovany-wang added this to the release0.1.0 milestone Feb 21, 2023
@jovany-wang jovany-wang changed the title Support running rayfed in single controller mode. Support running rayfed job in single-controller mode. Feb 21, 2023
@jovany-wang jovany-wang changed the title Support running rayfed job in single-controller mode. [RFC] Support running rayfed job in single-controller mode. Feb 24, 2023
@jovany-wang (Collaborator, Author)

CC @ray-project/rayfed-dev

@jovany-wang jovany-wang pinned this issue Feb 24, 2023
@jovany-wang jovany-wang changed the title [RFC] Support running rayfed job in single-controller mode. [RFC] Support running in single-controller mode. Feb 24, 2023
@jovany-wang (Collaborator, Author)

Besides single-controller and multi-controller modes, we also need to support a simulation mode for launching thousands of ends.

@jovany-wang (Collaborator, Author)

For SIMULATION mode, actors of different parties might run in one process (one Ray worker process).
For SINGLE_CONTROLLER mode, actors of different parties should run on different Ray nodes.
For MULTI_CONTROLLER mode, actors of different parties should run in different Ray clusters.
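
The three modes differ in the level at which parties are separated. The sketch below encodes that as an enum plus a small lookup; the names `Mode`, `SEPARATED_AT`, and `may_share` are hypothetical and not part of any existing rayfed API.

```python
from enum import Enum


class Mode(Enum):
    SIMULATION = "simulation"
    SINGLE_CONTROLLER = "single-controller"
    MULTI_CONTROLLER = "multi-controller"


# Deployment levels ordered from innermost to outermost.
LEVELS = ("process", "node", "cluster")

# Innermost level at which actors of different parties must be kept apart.
# None means no separation is required (parties may share a worker process).
SEPARATED_AT = {
    Mode.SIMULATION: None,
    Mode.SINGLE_CONTROLLER: "node",
    Mode.MULTI_CONTROLLER: "cluster",
}


def may_share(mode, level):
    """Return True if actors of different parties may share the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level {level!r}")
    boundary = SEPARATED_AT[mode]
    if boundary is None:
        return True
    # Parties may only share levels strictly outside their separation boundary.
    return LEVELS.index(level) > LEVELS.index(boundary)


if __name__ == "__main__":
    for mode in Mode:
        print(mode.name, {level: may_share(mode, level) for level in LEVELS})
```

Under these assumptions, single-controller parties share a cluster but not a node, while multi-controller parties share nothing.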

@jovany-wang jovany-wang changed the title [RFC] Support running in single-controller mode. [RFC] Support running in different modes. Apr 13, 2023
@jovany-wang jovany-wang added p2 and removed p0 labels Apr 13, 2023