MPICluster #9
Recording some more thoughts on this: I think that the `MPICluster` class would have to look something like this:

```python
import sys

import dask
from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

SCHEDULER_RANK = 0
CLIENT_RANK = 1


class MPICluster(object):
    # NOTE: create_scheduler, run_scheduler, create_and_run_worker, and
    # send_close_signal are pseudocode placeholders.

    def __init__(self, ...):
        # Create the scheduler on the scheduler rank only
        if RANK == SCHEDULER_RANK:
            scheduler = create_scheduler(...)
            addr = scheduler.address
        else:
            addr = None

        # Broadcast the scheduler address to every rank and store it in the Dask config
        self.scheduler_address = COMM.bcast(addr)
        dask.config.set(scheduler_address=self.scheduler_address)
        COMM.Barrier()

        # Scheduler and worker ranks run their event loops and then exit;
        # only the client rank returns from __init__
        if RANK == SCHEDULER_RANK:
            run_scheduler(scheduler)
            sys.exit()
        elif RANK == CLIENT_RANK:
            pass
        else:
            create_and_run_worker(...)
            sys.exit()

    def close(self):
        send_close_signal()
```

But there is a problem with this: the `sys.exit()` calls mean that the scheduler and worker ranks never return from `__init__`, so the `MPICluster` object only ever exists on the client rank.

In summary, I cannot think of a way of encapsulating the cluster start/stop procedure on the Scheduler and Worker ranks in an object-oriented way. The closest way I can think of is, frankly, a "cheat". That way is to have a class that simply wraps the existing `initialize` function:

```python
from dask_mpi.core import initialize

initialize(...)


class MPICluster(object):

    def __init__(self):
        self.scheduler_address = dask.config.get('scheduler_address')
```

Hence, the `MPICluster` object would add essentially nothing beyond what `initialize` already does.

Maybe someone out there has a better idea...
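For comparison, this is roughly how the existing `initialize` entry point is already used in a client script (a minimal sketch, assuming the default rank assignment of scheduler on rank 0 and client code on rank 1):

```python
# client_script.py -- launched with something like: mpirun -np 4 python client_script.py
from dask_mpi import initialize
from distributed import Client

initialize()       # rank 0 becomes the scheduler, ranks 2+ become workers,
                   # and only rank 1 continues past this call to run the client code
client = Client()  # picks up the scheduler address that initialize() stored in the Dask config
```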
Is there a particular motivation behind having an `MPICluster` object?
The only reason I wanted to consider this option is that I like the pattern:

```python
from dask_something import SomethingCluster
cluster = SomethingCluster()

from distributed import Client
client = Client(cluster)
```

...with the `Client` initializing itself from the `Cluster` object. I realize that the existing `initialize` function already gets you to the same end state, though.

I'll leave this issue open, just in case some development makes this possible, but I agree that an `MPICluster` object is not really necessary.
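As a concrete, already-working instance of that pattern (using `distributed`'s `LocalCluster` rather than the proposed `MPICluster`):

```python
from distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2)   # starts a local scheduler and two workers
client = Client(cluster)              # the client reads the scheduler address from the cluster object
print(client.scheduler_info())

client.close()
cluster.close()
```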
As I am probably still not Pythonic enough: what is the problem with the solution you propose above, @kmpaul? Using `sys.exit()` in `__init__()` would not work?
Hey, @guillaumeeb! Yes. Using `sys.exit()` inside `__init__()` is exactly the problem: the scheduler and worker ranks exit the interpreter from within the constructor, so the `MPICluster` object is never actually created on those ranks, and only the client rank ever gets past the `cluster = MPICluster(...)` line. The `close()` method, likewise, can only ever run on the client rank. If you removed the `sys.exit()` calls, the scheduler and worker ranks would fall through and continue executing the rest of the client script, which is not what you want either.
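A toy, MPI-free illustration of that first point (the class name and flag here are invented for the example):

```python
import sys


class ExitingThing(object):
    def __init__(self, exit_now):
        if exit_now:
            # The interpreter exits here, so __init__ never returns
            # and the caller never receives an object to work with.
            sys.exit()


thing = ExitingThing(exit_now=True)
print("this line is never reached")  # analogous to the scheduler/worker ranks in the sketch above
```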
But do you really want to do this in an MPI job? Couldn't this be a documented limitation: once your cluster is closed, that is the end of the MPI run? But maybe you wanted to mix Dask and other MPI workloads, as in http://blog.dask.org/2019/01/31/dask-mpi-experiment?
Short term I like the current `initialize`-based approach. Long term I think it would be interesting to dynamically launch and destroy MPI jobs from within Python, perhaps by calling something like `mpirun` programmatically.
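One possible building block for that long-term idea is MPI's dynamic process management, exposed in mpi4py as `Spawn` (a sketch only; `worker_script.py` is a hypothetical script, and whether this fits dask-mpi's model is an open question):

```python
import sys

from mpi4py import MPI

# Spawn four additional Python processes at runtime running a (hypothetical) worker script.
intercomm = MPI.COMM_SELF.Spawn(sys.executable, args=["worker_script.py"], maxprocs=4)

# ... hand the spawned processes some work over the intercommunicator ...

intercomm.Disconnect()  # detach from the spawned processes when done
```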
@guillaumeeb Yes. I agree with you that for an MPI run, you probably only want to shut down the cluster at the end of your job in most (all?) cases. However, I think that if you accept that, then the only part of the `MPICluster` object that still serves a purpose is exposing the scheduler address, which `initialize` already provides.

@mrocklin I think that what you are describing could be accomplished by using Dask-jobqueue from within a Dask-MPI job, although that would only work on a system with a job scheduler. It does suggest some other configurable options, though. What if you only want to use a fraction of the available MPI ranks? And then use more ranks later? On a system with a job scheduler, this could be bad, because the scheduled MPI job would have fixed/reserved resources for the duration of the job and only a fraction would be used (unless you were doing something like the dask-mpi-experiment). However, on a system without a job scheduler, you could elastically scale your resources with additional worker launches.
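A rough sketch of that Dask-jobqueue idea, assuming a SLURM-managed system (the resource values are placeholders, and this creates a second, separate cluster rather than adding workers to the Dask-MPI scheduler):

```python
from dask_jobqueue import SLURMCluster
from distributed import Client

# Request additional resources from the batch scheduler at runtime.
cluster = SLURMCluster(cores=4, memory="8GB")
cluster.scale(jobs=2)          # submit two worker jobs to the queue

client = Client(cluster)       # a client attached to the newly requested workers
```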
OK, thank you both for your answers!
You are welcome!
After the first implementation of the `initialize` function, making it possible to launch a Dask cluster from within the client script by using `mpi4py`, it seems like the natural next step is to implement an `MPICluster` object so that the canonical client-script operational model (see the sketch below) will work the same way that calling `initialize` works.
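The canonical model referred to above would presumably look something like the following (the `MPICluster` import path is hypothetical, since the class does not exist yet):

```python
# Launched under MPI, e.g.: mpirun -np 4 python client_script.py
from dask_mpi import MPICluster    # hypothetical -- this class is what the issue proposes
from distributed import Client

cluster = MPICluster()
client = Client(cluster)           # the client would initialize from the cluster object
```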