Memory usage by master when using clustermq backend #933
Comments
Accidentally submitted without writing anything, but it's coming...
Now it's ready.
Thanks for sending. Memory usage has gone way up on my priority list. For your use case, I recommend the approach shown in lines 101 to 106 at commit a082e5f.
Related: in the development version, [...]
Above, I meant to say that we can reduce data sent over the SSH connection with worker caching (as long as the SLURM cluster has access to the cache as a mounted file system). By the way, I really like your idea of splitting locally first. I am actually using it on a real project today.
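A minimal sketch of what worker caching might look like here, assuming the plan object is named plan, the jobs count is arbitrary, and the .drake/ cache sits on storage the SLURM nodes can also see (all assumptions):

```r
library(drake)

make(
  plan,
  parallelism = "clustermq",
  jobs = 16,          # number of clustermq workers (assumed)
  caching = "worker"  # workers read/write the cache themselves instead of
                      # receiving each target's dependencies over the SSH link
)
```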
Oops, closed the wrong issue.
One disadvantage of splitting locally is that we lose some concurrency. Not sure what we can do about that right now.
Unfortunately, that is not my case. My whole purpose for using the [...] I think it would help me to use a blocking connection to transfer dependencies to workers, or at least to be able to limit the number of simultaneous transfers. I'm not sure whether that's a [...]
Unfortunately, it goes deeper than that. In [...]
How important is this? My usual recommendation is to run everything on the cluster itself, even the master process that calls make().
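For reference, a hedged sketch of that setup: the master R session runs on the cluster itself and clustermq submits to SLURM directly, so large objects never cross an SSH link. The template path and jobs count are assumptions:

```r
# Run this R session on the cluster (e.g. from a login node or interactive job).
options(
  clustermq.scheduler = "slurm",
  clustermq.template  = "slurm_clustermq.tmpl"  # assumed path to the SLURM template
)

library(drake)
make(plan, parallelism = "clustermq", jobs = 16)
```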
That's what I have done in the past. I thought this would be nice in that it allows more direct interactivity with the results, rather than writing everything of interest to files and transferring them. Can I transfer the cache using, e.g., [...]?
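The question is cut off; purely as a hypothetical illustration (not necessarily what the comment asked), a drake cache in the default .drake/ directory could be copied between machines with a tool like rsync, with the host and paths invented here:

```r
# Hypothetical: pull the project's cache from the cluster so results can be
# inspected locally with loadd()/readd(). Host and paths are placeholders.
system2("rsync", c("-az", "cluster:/path/to/project/.drake/", ".drake/"))
```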
@mschubert It looks like pbdZMQ uses blocking connections by default. This is also the behavior of the rzmq-compatibility wrapper function, so blocking will come along for the ride by default if [...]
As much as I'd like to blame [...]. The way [...]. Of course, data should only be transferred once ([...]). Once that's fixed, we'll also have to adjust the way [...]
@mschubert That does seem like a more complete solution to the issue of transferring the same data many times. In the case where different targets are using different input data, however, this issue would still be causing a crash. The problem is that all of these transfers are being started concurrently. If [...]
Good point! Now tracked here: mschubert/clustermq#161
Excellent. Looks like future work on [...]
Prework

- Abide by drake's code of conduct.
- If you install the development version (remotes::install_github("ropensci/drake")), mention the SHA-1 hash of the Git commit you install.

Using drake 7.4.0 and clustermq 0.8.8.

Description
When scheduling tasks using the clustermq backend (in my particular case, submitting to slurm via ssh), the master process sends data to multiple workers concurrently. Depending on the data that needs to be sent, this can result in the master dying due to out-of-memory. In any case, it delays the time until the first worker has all of its data and starts working.

Reproducible example
I'm not sure how to reproduce this locally, so this example would need tweaking for whatever cluster setup you are using.
R script
config/SSH.tmpl
config/slurm.Rprofile
slurm_clustermq.tmpl
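A rough, hypothetical sketch of the kind of plan described (the target names big_data and processing_* come from the benchmarks below; the helper functions, chunk count, and SSH host are invented for illustration):

```r
library(drake)

# Point clustermq at the cluster over SSH (host name is a placeholder).
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@cluster.example.org"
)

produce_big_data <- function() {
  # Stand-in for whatever produces the large (~750 MB) input.
  data.frame(x = rnorm(1e7), y = rnorm(1e7))
}

process_chunk <- function(data, i) {
  # Stand-in for the per-target work on one slice of the data.
  summary(data$x[seq(i, nrow(data), by = 20)])
}

plan <- drake_plan(
  big_data = produce_big_data(),
  processing = target(
    process_chunk(big_data, i),
    transform = map(i = !!seq_len(20))  # yields processing_1 ... processing_20,
                                        # each depending on all of big_data
  )
)

make(plan, parallelism = "clustermq", jobs = 16)
```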
Benchmarks
drake.log
The first few targets are assigned at a rate of about one every 2 seconds. After that, my computer starts to swap and targets start more slowly, by the end about one per minute. Total memory usage by R was about 10 GB when it crashed during processing_13. big_data is about 750 MB. Outgoing network utilization was about 15-20 MB/s while the targets were being queued. I also tried a smaller instance with big_data around 75 MB, which completed successfully. The data transfer continued for several minutes after the last job was queued, verifying that new jobs are queued and their data loaded into memory before the data from earlier jobs are sent.

I reduced the memory usage and total transfer size by doing the split locally as a separate target:
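A minimal sketch of that local-split idea, continuing the assumptions above: each chunk becomes its own target built on the master (the hpc column, if available in this drake version, keeps those builds local), so a worker only receives its own slice instead of the full big_data. The chunk count and helper functions are again invented:

```r
n_chunks <- 20

plan <- drake_plan(
  big_data = produce_big_data(),
  chunk = target(
    # Built on the master (hpc = FALSE), so only this slice is ever sent
    # to the worker that needs it.
    big_data[seq(i, nrow(big_data), by = n_chunks), ],
    transform = map(i = !!seq_len(n_chunks)),
    hpc = FALSE
  ),
  processing = target(
    process_chunk(chunk),   # process_chunk() assumed to accept a pre-sliced chunk
    transform = map(chunk)  # one processing target per chunk target
  )
)
```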
This should probably be best practice when using drake_split on a distributed system, especially over ssh; otherwise the whole data set is sent to every worker and then split. In my case, this increased the number of jobs that could be scheduled before crashing, but did not get me through my whole plan.

Reducing the number of workers is successful, but that defeats the purpose of high-performance computing.