Currently, Pulsar takes every job off the setup queue as soon as it is received and immediately begins preprocessing it. If there is a large backlog of jobs (e.g. because a prior problem stopped jobs from processing for a while), this causes heavy IO contention during stage-in (and can hit open file limits, if you haven't increased them), so jobs with even moderately small inputs queue for hours because writing is so slow.
Unfortunately I can't really quantify the penalty - it is possible that the overall job throughput would not be any better even if a limited queue were in place, since the same amount of data still has to be transferred either way. But I do suspect it'd still move that data quicker if it weren't trying to do all of it at once.
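To make the idea of a limited queue concrete, here is a minimal sketch of capping concurrent stage-in with a semaphore. It is not Pulsar code: `StagingLimiter`, `fake_stage_in`, `drain_backlog` and `MAX_CONCURRENT_STAGING` are all hypothetical names invented for illustration, and the sleep stands in for the real input transfer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limit for illustration; not an existing Pulsar option.
MAX_CONCURRENT_STAGING = 2


class StagingLimiter:
    """Allows at most `limit` jobs to stage inputs at the same time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # jobs currently staging
        self.peak = 0    # highest concurrency observed

    def run(self, stage_fn, job_id):
        # Block here until a staging slot is free.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return stage_fn(job_id)
            finally:
                with self._lock:
                    self.active -= 1


def fake_stage_in(job_id):
    # Stand-in for the real transfer of a job's inputs (assumption).
    time.sleep(0.01)
    return job_id


def drain_backlog(job_ids, limit=MAX_CONCURRENT_STAGING):
    """Process a backlog with 8 workers, but only `limit` staging at once."""
    limiter = StagingLimiter(limit)
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda j: limiter.run(fake_stage_in, j), job_ids))
    return results, limiter.peak


if __name__ == "__main__":
    done, peak = drain_backlog(range(10))
    print(f"staged {len(done)} jobs, peak concurrent staging: {peak}")
```

In this sketch the workers have already accepted the jobs and only wait for a slot. To leave the backlog on the setup queue itself, the consumer would presumably need to acquire a slot before taking the next message, which depends on how the queue consumer is written.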