Multiprocessing #11

Open
berland opened this issue Jun 11, 2019 · 5 comments · May be fixed by #206
@berland
Collaborator

berland commented Jun 11, 2019

Operations over an ensemble are trivially parallelizable.

We should utilize Python multiprocessing for this.

multiprocessing is the right tool here, as multithreading will be limited by the GIL for CPU-bound work.

This is probably trivial for ensemble.get_smry(), but not so trivial for ensemble.from_smry(), as we need to populate each realization object with smry data in the parent process' memory space.

Maybe ensemble.from_smry() should call realization.get_smry() with multiprocessing, and then the ensemble object (living in the master process) populates each realization's self.data['unsmry-<something>'].
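A minimal sketch of that idea, not the project's actual code: the `Realization` class, its `get_smry()` return value, the `"unsmry-example"` key and the `pool_factory` parameter are all hypothetical stand-ins. Workers only compute and return the summary data; the parent process is the one that writes it into each realization's `data` dict.

```python
import multiprocessing


class Realization:
    """Hypothetical stand-in for a realization object."""

    def __init__(self, index):
        self.index = index
        self.data = {}

    def get_smry(self):
        # Placeholder: a real implementation would read the UNSMRY file.
        return {"FOPT": [100.0 * self.index, 200.0 * self.index]}


def _get_smry(realization):
    # Module-level function so that it can be pickled and sent to a worker.
    return realization.index, realization.get_smry()


def from_smry(realizations, key="unsmry-example",
              pool_factory=multiprocessing.Pool):
    """Compute summary data in workers, then store it in the parent's objects."""
    by_index = {real.index: real for real in realizations}
    with pool_factory() as pool:
        results = pool.map(_get_smry, realizations)
    for index, smry in results:
        # Worker processes operate on copies, so the assignment has to
        # happen here, in the parent's memory space.
        by_index[index].data[key] = smry
    return realizations
```

When run with real processes, the call to `from_smry()` should sit under an `if __name__ == "__main__":` guard so that start methods which re-import the main module do not re-run it.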

We must ensure CTRL-C works, which is trickier with multiprocessing.

See this: https://stackoverflow.com/questions/11312525/catch-ctrlc-sigint-and-exit-multiprocesses-gracefully-in-python
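One commonly used pattern for this, sketched under assumptions: `map_interruptible` and `pool_factory` are hypothetical names, and the trick relies on worker processes inheriting the ignored SIGINT setting from the parent, which holds for fork-based process creation on POSIX but should be verified for other start methods.

```python
import multiprocessing
import signal


def map_interruptible(func, items, pool_factory=multiprocessing.Pool,
                      timeout=3600):
    """Map func over items, shutting the workers down on Ctrl-C."""
    # Ignore SIGINT while the pool is created so that the workers start
    # with SIGINT ignored, then restore the parent's own handler.
    original_handler = signal.signal(signal.SIGINT, signal.SIG_IGN)
    pool = pool_factory()
    signal.signal(signal.SIGINT, original_handler)
    try:
        # Waiting with a timeout also bounds how long the parent blocks.
        results = pool.map_async(func, items).get(timeout)
    except BaseException:
        # Includes KeyboardInterrupt: stop the workers immediately.
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        pool.join()
    return results
```

Only the parent reacts to Ctrl-C here; it terminates the pool and re-raises `KeyboardInterrupt` so the caller can decide what to do next.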

When this is in place, we should also be able to skip realizations where libecl core-dumps on a difficult UNSMRY file.

Right now, your Python session will die if libecl crashes on rough data.
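To illustrate the isolation idea only: the sketch below runs a risky step in a separate interpreter and inspects its exit status, so a native crash in the child cannot take the parent session down. It uses `subprocess` rather than multiprocessing to keep the example small, and `run_isolated` plus the source snippets passed to it are hypothetical; nothing here calls libecl.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run Python source in a child interpreter.

    Returns (ok, stdout), where ok is False if the child exited with a
    non-zero status, for example after being killed by a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout.strip()
```

A caller could then skip any realization for which `ok` is False and carry on with the rest of the ensemble.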

@berland
Collaborator Author

berland commented Nov 4, 2019

concurrent.futures should be used for this. Needs a backport for Python 2.7.

@wouterjdb
Collaborator

Would it be an idea to not support Python 2.7 for this (just leave the old code in place when running Python 2.7) and only build it for Python 3?

@berland
Collaborator Author

berland commented Dec 12, 2019

#77 has a good start for concurrent initialization of objects. It also uncovers that the usage pattern of initializing Realization objects and then asking them to update themselves is not well suited for concurrent runs, as pickling and unpickling realization objects back and forth for every operation does not scale.

A suggestion could be to allow more of the processing in a realization to happen at object initialization. It might be possible to pass a dict to __init__, with the names of realization functions to call as keys and the (list of) function arguments as values, which would enable calling each necessary load_* function concurrently. __init__ in a realization would use a "batch_processor" in the realization object that can also serve as a general wrapper for later concurrent operations. This function should return the realization object when finished, to be compatible with concurrent.futures.
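A rough sketch of how such a batch dict could look, under stated assumptions: the class below, the `load_smry` and `load_scalar` placeholders, `process_batch`, `init_realization` and `load_ensemble` are hypothetical names for illustration, and the dict values are keyword-argument dicts rather than argument lists. Each realization is constructed and fully loaded inside one worker call, so it is pickled only once on the way back.

```python
from concurrent.futures import ProcessPoolExecutor


class Realization:
    """Hypothetical realization that can run a batch of loaders on init."""

    def __init__(self, path, batch=None):
        self.path = path
        self.data = {}
        if batch:
            self.process_batch(batch)

    def load_smry(self, column_keys=None):
        # Placeholder for reading summary vectors from disk.
        self.data["smry"] = list(column_keys or [])

    def load_scalar(self, localpath):
        # Placeholder for reading a scalar value from a file.
        self.data[localpath] = 0.0

    def process_batch(self, batch):
        """Call each named load_* method with its keyword arguments."""
        for name, kwargs in batch.items():
            if not name.startswith("load_") or not hasattr(self, name):
                raise ValueError("Unsupported batch function: " + name)
            getattr(self, name)(**kwargs)
        # Returning self lets an executor hand back the finished object.
        return self


def init_realization(args):
    # Module-level so that it can be pickled for worker processes.
    path, batch = args
    return Realization(path, batch=batch)


def load_ensemble(paths, batch, executor_cls=ProcessPoolExecutor):
    """Initialize and load all realizations, one task per realization."""
    with executor_cls() as executor:
        return list(executor.map(init_realization, [(p, batch) for p in paths]))
```

A batch such as `{"load_smry": {"column_keys": ["FOPT"]}}` would then be applied to every realization in the ensemble during construction.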

@berland
Collaborator Author

berland commented Dec 20, 2019

Batch processor in #78

@berland
Collaborator Author

berland commented May 19, 2020

#106 is ready as an implementation of this issue. The speedup is still disappointingly low, which is effectively holding back merging into master.

@berland berland linked a pull request May 19, 2020 that will close this issue
@berland berland added this to the 2.0 milestone Oct 29, 2020
@berland berland linked a pull request Mar 18, 2021 that will close this issue