Can not acquire file lock on Slurm cluster #84

Closed
araghukas opened this issue Jan 3, 2024 · 2 comments · Fixed by #85
Labels
bug 🐛 Something isn't working

Comments

@araghukas
Contributor

covalent is a required dependency at the point where the wrapper_fn function is unpickled and executed.

However, when covalent is initialized for the first time, it will try to create a new config file, which means acquiring a filelock inside ConfigManager.update_config().

Trying to acquire the filelock (with timeout=1) fails with the following error:

Traceback (most recent call last):
  File "/global/homes/a/ara/slurm-tests/script-310be8a1-383d-4586-9dc1-821c8120e93f-0.py", line 5, in <module>
    function, args, kwargs = pickle.load(f)
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/__init__.py", line 22, in <module>
    from . import executor, leptons  # nopycln: import
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/executor/__init__.py", line 32, in <module>
    from .._shared_files import logger
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/_shared_files/logger.py", line 24, in <module>
    from .config import get_config
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/_shared_files/config.py", line 199, in <module>
    _config_manager = ConfigManager()
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/_shared_files/config.py", line 52, in __init__
    self.update_config()
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/covalent/_shared_files/config.py", line 109, in update_config
    with filelock.FileLock(f"{self.config_file}.lock", timeout=1):
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/filelock/_api.py", line 297, in __enter__
    self.acquire()
  File "/global/homes/a/ara/miniconda3/envs/slurm-test/lib/python3.9/site-packages/filelock/_api.py", line 264, in acquire
    raise Timeout(lock_filename)  # noqa: TRY301
filelock._error.Timeout: The file lock '/global/homes/a/ara/.config/covalent/covalent.conf.lock' could not be acquired.
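
The issue does not say why the lock could not be acquired on the cluster. As a minimal sketch of the symptom only, the snippet below shows how a lock with a short timeout raises once the deadline passes while the lock file is held elsewhere. It does not use the filelock package or reproduce its implementation; ExclusiveFileLock, LockTimeout, and second_acquire_times_out are hypothetical names, and the exclusive-create approach is a stand-in chosen so the example runs with the standard library alone.

```python
import os
import tempfile
import time


class LockTimeout(Exception):
    """Hypothetical error: the lock file could not be created before the deadline."""


class ExclusiveFileLock:
    """Hypothetical stand-in for a file lock, held by exclusively creating a file."""

    def __init__(self, path, timeout=1.0, poll=0.05):
        self.path = path
        self.timeout = timeout
        self.poll = poll
        self._fd = None

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # O_EXCL makes creation fail if the lock file already exists.
                self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                return self
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise LockTimeout(self.path) from None
                time.sleep(self.poll)

    def __exit__(self, *exc_info):
        # Release the lock: close the descriptor, then delete the lock file.
        os.close(self._fd)
        os.remove(self.path)
        self._fd = None
        return False


def second_acquire_times_out(timeout=0.2):
    """Return True if acquiring an already-held lock raises LockTimeout."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "covalent.conf.lock")
        with ExclusiveFileLock(lock_path, timeout=timeout):
            try:
                with ExclusiveFileLock(lock_path, timeout=timeout):
                    return False
            except LockTimeout:
                return True
```

Any condition that keeps the lock from being granted within the timeout, whether another holder or a filesystem that does not honor the lock, would surface the same way at import time.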

Notes

  • The solution may be a change to core covalent rather than to this plugin.
  • Manually disabling the filelock acquisition code seems to resolve the problem (the workflow completes successfully).
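
One way to picture the second note is a config write that treats the lock as optional. The sketch below is an assumption-laden illustration, not covalent's code and not the fix that was eventually merged: write_config, failing_lock, and LockUnavailable are hypothetical names, and JSON is used only to keep the example within the standard library.

```python
import contextlib
import json
import os
import tempfile


class LockUnavailable(Exception):
    """Hypothetical error standing in for a lock timeout."""


@contextlib.contextmanager
def failing_lock(path):
    """Hypothetical lock that can never be acquired, mimicking the reported symptom."""
    raise LockUnavailable(path)
    yield  # unreachable; present only so this function is a generator


def _dump(config_file, data):
    """Write the config data to disk as JSON."""
    with open(config_file, "w") as f:
        json.dump(data, f)


def write_config(config_file, data, lock_factory, use_lock=True):
    """Write data to config_file, falling back to an unlocked write.

    Returns True if the write happened under the lock, False otherwise.
    """
    if use_lock:
        try:
            with lock_factory(f"{config_file}.lock"):
                _dump(config_file, data)
            return True
        except LockUnavailable:
            pass  # lock could not be acquired; fall through to the unlocked write
    _dump(config_file, data)
    return False
```

The trade-off is that an unlocked write gives up protection against concurrent writers, so this kind of fallback only makes sense where several tasks are unlikely to create the config file at the same moment.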
araghukas added the bug 🐛 label on Jan 3, 2024
araghukas self-assigned this on Jan 3, 2024
@santoshkumarradha
Member

santoshkumarradha commented Jan 3, 2024

Wasn't that introduced in AgnostiqHQ/covalent#1719?

@araghukas
Contributor Author

> wasn't that introduced in AgnostiqHQ/covalent#1719 ?

@santoshkumarradha Yes, it was. Thanks for pointing that out. I used the simple fix suggested there in PR #85.
