Snakemake SLURM Functionality #77
… feature/pnast/MIC-4905-cluster
output:
|
results dir tree:
|
This looks great to me! Want to confirm whether the DRMAA dependency is gone on this branch, I'd like to test on HYAK.
Just tested this on HYAK -- it doesn't have SLURM_ROOT 😞 Once we change it to just check for an "sbatch" command or similar, it should at least get past that roadblock.
Manually commenting that bit out, I see |
good catch, fixed |
rule: | ||
name: "{self.name}" | ||
name: "{self.implementation_name}" | ||
message: "Running {self.step_name} implementation: {self.implementation_name}" |
message is snakemake's way of logging to stdout?
yeah it adds to the snakemake logging--example in the stdout posted above
@@ -121,17 +121,24 @@ def write_implementation_rules( | |||
validation_file = str( | |||
results_dir / "input_validations" / implementation.validation_filename | |||
) | |||
resources = ( |
nit: we may want to specify this as "slurm_resources" (as opposed to spark resources or in the future who knows what else)
Unless your thought is the logic will go here eventually regardless of resource type?
Yeah, I think the more general "resources" makes sense, at least on the rule side, given that we may support non-slurm execution resources, and this would be the place that those would go (though not necessarily for something like spark).
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#snakefiles-standard-resources
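To make the naming question concrete, here is a minimal sketch of how a generic resources mapping might be rendered into the text of a rule's `resources:` block. The function name `render_resources`, the indentation scheme, and the example values are hypothetical and are not taken from this PR; `mem_mb`, `runtime`, `cpus_per_task`, and `slurm_partition` are resource names that Snakemake and its SLURM executor plugin recognize.

```python
from typing import Mapping, Union


def render_resources(resources: Mapping[str, Union[int, str]], indent: int = 4) -> str:
    """Render a resources mapping as the text of a Snakemake `resources:` block.

    Returns an empty string when nothing is requested (for example, local runs).
    """
    if not resources:
        return ""
    pad = " " * indent
    lines = [f"{pad}resources:"]
    for key, value in resources.items():
        # Strings must be quoted in the Snakefile; numbers are written bare.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{pad * 2}{key}={rendered},")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
example = {"slurm_partition": "some-partition", "mem_mb": 1024, "runtime": 60, "cpus_per_task": 1}
print(render_resources(example))
```

Keeping the mapping generic means a non-SLURM executor could reuse the same rendering path with different keys.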
@@ -38,6 +39,9 @@ def main( | |||
## See above | |||
"--envvars", | |||
"foo", | |||
## Suppress some of the snakemake output | |||
"--quiet", | |||
"progress", |
Is this saying to suppress some output (--quiet) but to show progress...bars?
"progress" is the argument to "--quiet", it's suppressing some logging about the progress of execution
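As a rough illustration of the point above, the sketch below assembles a Snakemake argument list in which `"progress"` is passed as the value of `--quiet` rather than as a separate flag. The helper name `build_snakemake_args` and its parameters are hypothetical; only the `--snakefile` and `--quiet progress` options are assumed from Snakemake's command line.

```python
from typing import List


def build_snakemake_args(snakefile: str, quiet_progress: bool = True) -> List[str]:
    """Build a Snakemake argv list (hypothetical helper, not the PR's code)."""
    args = ["--snakefile", snakefile]
    if quiet_progress:
        # "--quiet" takes a value; "progress" silences the job-progress messages
        # while leaving other logging (such as rule messages) in place.
        args += ["--quiet", "progress"]
    return args


print(build_snakemake_args("Snakefile"))
```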
@@ -11,6 +11,11 @@ | |||
from linker.configuration import Config | |||
|
|||
|
|||
def is_on_slurm() -> bool: | |||
"""Returns True if the current environment is a SLURM cluster.""" | |||
return "SLURM_ROOT" in os.environ |
Oh, right, in my unmerged branch for Jenkins a better check is return shutil.which("sbatch") is not None
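Spelled out as a full function, the suggested check might look like the sketch below. It assumes that having an `sbatch` executable on `PATH` is a good enough signal that SLURM is available, which is the reviewer's proposal rather than a guarantee.

```python
import shutil


def is_on_slurm() -> bool:
    """Returns True if the SLURM `sbatch` command is available on PATH."""
    # shutil.which returns the executable's path, or None if it is not found.
    return shutil.which("sbatch") is not None
```

Unlike the `SLURM_ROOT` environment variable, this does not depend on how a particular cluster (such as HYAK) configures its environment.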
tests/unit/test_pipeline.py
Outdated
assert snake_str_lines[i].strip() == expected_line.strip() | ||
|
||
|
||
def test_build_snakefile_slurm(default_config_params, mocker, test_dir): |
I didn't look super close but these two tests seem to be lots of duplicate code - can you parameterize?
done
@@ -26,6 +27,15 @@ | |||
IN_GITHUB_ACTIONS = os.getenv("GITHUB_ACTIONS") == "true" | |||
|
|||
|
|||
def test_is_on_slurm(): |
Oh I see, this is just merge conflict stuff - it's fine as is and I'll get the better is_on_slurm in my PR
No e2e tests? (I know they don't work via github actions yet but they should be able to work locally) |
Looks good! Mostly nits and questions, though I think we need the (locally supported only) e2e tests
I was going to do that as a part of https://jira.ihme.washington.edu/secure/RapidBoard.jspa?rapidView=367&projectKey=MIC&view=detail&selectedIssue=MIC-4938# |
* Basic Snakemake functionality (#73) * wrap snakefile into linker run * basic implementation * some yaml stuff * remove snakefile stuff * start writing * write the snakefile better this time * newln * add cache and remote * remove drmaa log dir * remove common.smk * add Rule dataclass * remove unused utils * refactor where validations live * remove snakemake utils * cleanup * lint * add validation rules * reverse input and output * lint * fix some tests * add back implementation config * lint * remove temp * add other script commands * remove slurm supprt for now * remove the false script * add diagnostics back * fix test * add comment * fix typing * fix script cmd to property * revert step change * adjust some tests * revert script change again * add dir for input validations * lint * add some type hints * fix metadata errors * fix script command * add new tests * fix errors * fix test again * use pipeline file * container_path -> image_path * add missing type hints * lint * change to image path * rename ValidationRule and add comment * add comment * fix validations test * lint * rename step validation * ensure rules have right number of lines * added docstrings for rules * add todos, clean up * fix broken test * Add Snakemake Containers (#76) * initial solution * small refactors * fix pyspark and R * lint * fix existing texts * add unit tests * simplify implementation metadata * add snakemake dep * reformat dep * check 3.11 and 3.12 * create linker subdir and rename bind_dir * make paths absolute * fix the test too * Snakemake SLURM Functionality (#77) * add resources * work on logging * configure tmp dir * cleanup * fix broken tests * lint * format output logging * adjust some comments * add local / "slurm" arguments test * add local / slurm specific string tests * lint * better check for if on slurm * use steve's check on slurm * ingnore slurm check on GHA * add executor dep * add step name to rules * parameterize tests --------- Co-authored-by: Steve 
Bachmeier <[email protected]> * Refactor results_dir into Config (#78) * absorb results dir into config * fix linker tmp * fix numerous tests * delete unintended files * change test output dir * refactor to snake_filepath, also overwrite if exists * refactor some config items * lint * mess around with strings and Paths * change some updates to set value * move copy to results * lint * reset the old umask * wrap umask intermediate in try/finally * wrap umask intermediate in try/finally (#80) * Resolve Merge Conflicts, e2e and integration testing (#79) * Feature/sbachmei/mic 4846 local e2e tests (#74) * better check for if on slurm (#75) * move specs * change mem arg * add new tests * remove todo * add debug arg * move spec dir * fix broken tests * add docstrings * lint * add comments to todos * standardize resource names * remove quoting * fix integration test * fix tests * change to cpus per task * make keys agnostic to slurm or implementation resources --------- Co-authored-by: Steve Bachmeier <[email protected]> * Pin Executor Plugin Package (#81) * add pin * lint * Snakemake Spark (#84) * make attribute public * add requires_spark and additional snakefile declarations * add to rule defs * add snakefle path * add spark snakefile * whoops, we were in minutes * make it a module * adjust spark smk * lint * change number of workers depending on spark * use spark resources * add timeouts * add slurm logs and output file flexibility * lint * adjust existing tests * remove unused code * lint * remove more util tests * fix tests * revert metadata * lint * revert metadata again * remove duplicate escapes * change spark resources * adjust configuration and defaults * allow non-int resources * change a word * revert change from str to int * i did a bad job setting things back to the way they were * remove get jobs helper * remove import * adjust spark alloc * write resources to params * fix tests * lint * adjust params * add comment * Jenkins Builds with Snakemake (#85) * 
merge main * reduce shared fs usage * add debug flag * lint * actually do debug * increase latency wait * fewer shared fs * run snakemake from results directory * fix tests * try to add different source cache location * make the variable a string instead * lint * remove source cache from shared fs * try changing cache location * make source cache results dir * rearrange * make source cache in .snakemake * revert some debugging changes * add integration tests to jenkins (#87) * add integration tests to jenkins * adjust test parameters * lint * adjust specification organization * remove string wrap --------- Co-authored-by: Steve Bachmeier <[email protected]> Co-authored-by: Steve Bachmeier <[email protected]>
* Basic Snakemake functionality (#73) * wrap snakefile into linker run * basic implementation * some yaml stuff * remove snakefile stuff * start writing * write the snakefile better this time * newln * add cache and remote * remove drmaa log dir * remove common.smk * add Rule dataclass * remove unused utils * refactor where validations live * remove snakemake utils * cleanup * lint * add validation rules * reverse input and output * lint * fix some tests * add back implementation config * lint * remove temp * add other script commands * remove slurm supprt for now * remove the false script * add diagnostics back * fix test * add comment * fix typing * fix script cmd to property * revert step change * adjust some tests * revert script change again * add dir for input validations * lint * add some type hints * fix metadata errors * fix script command * add new tests * fix errors * fix test again * use pipeline file * container_path -> image_path * add missing type hints * lint * change to image path * rename ValidationRule and add comment * add comment * fix validations test * lint * rename step validation * ensure rules have right number of lines * added docstrings for rules * add todos, clean up * fix broken test * Add Snakemake Containers (#76) * initial solution * small refactors * fix pyspark and R * lint * fix existing texts * add unit tests * simplify implementation metadata * add snakemake dep * reformat dep * check 3.11 and 3.12 * create linker subdir and rename bind_dir * make paths absolute * fix the test too * Snakemake SLURM Functionality (#77) * add resources * work on logging * configure tmp dir * cleanup * fix broken tests * lint * format output logging * adjust some comments * add local / "slurm" arguments test * add local / slurm specific string tests * lint * better check for if on slurm * use steve's check on slurm * ingnore slurm check on GHA * add executor dep * add step name to rules * parameterize tests --------- Co-authored-by: Steve 
Bachmeier <[email protected]> * Refactor results_dir into Config (#78) * absorb results dir into config * fix linker tmp * fix numerous tests * delete unintended files * change test output dir * refactor to snake_filepath, also overwrite if exists * refactor some config items * lint * mess around with strings and Paths * change some updates to set value * move copy to results * lint * reset the old umask * wrap umask intermediate in try/finally * wrap umask intermediate in try/finally (#80) * Resolve Merge Conflicts, e2e and integration testing (#79) * Feature/sbachmei/mic 4846 local e2e tests (#74) * better check for if on slurm (#75) * move specs * change mem arg * add new tests * remove todo * add debug arg * move spec dir * fix broken tests * add docstrings * lint * add comments to todos * standardize resource names * remove quoting * fix integration test * fix tests * change to cpus per task * make keys agnostic to slurm or implementation resources --------- Co-authored-by: Steve Bachmeier <[email protected]> * Pin Executor Plugin Package (#81) * add pin * lint * add two more steps * add fake containers * Snakemake Spark (#84) * make attribute public * add requires_spark and additional snakefile declarations * add to rule defs * add snakefle path * add spark snakefile * whoops, we were in minutes * make it a module * adjust spark smk * lint * change number of workers depending on spark * use spark resources * add timeouts * add slurm logs and output file flexibility * lint * adjust existing tests * remove unused code * lint * remove more util tests * fix tests * revert metadata * lint * revert metadata again * remove duplicate escapes * change spark resources * adjust configuration and defaults * allow non-int resources * change a word * revert change from str to int * i did a bad job setting things back to the way they were * remove get jobs helper * remove import * adjust spark alloc * write resources to params * fix tests * lint * adjust params * add 
comment * Jenkins Builds with Snakemake (#85) * merge main * reduce shared fs usage * add debug flag * lint * actually do debug * increase latency wait * fewer shared fs * run snakemake from results directory * fix tests * try to add different source cache location * make the variable a string instead * lint * remove source cache from shared fs * try changing cache location * make source cache results dir * rearrange * make source cache in .snakemake * revert some debugging changes * adjust syntax * add back imp metadata * fix specs * remove duplicate code * fix typo * change checksum * accidentally commented out slurm e2e --------- Co-authored-by: Steve Bachmeier <[email protected]> Co-authored-by: Steve Bachmeier <[email protected]>
Snakemake SLURM Functionality
Description
implementation
Changes and notes
Added SLURM back to the possible execution methods
Adjusted Snakemake logging a bit
Verification and Testing
Added unit tests; ran with pandas, PySpark, and R implementations