Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unterminated string literal syntax error parsing slurm_extra #18

Closed
paudano opened this issue Jan 10, 2024 · 29 comments
Closed

Unterminated string literal syntax error parsing slurm_extra #18

paudano opened this issue Jan 10, 2024 · 29 comments

Comments

@paudano
Copy link

paudano commented Jan 10, 2024

When parsing a slurm_extra resource, I get SyntaxError: unterminated string literal.

In my profile, I have set-resources with - test_rule:slurm_extra=--queue='gpu_dev'. The error message says:

Failed to evaluate default resources value '--queue='gpu_dev'.

The error message has imbalanced single-quotes. The first and last are part of the error-handling code, so it looks like it's trying to eval --queue='gpu_dev. I'm not sure where the quotes are being stripped, and trying to "trick" it by adding additional quotes to the end of the slurm_extra parameter doesn't get around it.

Snakemake: 8.1.0
snakemake-executor-plugin-slurm: 0.1.4
python: 3.12.1

Command and output:

$ snakemake --executor slurm --profile profiles/slurm -j 1
Using profile profiles/slurm for setting default command line arguments.
Building DAG of jobs...
Retrieving input from storage.
Using shell: /usr/bin/bash
Provided remote nodes: 1
Job stats:
job          count
---------  -------
test_rule        1
total            1

Select jobs to execute...
InputFunctionException in rule test_rule in file /temp/Snakefile, line 3:
Error:
  WorkflowError:
    Failed to evaluate default resources value '--queue='gpu_dev'.
        String arguments may need additional quoting. Ex: --default-resources "tmpdir='/home/user/tmp'".
    SyntaxError: unterminated string literal (detected at line 1) (<string>, line 1)
Wildcards:

Traceback:
 (rule test_rule, line 3, /temp/Snakefile)

Attached Snakemake and profile:
unterminated_issue.tar.gz

@cmeesters
Copy link
Member

Please try: slurm_extra="--queue='gpu_dev'" like the help message to the error suggests. However: Note, that SLURM does not know a --queue flag.

@paudano
Copy link
Author

paudano commented Jan 10, 2024

Of course I tried that.

- test_rule:slurm_extra="--qos='gpu_dev'"

Building DAG of jobs...
Retrieving input from storage.
Using shell: /usr/bin/bash
Provided remote nodes: 1
Job stats:
job          count
---------  -------
test_rule        1
total            1

Select jobs to execute...
InputFunctionException in rule test_rule in file /home/audanp/temp/deleme/Snakefile, line 3:
Error:
  WorkflowError:
    Failed to evaluate default resources value '--qos='gpu_dev'.
        String arguments may need additional quoting. Ex: --default-resources "tmpdir='/home/user/tmp'".
    SyntaxError: unterminated string literal (detected at line 1) (<string>, line 1)
Wildcards:

Traceback:
 (rule test_rule, line 3, /home/audanp/temp/deleme/Snakefile)

This is an example I have been mucking with to try to get the slurm plugin to run what was working Snakemake 7 and --slurm:
- guppy_run:slurm_extra=-q gpu_training --gres gpu:v100:4

Yes, --queue should have been --qos, but it wasn't getting that far.

@cmeesters
Copy link
Member

Oh, can you please try: <rule name>: slurm_extra="--qos='gpu_dev'" - note the space after the colon.

@paudano
Copy link
Author

paudano commented Jan 10, 2024

The error changes, but still reporting that it's missing the closing quote:

WorkflowError:
SLURM job submission failed. The error message was /bin/sh: -c: line 1: unexpected EOF while looking for matching `''
/bin/sh: -c: line 2: syntax error: unexpected end of file

Resources config.yaml:

---
set-resources:
  - test_rule:mem_mb=1024
  - test_rule:runtime=30
  - test_rule:slurm_partition=gpus
  - test_rule: slurm_extra="--qos='gpu_dev'"

@cmeesters
Copy link
Member

cmeesters commented Jan 10, 2024

this is no valid yaml-format either, try something like:

set-resources:
    test_rule:
        mem_mb: 1024
        runtime: 30
        slurm_partition: "gpus"
        slurm_extra: "--qos='gpu_dev' <further combinations"

Additional rules are described with the same indentation level as test_rule.

@paudano
Copy link
Author

paudano commented Jan 10, 2024

With:

---
set-resources:
    test_rule:
        mem_mb: 1024
        runtime: 30
        slurm_partition: "gpus"
        slurm_extra: "--qos='gpu_dev'"

I get the original SyntaxError again:

InputFunctionException in rule test_rule in file /home/audanp/temp/deleme/Snakefile, line 3:
Error:
  WorkflowError:
    Failed to evaluate default resources value '--qos='gpu_dev'.
        String arguments may need additional quoting. Ex: --default-resources "tmpdir='/home/user/tmp'".
    SyntaxError: unterminated string literal (detected at line 1) (<string>, line 1)
Wildcards:

Traceback:
 (rule test_rule, line 3, /home/audanp/temp/deleme/Snakefile)

@cmeesters
Copy link
Member

Well, I created a minimal example to chase this. Seems to be a bug transporting the information to the SLURM job context.

Minimal example:

rule all:
     input: "results/b.out"

rule test1:
     output: "results/a.out"
     shell: "touch {output}"

rule test2:
     input: rules.test1.output
     output: "results/b.out"
     shell: "cp {input} {output}"

Profile:

$ cat profiles/config.yaml 
set-resources:
    test1:
        slurm_partition: "smp"
    test2:
        slurm_partition: "smp"

Call: $ snakemake --executor slurm -j 2 --workflow-profile ./profiles/ --default-resources slurm_account=m2_zdvhpc

Tries to hand to the jobstep a correct call, yet also the following (obtained by printf-debugging):

--set-resources 'test1:slurm_partition=<function eval_resource_expression.<locals>.callable at 0x7f7807d11940>' 'test2:slurm_partition=<function eval_resource_expression.<locals>.callable at 0x7f7807d11a80>'

Which cannot be evaluated.

I did not notice this at first, as default resources get transported fine (as far as I can see) and the workflow I was trying had sufficient defaults. Resources, however, fail to be set correctly in the jobstep context, due to the erroneous submission command line.

This is hence a bug in snakemake-core. We are on it.

@paudano
Copy link
Author

paudano commented Jan 12, 2024

Thank you!

@brand-fabian
Copy link

Hi Christian and Peter,

recently I also encountered this issue when attempting to migrate my workflow to snakemake v8 and I do not think that this issue is only in snakemake-core, however I think it is a complicated interaction between the core and this module. I tried to pass

default-resources:
    slurm_extra: "--gres=localtmp:10G"

to my workflow. Please note that the syntax with = is not necessary for the slurm command line, but is required for snakemake to parse the line correctly, but that is a bug/feature for another time.

From what I can see right now there are two potential sources of errors that arise when the resources get parsed:

  1. In snakemake-core/resources.py every resource statement is passed into the python eval function, so it must be valid python code. This is obviously true for most resource arguments (Integers and Generic strings), but in our case it is an assignment to a variable starting with --, which is invalid python. To avoid this bug therefore the whole string needs to be enclosed in quotes (double or single), which leads to the second issue...
  2. After the snakemake-core parsed the resources, this module gets the specifications and constructs a sbatch command line from them. In the first step there is nothing wrong, however all default resources are also passed to the sbatch command line inside the --wrap option which encloses its argument inside double quotes " (here). Because of this and because the slurm job will invoke a shell which adds another layer of parsing / escaping to the script, all double quotes get wiped from the string which re-triggers the parse error from step one, but now inside the job.

Iam currently trying two fixes for these problems and can update you or provide a PR if desired and if my workflow executes properly:

  1. I use the resource declaration slurm_extra: str("--gres=localtmp:10G"), which gets me around all the parser incompatibilities and since the original string is parsed around in subsequent command lines I figured it will usually stay the same.
  2. Amend L133 with
call += f' --wrap="{re.sub(r"([\"])", r"\\\1", exec_job)}"'

@Ulthran
Copy link
Contributor

Ulthran commented Feb 1, 2024

I think I'm running into the same problem trying to name jobs by their rules and output slurm logs to named files:

default-resources:
  - slurm_account="hpcusers"
  - slurm_partition="defq"
  - mem_mb=8000
  - runtime=15
  - disk_mb=1000
  - slurm_extra="--job-name=%x --output=slurm_%x_%j"

Different iterations on this either fail with the evaluate default resources value error or do replace the %x and %j but %x becomes the slurm assigned job name (some random alphanumeric string). How do I get the snakemake job names to come through here? This was working with snakemake 7.

@w8jcik
Copy link

w8jcik commented Feb 1, 2024

The only workaround I could find is to move slurm_extra from your profile to resources of each job in Snakefile. Some other Slurm-specific switches might be also broken in the profile, then move them too.

@Ulthran
Copy link
Contributor

Ulthran commented Feb 1, 2024

Hmm that is not ideal. Do you know if there's an open issue/PR for this on snakemake core?

@w8jcik
Copy link

w8jcik commented Feb 1, 2024

There is a merge request in this thread adding a test case. I am not familiar with organisation of this project, but if it is test driven development, then maybe fix goes later in the same merge request.

@brand-fabian
Copy link

@Ulthran I think changing the job name generated by this plugin can not work at all due to

f"sbatch --job-name {self.run_uuid} --output {slurm_logfile} --export=ALL "

which specifies the job name explicitly and sets it to a random uuid4 that is generated once on the first initialization of the workflow / executor object. I also think that this is not a particularly good idea, and dont really know why it is needed, perhaps one of the original authors can shed some light on that.

I would also much prefer to be able to set the slurm_extra params in a workflow-profile instead of the rules directly, especially since this parameter encompasses many needed slurm flags (qos, gres, etc.).

johanneskoester added a commit to snakemake/snakemake that referenced this issue Feb 2, 2024
### Description

The PR is related to the[ issue described for the slurm submit
executor](snakemake/snakemake-executor-plugin-slurm#18)
plugin to not correctly hand over arguments.

The test folder is named `test_slurm_resource_propagation meesters` and
may not be complete, as it covers multiple related scenarios to provoke
the error.

Note, that the tests were carried out when using a cluster:

- `$ snakemake --executor slurm -j2 --workflow-profile ./profiles/
--default-resources slurm_account=<account>`
  yields
  ``Error:
  WorkflowError:
Failed to evaluate default resources value '<function
eval_resource_expression.<locals>.callable at 0x7f5c19ed0e00>'.```
    
when the patition is defined at the rule level in a profile. (see the
directory of the test case)
    
- likewise `$ snakemake --executor slurm -j2 --workflow-profile
./profiles/ --default-resources slurm_account=<account> --set-resources
slurm_extra="--nice=150"`
  yields
```ValueError: Invalid resource definition: entries have to be defined
as RULE:RESOURCE=VALUE, with VALUE being a positive integer a quoted
string, or a Python expression (e.g. min(max(2*input.size_mb, 1000),
8000)).```
  
These errors can both be triggered on the command line and in the config
file, of course.


### QC

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).

---------

Co-authored-by: Johannes Koester <[email protected]>
Co-authored-by: Johannes Köster <[email protected]>
@cmeesters
Copy link
Member

This is fixed in v8.4.3 - as mentioned in the linked PR here above. However, another fix is forthcoming. Stay tuned.

@cmeesters
Copy link
Member

v8.4.4 is out. Alas, I cannot test it right now, for the lack of a cluster (maintenance). You might want to give it a try.

@paudano
Copy link
Author

paudano commented Feb 6, 2024

I see the same error as before with Snakemake 8.4.4 and slurm executor plugin 0.3.0.

Versions:

snakemake                 8.4.4                hdfd78af_0    bioconda
snakemake-executor-plugin-slurm 0.3.0              pyhdfd78af_0    bioconda

Error:

InputFunctionException in rule test_rule in file /pod/2/beck-lab/audanp/projects/2024_Snake_Slurm_Extra/Snakefile, line 3:
Error:
  WorkflowError:
    Failed to evaluate resources value '--qos="gpu_dev"'.
        String arguments may need additional quoting. E.g.: --default-resources "tmpdir='/home/user/tmp'".
    SyntaxError: invalid syntax (<string>, line 1)

@Ulthran
Copy link
Contributor

Ulthran commented Feb 6, 2024

I could be missing something but I'm still not able to access the rule name to pass through to slurm_extra here. I've made a workaround for it that's a little hacky but seems to be working with this update, added this to the bottom of my Snakefile:

for rule in rules._rules:
    slurm_extra = f"--job-name={rule} --output=slurm_{rule}_%j"
    getattr(rules, rule).rule.resources["slurm_extra"] = slurm_extra

EDIT: Don't set the job-name like this, the executor relies on the name it applies to determine when jobs have finished. Here is an updated version that gives a new output name and new comment:

for rule_name in rules._rules:
    rule_obj = getattr(rules, rule_name).rule
    wcs = rule_obj._wildcard_names
    if wcs:
        rule_obj.resources["slurm_extra"] = (
            lambda wc, rule_name=rule_name: f"--output=slurm_{rule_name}_{'_'.join(wc)}_%j --comment={rule_name}_{'_'.join(wc)}"
        )
    else:
        rule_obj.resources["slurm_extra"] = f"--output=slurm_{rule_name}_%j --comment={rule_name}"

Once #35 is included in a release, this shouldn't be necessary anymore.

@w8jcik
Copy link

w8jcik commented Feb 7, 2024

v8.4.4 is out. Alas, I cannot test it right now, for the lack of a cluster (maintenance). You might want to give it a try.

Previously it was failing before submission to Slurm. Now an error appears when the job gets to a node.

@cmeesters
Copy link
Member

Ah, it took me to understand this Failed to evaluate resources value '--qos="gpu_dev"' for I did not read carefully and was a bit occupied yesterday (in fact, I still am :-( ). So, the thing is, that the parsing now works, but of course "--qos" is no argument to snakemake. --set-resources "slurm_extra='...'" would be a valid argument. With other words, the snakemake core strips away the slurm_extra and we are back on square 1.

@blaiseli
Copy link

blaiseli commented Feb 16, 2024

@Ulthran A not very satisfactory workaround for the hard-coded job name might be to use job comments instead:

Snakemake-Profiles/slurm#117 (comment)

Edit: I hadn't seen your later post with a better hack. Thanks for sharing.

@cmeesters
Copy link
Member

I think I have an idea. PR is in preparation.

@Ulthran
Copy link
Contributor

Ulthran commented Feb 16, 2024

[> Ulthran](#18 (comment))

@blaiseli I just edited that comment. Note that setting the job-name will break the executor's ability to tell when jobs finish.

@cmeesters
Copy link
Member

@ALL please update Snakemake and this executor plugin to the newest releases and try again.

@paudano
Copy link
Author

paudano commented Feb 29, 2024

I am still getting parsing errors trying to set --qos through slurm_extra. Was this update just related to the job name issues? I also couldn't get it to replace the job name (jobs are submitted as "%x" to Slurm).

As a side note, what's the best way to install the plugin from the git repo (latest version on bioconda is 0.3.0)? I created a minimal setup.py, and that seems to work.

Here's the snakemake packages I have in my test conda environment (installed dependencies, then snakemake-executor-plugin-slurm from the github repos with setup.py):

snakemake 8.5.3 pypi_0 pypi
snakemake-executor-plugin-slurm-jobstep 0.1.10 pyhdfd78af_0 bioconda
snakemake-interface-common 1.17.1 pypi_0 pypi
snakemake-interface-executor-plugins 8.2.0 pypi_0 pypi
snakemake-interface-report-plugins 1.0.0 pyhdfd78af_0 bioconda
snakemake-interface-storage-plugins 3.1.0 pypi_0 pypi
snakemake-minimal 8.5.3 pyhdfd78af_0 bioconda

Package `snakemake-exeucutor-plugin-slurm' doesn't show up here, but snakemake only finds the plugin after I install from the repo (commit 3b6a359).

Let me know if I'm testing something incorrectly. Thanks!

@cmeesters
Copy link
Member

snakemake-executor-plugin-slurm is available via bioconda. That reminds me: The minimal software stack in my docs-PR is missing ...

Anyway, for me, slurm_extra works like a charm. The main parsing issue was in the Snakemake core. Several attempts to fix it, lead to yet another side effect. Yet, since we had two rather obscure issues, I asked to update the executor, too.

Now, whether you install the software stack with conda/mamba/micromamba or poetry or write your own setup scripts - as long as everything is part of one environment, it should be fine. You can check whether all packages are under one site-packages folder, yourself.

My workflow profile for testing:

default-resources:
    slurm_partition: "smp"
    tasks: 1

set-resources:
    test1:
        slurm_partition: "m2_gpu"
        runtime: 5
        slurm_extra: "'--nice=150 --gres=gpu:1'"

Notwithstanding different partition, accounts and the little fact that you are using a --qos-flag, that should work as intended.

Please run your workflow with --verbose, paste the command line, any profile file and attach the log to your reply. I'm sure we can hunt the issue down, eventually.

@w8jcik
Copy link

w8jcik commented Mar 1, 2024

Submission still fails.

I am testing following profile

executor: slurm

default-resources:
  slurm_partition: some-partition
  tasks: 1
  threads: 10
  slurm_extra: "'--gres=gpu:1'"

With following Snakemake packages

Slurm 23.11.4

Submission to Slurm fails after it reaches a node

snakemake force_field
Error in rule force_field:
    message: SLURM-job '9731003' failed, SLURM status is: 'FAILED'For further error details see the cluster/cloud log and the log files of the involved rule(s).
InputFunctionException in rule force_field in file [obfuscated]/Snakefile, line 6:
Error:
  WorkflowError:
    Failed to evaluate resources value '--gres=gpu:1'.
        String arguments may need additional quoting. E.g.: --default-resources "tmpdir='/home/user/tmp'" or --set-resources "somerule:someresource='--nice=100'". This also holds for setting resources inside of a profile, where you might have to enclose them in single and double quotes, i.e. someresource: "'--nice=100'".
    SyntaxError: invalid syntax (<string>, line 1)

Running with --verbose

sbatch call: sbatch --job-name e4d0cd06-d884-4f24-bba2-f8b2141d036b --output [obfuscated]/%j.log --export=ALL --comment force_field -A default -p some-partition --mem 1000 --ntasks=1 --cpus-per-task=1 --gres=gpu:1 -D [obfuscated] --wrap="[obfuscated]/python3.11 -m snakemake --snakefile '/[obfuscated]/Snakefile' --target-jobs 'force_field:' --allowed-rules 'force_field' --cores 'all' --attempt 1 --force-use-threads  --resources 'mem_mb=1000' 'mem_mib=954' 'disk_mb=1000' 'disk_mib=954' 'tasks=1' 'threads=10' --wait-for-files '[obfuscated]/.snakemake/tmp.5pr3uo7v' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --keep-incomplete  --verbose  --rerun-triggers mtime --conda-frontend 'mamba' --shared-fs-usage persistence input-output source-cache sources storage-local-copies software-deployment --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 10 --scheduler 'greedy' --scheduler-solver-path '[obfuscated]' --default-resources 'mem_mb=min(max(2*input.size_mb, 1000), 8000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_partition=[obfuscated]' 'tasks=1' 'threads=10' "slurm_extra='--gres=gpu:1'" --executor slurm-jobstep --jobs 1 --mode 'remote'"

he job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-02-28T17:00 --endtime now --name e4d0cd06-d884-4f24-bba2-f8b2141d036b
It took: 0.061034202575683594 seconds
The output is:
'9731006|PENDING'

status_of_jobs after sacct is: {'9731006': 'PENDING'}
active_jobs_ids_with_current_sacct_status are: {'9731006'}
active_jobs_seen_by_sacct are: {'9731006'}
missing_sacct_status are: set()
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-02-28T17:00 --endtime now --name e4d0cd06-d884-4f24-bba2-f8b2141d036b
It took: 0.06168413162231445 seconds
The output is:
'9731006|PENDING'

status_of_jobs after sacct is: {'9731006': 'PENDING'}
active_jobs_ids_with_current_sacct_status are: {'9731006'}
active_jobs_seen_by_sacct are: {'9731006'}
missing_sacct_status are: set()
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-02-28T17:00 --endtime now --name e4d0cd06-d884-4f24-bba2-f8b2141d036b
It took: 0.05933403968811035 seconds
The output is:
'9731006|FAILED'

status_of_jobs after sacct is: {'9731006': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'9731006'}
active_jobs_seen_by_sacct are: {'9731006'}
missing_sacct_status are: set()
Full Traceback (most recent call last):
  File "[obfuscated]/.spack-env/view/lib/python3.11/site-packages/snakemake/resources.py", line 533, in generic_callable
    value = eval(
            ^^^^^
  File "<string>", line 1
    --gres=gpu:1
          ^
SyntaxError: invalid syntax

Identical profile is fine with Snakemake 7.

@paudano
Copy link
Author

paudano commented Apr 23, 2024

I was able to get this working with the latest Snakemake and slurm plugin:

slurm_extra: "'--qos=gpu_inference' '--gres=gpu:v100:4'"

Seems to schedule correctly. If I single-quote around the whole string, Slurm fails.

Again, thank you. It looks like I'll be able to push my GPU workflows now.

@cmeesters
Copy link
Member

Hm, weird - I tried with my cluster and it works fine. Probably worth investigation further.

Anyway, I am curious to learn about your workflow and hope you push it to the snakemake workflow catalogue. I just taught a course on a cluster with different logins for gpu-based work and cpu-based work. This is totally unusual and would break any combined workflow - hence an existing curated workflow, may probably convince those colleagues of mine to reconsider their setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants