Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cmd output type #4493

Merged
merged 23 commits into from
Feb 11, 2024
Merged

Add cmd output type #4493

merged 23 commits into from
Feb 11, 2024

Conversation

pditommaso
Copy link
Member

@pditommaso pditommaso commented Nov 8, 2023

Implements the support for cmd output type to collect task tool version and metadata as described here #4386 (comment)

POC usage:

process foo {
  output:
  cmd 'git --version', emit: gitVersion
  
  '''
  echo Ciao
  '''
}

workflow {
  foo()
  foo.out.gitVersion | view
}

Signed-off-by: Paolo Di Tommaso <[email protected]>
Copy link

netlify bot commented Nov 8, 2023

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit c4c6182
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/65c7dbc1e12437000807d870

@pditommaso pditommaso marked this pull request as draft November 8, 2023 03:37
@bentsherman
Copy link
Member

I kinda like the idea of overloading stdout for this purpose, since it can only be the stdout of the command. Otherwise we should use the full word command

@pditommaso
Copy link
Member Author

I wanted to keep simple, at least as POC

@pditommaso
Copy link
Member Author

pditommaso commented Nov 8, 2023

Next steps:

  • Handle command failure (how?)
    • Ben: command fail should trigger process fail, unless marked as optional (still print warning)
    • if implementation too complex, worth it to just document a || true syntax
  • Handle multiline command output (likely currently breaks)
  • Only works for Bash based tasks
    • will be a constraint of the implementation
  • Support for tuple composition
  • Add + fix tests

@bentsherman
Copy link
Member

Is this ready to test with rnaseq? Even if it only supports single-line bash commands, should be enough.

@pditommaso
Copy link
Member Author

it should

@bentsherman bentsherman self-requested a review November 15, 2023 16:47
@bentsherman
Copy link
Member

It doesn't seem to work with tuples, gonna need that for something like:

tuple val('gunzip'), cmd("gunzip --version")

@bentsherman
Copy link
Member

I will add it

@marcodelapierre
Copy link
Member

I kinda like the idea of overloading stdout for this purpose, since it can only be the stdout of the command. Otherwise we should use the full word command

@pditommaso sounds tempting to me, too, on this same ground by Ben.

@marcodelapierre
Copy link
Member

I think this comment is interesting: #4386 (comment)
Also, it relates to your point above Paolo: Only works for Bash based tasks

If multi-line is allowed, then we might leverage the shebang to allow non-bash executions, including other shells, Groovy, popular scripting langs such as Python .. a bit like the current scoping for the script block , really.

Thoughts on this point?

@marcodelapierre
Copy link
Member

Handle command failure (how?)

I would say, if the command fail, the task should fail. I.e. consistent with the behaviour as for the main script block.

@pditommaso
Copy link
Member Author

If multi-line is allowed, then we might leverage the shebang to allow non-bash executions

not sure it's possible

@bentsherman
Copy link
Member

I would say, if the command fail, the task should fail. I.e. consistent with the behaviour as for the main script block.

On this point, I think I suggested that the user should be able to ignore the failure by specifying optional: true, although it would be helpful to still log a warning in that case

If multi-line is allowed, then we might leverage the shebang to allow non-bash executions

Due to the compact nature of process inputs and outputs, it would be unwieldy to specify a multi-line script here. Instead, it should already be possible to reference a script from the bin directory in the cmd output. That would be useful anyway because, in rnaseq for example, many processes use the same tools and so the same version commands are duplicated.

Regarding the shebang, we could add a shell option to the command output and still support one-liners:

output:
cmd("print('Hello!')", shell: 'python')

@bentsherman
Copy link
Member

I kinda like the idea of overloading stdout for this purpose, since it can only be the stdout of the command. Otherwise we should use the full word command

@pditommaso sounds tempting to me, too, on this same ground by Ben.

I'm starting to lean back towards a separate command or cmd type... , this command output feels different enough from stdout that it should have a different name, and it wouldn't make sense to extend stdin in the same way

@marcodelapierre
Copy link
Member

I kinda like the idea of overloading stdout for this purpose, since it can only be the stdout of the command. Otherwise we should use the full word command

@pditommaso sounds tempting to me, too, on this same ground by Ben.

I'm starting to lean back towards a separate command or cmd type... , this command output feels different enough from stdout that it should have a different name, and it wouldn't make sense to extend stdin in the same way

I agree on the leaning back, based on the optional and script attributes you mentioned above, #4493 (comment), both of which make sense to me.

@bentsherman
Copy link
Member

Handle multiline command output (likely currently breaks)

So the multi-line output does not fail, but the newlines are converted to spaces:

$ echo nxf_out_cmd_1=$(bash --version) > .command.env
$ cat .command.env
nxf_out_cmd_1=GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu) Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software; you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

The command itself is easily fixed with some quotes:

$ echo nxf_out_cmd_1="$(bash --version)" > .command.env
$ cat .command.env 
nxf_out_cmd_1=GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

But that would break the .command.env file which expects a key-value pair on each line.

The same limitation already exists for env outputs, e.g. if I do FOO=$(bash --version) in the process script then the process output will be squeezed onto a single line. For this reason, and the fact that cmd outputs are intended for small things like metadata, while larger outputs can always be written to a file, I am fine with leaving the current behavior as it is.

@pditommaso
Copy link
Member Author

A single line is a too weak assumption. If you notice, nf-core uses a multi-lines YAML snippet!

https://github.com/nf-core/rnaseq/blob/a10f41afa204538d5dcc89a5910c299d68f94f41/modules/nf-core/salmon/index/main.nf#L42-L45

@bentsherman
Copy link
Member

That is only because they are eagerly formatting it to YAML. Instead in my proof-of-concept PR I use tuples and separate cmd outputs:

https://github.com/nf-core/rnaseq/pull/1115/files#diff-d81c6cc6d9c866f8162450ce077b2413b28ecd47881a70a5c24a1ec1f1b70205

@marcodelapierre
Copy link
Member

@pditommaso any blocker for this to be merged? Conversation around the review seems pretty much settled

@pditommaso pditommaso modified the milestones: 24.01.0-edge, 24.02.0-edge Feb 11, 2024
@pditommaso
Copy link
Member Author

Ok, I've cleaned up and refactored a but this. The main change is output: cmd become output: eval. The main reason is that command is too generic term and it was overlapping with the task command. I think eval express better the overall idea of a small computation that should be evaluated to capture the output value.

@pditommaso pditommaso merged commit df97811 into master Feb 11, 2024
22 checks passed
@pditommaso pditommaso deleted the output-cmd-type branch February 11, 2024 09:29
@ewels
Copy link
Member

ewels commented Feb 11, 2024

Nice! I like eval, good call 👍🏻

nschan pushed a commit to nschan/nextflow that referenced this pull request Feb 12, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Niklas Schandry <[email protected]>
nschan pushed a commit to nschan/nextflow that referenced this pull request Feb 12, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Niklas Schandry <[email protected]>
nschan pushed a commit to nschan/nextflow that referenced this pull request Apr 3, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Niklas Schandry <[email protected]>
nschan pushed a commit to nschan/nextflow that referenced this pull request Apr 3, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
nschan pushed a commit to nschan/nextflow that referenced this pull request Apr 3, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Niklas Schandry <[email protected]>
nschan pushed a commit to nschan/nextflow that referenced this pull request Apr 3, 2024
This PR introduces the output `eval` type that allows the definition of
a script expression that needs to be computed in the script context to evaluate
the output value to be emitted. An example would be:

```
process someTask {
  output:
  eval 'bash --version'
  '''
  some-command --here
  '''
}
```

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Niklas Schandry <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants