The documentation for multiple: true for the upcoming release candidate is quite confusing:
(note the multiple_sep: ;, : and , are all being used)
Additionally, the documentation on the expected behavior and/or intended use of multiple: true in combination with direction: output could use some clarification.
a) I've noticed some confusion about the expected logic when comparing multiple: false to multiple: true (with respect to direction: output). For multiple: false, viash expects the component's code to create a file with exactly the given name (e.g. when using --output_fastq a.fastq, a file a.fastq should be created). For multiple: true, there are two options:

1. The component creates a number of files, and the value provided to the argument only acts as a mask/sieve to capture the desired output files. Here, we expect the user to know the format of the output files that are created. If the user specifies --output_fastq *.fa but the component produces *.fastq files, that is a problem. This gets more complicated when the generated files depend on other inputs (e.g. the component generates *.fastq.gz files when --compression gzip is used, but *.fastq files when compression is not enabled). So either the user needs to look at the internals of the component to know which glob to specify, or this needs to be really well documented per component.
2. The component takes the value provided by the user (e.g. *.fastq or *.fq) and creates output files that match it. This has the benefit of corresponding to the behavior of multiple: false. The downside is that the implementer of the component is responsible for filling in the wildcard and creating files with the correct names. This is no different from multiple: false (where the correct output file must also be created), but it requires extra logic from the developer, and viash cannot check whether the correct number of files has been created.

However, the question then becomes: if we want to convert the glob value provided on the command line to filenames within the script code, what can we expect from the glob pattern that is passed to the script (for example, which wildcard characters are taken into account)? This ties into the next question (see b) below).
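To make the problem with option 1 concrete, here is a rough sketch of what "the value acts as a sieve" means. This is not how viash implements it; `capture_outputs` is a hypothetical helper, and Python's `fnmatch` is only a stand-in for whatever matching the runner performs (note that `fnmatch` also interprets `?` and `[]`, which viash may not).

```python
from fnmatch import fnmatch


def capture_outputs(created_files, pattern):
    """Hypothetical helper: keep only the created files that match the user's glob."""
    return [name for name in created_files if fnmatch(name, pattern)]


# Files the component happened to write, regardless of what the user asked for.
created = ["sample1.fastq", "sample2.fastq"]

# The user guessed the wrong extension, so nothing is captured.
matched_fa = capture_outputs(created, "*.fa")

# Only a user who knows the component's naming scheme picks a glob that works.
matched_fastq = capture_outputs(created, "*.fastq")
```

With the wrong glob the component runs fine but no outputs are picked up, which is exactly the situation that is hard for a user to debug without reading the component's internals.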
When I look at the two options above, I am in favour of recommending option 2, because it most closely resembles the behaviour expected with multiple: false: the names of the output files reflect the value provided for the argument. However, if I am not mistaken, there is no real way to validate that this is being done, because it is script logic (i.e. it is up to the developer to do this). We could, however, choose an option, document a recommendation, and apply that recommendation in, for example, biobox.
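For illustration, option 2 could look roughly like this inside a component script. This is only a sketch under assumptions: `fill_output_pattern` is a hypothetical helper, and it assumes the value passed to the script contains exactly one `*`, which is not something viash guarantees today as far as I can tell.

```python
def fill_output_pattern(pattern, names):
    """Hypothetical helper: substitute each name for the single '*' in the pattern."""
    if pattern.count("*") != 1:
        # Assumption: we only know how to fill in one wildcard.
        raise ValueError(f"expected exactly one '*' in {pattern!r}")
    prefix, suffix = pattern.split("*")
    return [f"{prefix}{name}{suffix}" for name in names]


# The user passed --output_fastq 'out/*.fastq'; the component decides the names.
paths = fill_output_pattern("out/*.fastq", ["sample1", "sample2"])
```

The output names follow whatever the user provided (`*.fastq` or `*.fq`), just like with multiple: false, but the script has to carry this extra logic itself.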
Perhaps a bit of background on why I am bringing this up: I have seen confusion where users expect something like --fastq_output a.fastq;b.fastq to work, but are instead presented with an error message stating that a wildcard character must be used. Of course, this format cannot be used, because it does not generalise when the number of output files is variable. The confusion probably originates from familiarity with arguments that have direction: input. I think the error message might benefit from a more elaborate explanation, because it triggers two follow-up questions: why is the wildcard needed, and how do I choose a correct value for it? Answering the latter question is not easy, for the reasons outlined above. One additional option that sprang to mind (just leaving it here as a mental note) is to introduce a different type for the argument (something like type: glob), just to indicate to the user that the behavior of direction: output with multiple: true really is different from all the other variants of type: file.
b) Currently, the only wildcard character that is checked in the code is * (see BashWrapper and Nextflow). If this is intended, I think we should make it explicit in the documentation, for example by rewording "a wildcard character" to "the wildcard character '*'". This way, the code for the component can also rely on the assumption that only the * character should be interpreted as a wildcard (and not ?, [] or other bash globs).
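As a sketch of what "only * is a wildcard" would mean for component code: the pattern can be matched by treating every other character, including ? and [], literally. `star_only_regex` is a hypothetical helper, not part of viash, and this version lets * match any characters (including /), which is an assumption on my part.

```python
import re


def star_only_regex(pattern):
    """Hypothetical helper: build a regex where '*' is the only wildcard."""
    # Escape everything between the stars so '?' and '[]' stay literal.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


rx = star_only_regex("sample?_*.fastq")

# '?' is matched literally, not as "any single character".
literal_hit = rx.fullmatch("sample?_1.fastq") is not None
glob_style_hit = rx.fullmatch("sampleA_1.fastq") is not None
```

If the documentation states this explicitly, component authors do not need to handle the full bash glob syntax when turning the provided value into filenames.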