Proposed syntax changes #692

tclose · 2023-08-29T22:51:53Z

tclose
Aug 29, 2023
Maintainer

This thread is to discuss potential syntax changes proposed in #670 and recent developers meetings. These changes can be broken down into three areas that are linked but can be assessed somewhat separately:

Interface design - reworking task interfaces to be more "dataclass-like" and provide a staged learning/develop pathway to progress from simple, generic interfaces to complex, specific ones.
Workflow construction - minimally altering the workflow construction syntax so that code-completion and static-typing are possible, user/core namespaces are separated and nodes don't appear to be able to be executed separately
Task/workflow execution - minimal alterations to task/workflow execution required to support the other changes and overall design goals

These changes should be consistent with the initial design goals for Pydra as laid out by @satra where possible:

simplicity of use without workflows (get a non python coder to quickly adopt it - this was very important)
reduce boilerplate whenever possible
any computable object is a task (combines interfaces/node/workflow into one concept)
task object reuse in a script just like one makes multiple calls to a function
universal caching (multiple workflows can share the same cache directory, or use multiple caches)
message passing (also becomes a way to capture provenance)
parallelization (split/combine in it's full syntactic form and at different levels with nested stuff)
distributed execution (where pickling comes in, and still not solved for functions in two similar environments)
workflow construction through assignment rather than the connect syntax of nipype
container support (this has evolved into run in environment X, where a container is an environment)

and should aim to provide a low-barrier to entry for inexperienced developers to pick up, while be expressive enough for experienced developers to robustly design complex workflows, with a stepped learning pathway in between.

@djarecka @effigies @ghisvail @yibeichan

tclose · 2023-08-30T02:17:08Z

tclose
Aug 30, 2023
Maintainer Author

UPDATED: Incorporated suggestions from @djarecka and @effigies re the preference for "executable" as an attribute rather than decorator args. Similarly changed function tasks to have "execute" method rather than decorator arg.

*I also slightly changed the "basic dynamic" case for shell commands to work in some example extended syntax

Interface design

There was a general consensus in the most recent developer meeting that while everyone is happy with the @task decorator for Python function tasks, the syntax for designing shell command tasks will be a significant barrier for Pydra's adoption because the combination of list, tuples and dicts is quite obscure and using different metadata field options will require frequent checking of the docs.

I believe there is also an opportunity to simplify some the interface initialisation code that appears to have been written when Pydra was using basic dataclasses, so that attrs classes are created directly from the interface definition instead of on-the-fly at execution time. This should open the potential for code-completers and type-checkers to parse Pydra interfaces/workflows to assist development in IDEs, and also make it easier for advanced attrs features such as converters and validators to be used.

Shell command proposal

Basic dynamic case

In the simplest case we could look to have a one-line function that creates a basic shell command, e.g.

>>> from pydra.mark import shell
>>> Cp = shell(
    "cp",
    inputs=[
        "src",
        "recursive:-R"  # flags could be considered boolean by default
    ],
    outputs=["dest:{src.stem}_out{src.ext}"]  # after ':' the output_file_template value could be provided
)
>>> cp = Cp(src="/path/to/file.txt", dest="/path/to/new-file.txt")
>>> cp.cmdline
"cp '/path/to/file' '/path/to/new-file'"

We assume that all shell inputs and outputs are of type typing.Union[fileformats.generic.File, fileformats.generic.Directory].

An example using the output file template default for dest

>>> cp = Cp(src="/path/to/file")
"cp '/path/to/file' 'file_out.txt'"
>>> result  = cp()
>>> result.outputs.dest
File('/private/var/..../file_out.txt')

The point of this syntax is not to be comprehensive but to provide a concise way to declare 80-90% of simple shell commands. If a shell command has a somewhat funky syntax that isn't covered, then the user will need to fallback to the extended syntax below.

The syntax above is just a quick attempt, there are bound to be better ways to define it. However, something that takes list[str] inputs and outputs kwargs would provide a nice symmetry with function tasks.

Extended dynamic case

This syntax could be expanded to take dictionary args for inputs and outputs to provide the option to dynamically create more complex shell interfaces

>>> from pydra.mark import shell
>>> from fileformats.generic import FsObject
>>> Cp = shell(
    "cp",
    inputs={
        "src": {"type": FsObject, "position": -2},
        "recursive": {"type": bool, "argstr": "-R", "help": "recursively copy directories"},
    },
    outputs={
        "dest": {"type": FsObject, "position": -1}
    }
)
>>> cp = Cp(src="/path/to/file", recursive=True,  dest="/path/to/new-file")
>>> cp.cmdline
"cp -R '/path/to/file' '/path/to/new-file'"

Extended case with `shell.arg`, `shell.out` and `shell.out_path`

Alternatively, the shell.arg, shell.out and shell.out_path functions can be used to provide more structure/code-assistance when defining the inputs and outputs

def yesno2bool(value: ty.Union[bool, str]) -> bool:
    if isinstance(value, str):
        if value == "yes":
            value = True
        elif value == "no":
            value = False
        else:
            raise ValueError(f"Could not parse {value!r} to boolean")
    return value

>>> from pydra.mark import shell
>>> from fileformats.generic import FsObject
>>> Cp = shell(
    "cp",
    inputs={
        "src": shell.arg(type=FsObject, position=-2),
        "recursive": shell.arg("-R", type=bool, converter=yesno2bool),
    },
    outputs={
        "dest": shell.out_path(type=FsObject, position=-1, template="{src.stem}_out{src.suffix}")
    }
)

Where shell.out_path designates an output that also needs to be added as an input path to the input spec, and therefore has a different signature.

Dataclass-style

From here is it a relatively small-leap to move to the more readable (I would argue), static, dataclass-style syntax, which would enable static analysis by code-completers and type-checkers.

import os
from pydra.engine import Interface
from pydra.mark import shell

@shell.outputs
class CpOut:
    dest: FsObject = shell.out_path(template="{src.stem}_out{src.suffix}", position=-1)
    file_size: int = shell.out(callable=lambda src: os.path.getsize(src) if src.is_file else ...)


@shell
class Cp(Interface[CpOut]):
    executable = "cp"
    src: FsObject = shell.arg(position=-2)
    recursive: bool = shell.arg("-R", converter=yesno2bool, help="recursively copy directories, otherwise if 'src' is a directory it will raise an error")
    dont_overwrite: bool = shell.arg("-n")

which gets called in the same way

>>> cp = Cp(src="/path/to/file.txt", recursive="yes", dont_overwrite=True)
>>> cp.cmdline
"cp -R -n '/path/to/file' 'file_out.txt'"

It would be feasible to allow the value executable to be set dynamically as an input of the interface (as it I believe it can be currently). I feel a little uneasy about this from a security POV, but I can also envisage cases where the name of the executable is non-standard (e.g. bet-fsl6 or similar) where you might want to manually set the executable. Something to discuss...

Python function proposal

Basic dynamic case (external functions)

Trying to be consistent with the shell-command syntax, I think we could introduce wrapping python functions in a similar way by looking at wrapping an external function first, e.g.

>>> import operator
>>> from pydra.mark import func
>>> TrueDiv = func(operator.truediv)
>>> true_div = TrueDiv(a=10, b=3)
>>> result = true_div()
>>> result.output.out
3

If you need to give names to multiple outputs (i.e. in a tuple), or rename the inputs, then the inputs and outputs kwargs can be provided

>>> from pydra.mark import func
>>> DivMod = func(divmod, inputs=["numerator", "denominator"], outputs=["quotient", "remainder"])
>>> divmod = DivMod(numerator=10, denominator=3)
>>> result = divmod()
>>> result.output.quotient
3
>>> result.output.remainder
1

Extended dynamic case with typing and validators

Similar to shell.interface, we could use dictionaries to facilitate add type annotations and attrs features, such as validators or converters, for dynamically constructed inputs

>>> def not_zero(instance, attribute, value):
    if value == 0:
        raise ValueError(f"'{attribute.name}' field cannot be 0")

>>> DivMod = func(
    divmod,
    inputs={
        "numerator": int,
        "denominator": func.arg(int, validator=not_zero)
    } ,
    outputs={
        "quotient": int,
        "remainder": int
    }
)
    
>>> div_mod = DivMod(numerator=10, denominator=3)
>>> result = div_mod()
>>> result.output.quotient
3
>>> result.output.remainder
1

Function decorator (same as current @task)

For user-defined functions, we could recommend the use of a decorator the same as the current @task syntax, i.e.

@func
def TrueDivPlus1(a: int, b: int) -> int:
    return a // b + 1

Instead of needing to annotate the function to name the outputs, you can reuse the "outputs" keyword to rename a ty.Tuple return type, e.g.

@func(outputs=["quotient", "remainder"])
def DivModePlus1(numerator: int, denominator: int) -> ty.Tuple[int, int]:
    return (numerator // denominator + 1, numerator % denominator + 1)

Dataclass-style

Finally, for functions defined in task libraries, we could offer a dataclass-style syntax very similar to the shell command to enable code-completion/type-checking

@func.outputs
class DivModPlus1Out:
    quotient: int
    remainder: int

@func
class DivModePlus1(Interface[DimModePlus1Out]):
    numerator: int
    denominator: int = func.arg(validator=not_zero)

    def execute(self):
        return (self.numerator // self.denominator + 1, self.numerator % self.denominator + 1)

Matlab and other interface types

I think we should be able to stick to the same pattern for Matlab and any other type of interfaces that we might want to run (e.g. R). Haven't thought it through thoroughly yet though

2 replies

djarecka Aug 31, 2023
Maintainer

Thanks for starting this! I will list my thoughts regarding various aspects:

Re. Shell command proposal

I understand that a short syntax is nice, but I'm personally not sure about the using the name of the output variable as a default name of the output file is very useful, I think this is usually part of the name but hardly ever the full name
I'm also not sure if adding by default all output variables to the inputs is a good idea, I think often this is not the case, you might have to provide the name of the directory in the command line, but your output variables point to a specific files created with the directory
I like the syntax that uses dictionaries for inputs and outputs
I'm not sure if I like the idea of moving the template structure to the output spec instead of leaving this in the input spec (and creating the output automatically). I don't remember what was your motivation for this change
I don't fully understand the idea of out_path, I think I liked the idea of shell_arg and shell_out suggested in the shell_task decorator #655, adding more options might be confusing
I understand the concerns about the security of allowing people to change the executable, but I have the feeling that having it in the decorator is not intuitive. I'd be happy to learn what others think

Re: Python function proposal

I'm not sure if I understand the example for "dynamically constructed function tasks", where do you define what DivMod is doing?
I'm not sure if I would consider the "dataclass-style syntax" to be easier or more readable than the current one, but not saying that we can't add it as an option

tclose Aug 31, 2023
Maintainer Author

Thanks for the feedback @djarecka

I understand that a short syntax is nice, but I'm personally not sure about the using the name of the output variable as a default name of the output file is very useful, I think this is usually part of the name but hardly ever the full name

I'm also not sure if adding by default all output variables to the inputs is a good idea, I think often this is not the case, you might have to provide the name of the directory in the command line, but your output variables point to a specific files created with the directory

Yes, it is a bit of a stretch. I was thinking about it mainly as a way to introduce the syntax and to keep consistency with function interfaces. But you are right that in most cases the expanded syntax will be required.

I'm not sure if I like the idea of moving the template structure to the output spec instead of leaving this in the input spec (and creating the output automatically). I don't remember what was your motivation for this change

I had a few of motivations for this proposal:

We need to the field in the input spec to be Path, or ty.Union[Path, bool] if it is an optional output, whereas we need the field in the output spec to be a FileSet subclass. Specifying the type in the output spec allows us to specify the file format (e.g. NiftiGzX), then magically insert a Path/union[Path, bool] field into the input spec.
the magically inserted field out of the input/output pair will not be statically analysable by pylance/mypy. Since the input field will be typically not provided and left as its default, it seems to make sense for the the output field to be the one that is analysable. If the output field is magically inserted, then we couldn't check any connections from it, meaning that you couldn't really type-check shell-command workflows
I can see why it makes sense from an interface designers POV for the output field to be listed in the inputs, but I don't find it intuitive to look at the input spec in order to work out what the outputs are.

I don't fully understand the idea of out_path, I think I liked the idea of shell_arg and shell_out suggested in the shell_task decorator #655, adding more options might be confusing

This isn't necessary, but the idea is that since these fields are actually an input/output field pair they will need an expanded kwarg list. You could just add these additional kwargs to the shell.out function, but then they wouldn't be relevant for the other output types.

I understand the concerns about the security of allowing people to change the executable, but I have the feeling that having it in the decorator is not intuitive. I'd be happy to learn what others think

That is fine, I'm not wedded to the executable-in-decorator syntax

Re: Python function proposal

I'm not sure if I understand the example for "dynamically constructed function tasks", where do you define what DivMod is doing?

DivMod is wrapping up the built-in divmod function. It doesn't need to be dynamically constructed as it has a fixed arg list so might not be the best example.

I'm not sure if I would consider the "dataclass-style syntax" to be easier or more readable than the current one, but not saying that we can't add it as an option

The current function decorator syntax is nice, but doesn't let you provide help-strings, converters and validators. It also isn't statically analysable. So while I expect most end-users would favour the function decorator, if you were maintaining a task package, e.g. scikit-learn or dipy, you might want to go to the extra effort of adding in supporting these for these features. At least that was my thinking

tclose · 2023-08-30T22:46:39Z

tclose
Aug 30, 2023
Maintainer Author

Workflow construction

The workflow construction syntax is already elegant and easy to follow IMO, so there are only a few subtle changes to propose, which are required to dovetail in with other syntax changes and facilitate code-completion/static type-checking.

Under the hood, things would change a fair bit with workflow definitions becoming lightweight stateless objects consisting of placeholder Node objects instead of full tasks, but this will be almost entirely transparent to the end user. The definition of the Node class would look something like

@attrs.define
class Node(ty.Generic[Out]):
    # Ideally `inputs` would be typed by another type-var for the inputs, but we can't define a type-var as a
    # specialisation of another type var yet, i.e. T[U] (see https://github.com/nekitdev/peps/blob/main/pep-9999.rst),
    # and so have to settle for just be able to type the lzout
    inputs: Interface
    lzout: Out  # This would be instantiated with LazyOutField refs to the outputs of this node
    _task_type: ty.Type[TaskBase]  # reference to task type to instantate, pulled from the interface class
    _splitter: ty.Union[list, tuple, str]  # holds a reference to the splitter defined by split()
    _combiner: ty.Union[list, str]  # holds a reference to the combiner defined by combine()
    _workflow: Workflow
    
    def split(...):
        ...

    def combine(...):
        ...

The signature of Workflow.add() changes from def add(task: Task) -> Workflow to
def add(interface: Interface[T]) -> Node[T], which won't make much difference to the
user as long as they don't want to chain add() calls off one another (which I personally think would be poor style in any case)

So we can define our workflow in a very similar way, with only a few subtle changes:

name keyword-args are now passed to Workflow.add() instead of the task/interface init
name keyword-arg becomes optional and the node is referred to by the name of the interface class, unless two interfaces with the same name are used, in which case one of them needs to be named explicitly
split() and combine() are now performed on the returned Node object instead of the task/interface object, i.e. on the outer parentheses
The node returned by Workflow.add() -> Node is the preferred way to reference upstream connections, since this will typed properly. Note that wf.<node-name> will still work, just it won't be able to be type-checked/code-completed

import pydra.mark

MyWorkflow = pydra.mark.workflow(inputs=["in_files", "in_int"])

myfunc = MyWorkflow.add(
    MyFunc(in_int=MyWorkflow.lzin.in_int, in_str="hi")
)
if blah:
    myfunc.inputs.in_str = "hello"  # Unfortunately, this won't be able to be type-checked due to not being able to subscript a type-var with another type-var in the Workflow.add() signature (see https://github.com/nekitdev/peps/blob/main/pep-9999.rst)

myfunc2 = MyWorkflow.add(
    MyFunc(
        in_int=myfunc.lzout.out_int,  # should be passed ok by mypy
        in_str=myfunc.lzout.out_int,  # should show up as a mypy error because `in_str` expects a str not an int
    ),
    name="myfunc2",  # Tasks can be optionally given a name to differentiate multiple tasks from the same spec class (otherwise defaults to name of spec class) 
)

myshellcmd = MyWorkflow.add(
    MyShellCmd(
        an_option=myfunc2.lzout.out_str,
        another_option=MyWorkflow.myfunc2.lzout.out_int,  # can still access via wf.* if preferred
        out_file="myfile.txt",
    )
).split(in_file=MyWorkflow.lzin.in_files)  # Note method call on the outer parentheses, not inner like current

MyWorkflow.set_output(("out_files", myshellcmd.lzout.out_file))

A couple of minor additional/optional changes

These following suggestions are kind of independent and non-breaking. I have just thrown them in this discussion while we are all here

Typing input and output specs of workflows

Being able to type the input and output specs of workflows, e.g.

OtherWorkflow = pydra.mark.workflow(
    inputs={"in_files": ty.List[File], "in_int": int},
    outputs={"out_str": str, "out_files": ty.List[File]}
)

while they won't be able to be statically type-checked, we can still type-check this at
workflow construction time. Outputs can be type-checked against the lazy-fields that
are connected to them

Setting workflow lzout instead of set_output

This is purely stylistic, but I find the signature of Workflow.set_output (ty.Union[ty.Tuple[str, LazyField], ty.List[ty.Tuple[str, LazyField]]]) a bit cumbersome and hard to remember, so was thinking that maybe

MyWorkflow.lzout.out_str = myfunc2.out_str

as an alternative could be a bit more readable/writable. It is also more symmetric with how we access inputs, i.e. MyWorkflow.lzin, which I think would help people remember it.

Workflows as classes (optional)

I started toying with the idea of workflow definitions producing attrs classes, since in this proposal they are meant to be interchangeable with interface classes.

The syntax described in Workflow construction would still work but just return a dynamically generated attrs class instead of an object. We could also offer the following syntax for library code in order to provide code-completion/type-checking

from fileformats.medimage import NiftiGz, DicomDir
from pydra.engine import Interface
from pydra.mark import workflow
from pydra.tasks.dcm2niix import Dcm2Niix
from pydra.tasks.fsl import Bet


@workflow.outputs
class PreprocOut:
    preprocessed: NiftiGz
    mask: NiftiGz
    

@workflow
class Preprocess(Interface[PreprocOut]):
    t1w: DicomDir
    threshold: float = workflow.arg(help="the threshold used to determine...")

dcm2niix = Preprocess.add(Dcm2Niix(in_file=Preprocess.lzin.t1w))

bet = Preprocess.add(Bet(in_file=dcm2niix.lzout.out_file, threshold=Preprocess.lzin.threshold))

Preprocess.lzout.mask = bet.lzout.mask_file

...

We could even extend it to include nodes in the class definition for workflow construction without much conditional logic

from fileformats.medimage import NiftiGz, DicomDir
from pydra.engine import Interface
from pydra.mark import workflow
from pydra.tasks.dcm2niix import Dcm2Niix
from pydra.tasks.fsl import Bet


@workflow.outputs
class PreprocOut:
    preprocessed: NiftiGz
    mask: NiftiGz
    

@workflow
class Preprocess(Interface[PreprocOut]):
    t1w: DicomDir = workflow.arg()
    threshold: float = workflow.arg(help="the threshold used to determine...")

    dcm2niix = workflow.node(Dcm2Niix(in_file=t1w))
    bet = workflow.node(Bet(in_file=dcm2niix.lzout.out_file, threshold=threshold))

    ...

    outputs = PreprocOut(
        preprocessed=final_step.lzout.output,
        mask=bet.lzout.mask_file
    )

Has the nice property that the assigned node variables become attributes of the class.

Avoiding `lzout` (optional)

One mistake I find myself frequently making when writing workflows is forgetting to include lzout when accessing upstream outputs. To avoid this and streamline the syntax a little bit we could change the signature of Workflow.add(Interface[Out]) -> Node[Out] to Workflow.add(Interface[Out]) -> Out. The workflow construction syntax would then look something like

import pydra.mark

MyWorkflow = pydra.mark.workflow(inputs=["in_files", "in_int"])

myfunc_out = MyWorkflow.add(
    MyFunc(in_int=MyWorkflow.inputs.in_int, in_str="hi"),  # since `lzout` is no longer prominent, it might make sense to rename workflow `lzin` to `inputs` to be a bit easier to remember
    name="myfunc"
)
if blah:
    MyWorkflow.myfunc.inputs.in_str = "hello"  # To set inputs after the node has been added, we need to access the node

myfunc2_out = MyWorkflow.add(
    MyFunc(
        in_int=myfunc_out.out_int,
        in_str=myfunc_out.out_int,  # mypy type error
    ),
    name="myfunc2",
)

myshellcmd_out = MyWorkflow.add(
    MyShellCmd(
        an_option=myfunc2_out.out_str,
        another_option=MyWorkflow.myfunc2.lzout.out_int,  # can still access via wf.*.lzout if preferred
        out_file="myfile.txt",
    ),
    name="myshellcmd"
)

# Split calls would need to be performed on separate line and reference the node. Has benefit that the
# name of the node being split is explicitly used
wf.myshellcmd.split(in_file=MyWorkflow.inputs.in_files) 

MyWorkflow.outputs.out_files = myshellcmd_out.out_file  # likewise, maybe renaming workflow `lzout` to `outputs` would be more user-friendly

4 replies

djarecka Sep 18, 2023
Maintainer

sorry, forgot to write last time I was reading it, but I don't fully understand what we will gain by introducing Node...
I'm reading this again and I'm still not sure. Perhaps we can just leave it for the discussion tomorrow.

tclose Sep 19, 2023
Maintainer Author

Sorry, I probably didn't introduce the reasoning that led me to this in this post, it is partly to do with the execution post below. My heuristics were to

separate user-defined parameters into a clean namespace so they don't clash with task configuration names
make workflows equivalent with task interfaces (i.e. stateless) rather than tasks (i.e. parameterised with state + configuration) so they can be reused at different points in an outer workflow.

djarecka Sep 19, 2023
Maintainer

but if you look at the example from the current tests, this workflow is also stateless before explicitly adding a splitter here, so you could always define stateless workflow first.

tclose Sep 19, 2023
Maintainer Author

Wouldn't you need to do a deepcopy of the workflow if you wanted to use it at another point in your code so the split you add on line 854 isn't used at the new point as well? In my Arcana code, I solve this issue by wrapping such workflows functions, but this requires duplicating the inputs in the signature of the function.

tclose · 2023-08-30T23:50:58Z

tclose
Aug 30, 2023
Maintainer Author

Task/Workflow execution

To accommodate the removal of task configuration parameters from the shell/function interfaces, the task/workflow execution procedure has to change a little bit, particularly under the hood.

Now instead of providing the cache dir when instantiating the task/interface we pass it to the execution call, i.e.

# Parameterisation of the interface, only contains parameters
myfunc = MyFunc(in_int=1, in_str="hi")

# Configuration & execution of the interface in a single step
result = myfunc(cache_dir="/path/to/cache", environment=Docker("myfuncdocker:1.0"), plugin="cf")

This allows us to keep the user namespace in the interface completely separate from the configuration/execution parameters.

Workflows are conceptually shifted from being interchangeable with tasks to being interchangeable with interface classes, e.g.

Given a workflow defined as

# Workflow definition (typically in a different module)
Preprocess = pydra.mark.workflow(inputs=["in_files", "in_int"])
Preprocess.add(...)
...
Preprocess.set_output(...)

It is first parameterised (as if it were an interface class), and then executed

# Workflow parameterisation step (THIS IS NEW!)
preprocess = Preprocess(in_files=["/path/to/file1", "/path/to/file2"], in_int=2)

# Execution step
result = preprocess(cache_dir="/path/to/cache", plugin="cf")

In order to be able to access the task instance (not to be confused with the instantiated Interface, e.g. myfunc), it could be set as an attribute of the result, i.e.

print(result.task.output_dir)

Since the parameterised tasks/workflows are now stateless, if you want to split/combine the outer task/workflow, you will now need to provide the splitter/combiner as a kwarg at execution

AlternativePreprocess = pydra.mark.workflow(inputs=["in_file", "threshold"])
...

# Workflow parameterisation step
alt_preprocess = AlternativePreprocess(threshold=0.5)

result = alt_preprocess(plugin="cf", inputs={"in_file": ["/path/to/file1", "/path/to/file2"]}, split="in_file")

We could also allow the following simplified syntax for simple outer-only splits (which would be 99% of cases for splits at the execution stage I imagine)

result = alt_preprocess(plugin="cf", split={"in_file": ["/path/to/file1", "/path/to/file2"]})

0 replies

tclose · 2023-08-30T23:58:10Z

tclose
Aug 30, 2023
Maintainer Author

Summary of proposed BW-compatibility-breaking syntax changes (copied from #670)

Interface design

@task decorator syntax would be maintained (could be renamed to pydra.mark.func or pydra.mark.function to distinguish from other decorators)
All task interfaces would become dataclass-like classes (function tasks are currently functions that return FunctionTask instances)
Dynamic construction of task interfaces would be created from JSON/YAML-like objects

Workflow construction

Workflow.add(Task) -> Workflow signature would become Workflow.add(Interface) -> Node, where Node is a new light-weight placeholder object (i.e. couldn't be executed as a task)
Interfaces don't accept the name or any other kwarg, just the params
- Node names (relative to the workflow) would default to the name of the interface to avoid having to explicitly name the node in addition to the variable it is assigned to.
- Explicit names would need to be provided if the same interface is used twice, but would be passed as a kwarg to the add() method instead.
Split/combines would be performed on Node objects, not Interface objects, e.g. wf.add(MyNode(...)).split(...) instead of wf.add(MyNode(...).split(...))

Task/workflow execution

The "configuration" of a task (e.g. cache locations & environments) would be moved from the __init__() (leaving it a pure parameterisation) to the execution stage, i.e. __call__()
- Implicitly created "task" instances can be accessed as attribute of results after the execution, e.g. result = interface(); print(result.task.output_dir)
- splitting and combining of the top level object would become kwargs of the __call__ method
Parameterisation of tool "interfaces" results in an Interface instance rather than a Task.

Pros

Static type-checking and code-completion.
Separates parameterisation namespace from configuration option namespace
- i.e. name, audit_flags, cache_dir, cache_locations, inputs, cont_dim, messenger_args, messengers, rerun, input_spec, output_spec, propagate_rerun
- Would allow us to freely add new configuration options without potentially breaking workflows, i.e. no risk of clashes between the new option and an interface input name someone has defined.
Instantiating "interface" classes returns "less-surprising" dataclass-style object
Workflows become stateless classes, or class-like-objects, that can be reused at various points in multiple "meta-workflows"
- (NB: I have taken to writing methods that return workflows in order to artificially delay the construction of the workflow so I can use it at multiple points with separate params/states, e.g. format converter workflows).
- You avoid the slightly confusing syntax options you have now where you provide the input_spec and inputs kwargs to a workflow in the workflow init, and potentially split the workflow before it has been constructed.
Avoid having to use workflow name as a prefix (e.g. "wf.*"). However, nodes can still be accessed this way if required/desired
...

Cons

Obscure the task object from the user as an attribute of the result.
Variable names nodes that are assigned to wouldn't necessarily match name of node in workflow (i.e. wf.node-name)
...

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed syntax changes #692

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 4 comments 6 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Re: Python function proposal

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Proposed syntax changes #692

tclose Aug 29, 2023 Maintainer

Replies: 4 comments · 6 replies

tclose Aug 30, 2023 Maintainer Author

Interface design

Shell command proposal

Basic dynamic case

Extended dynamic case

Extended case with shell.arg, shell.out and shell.out_path

Dataclass-style

Python function proposal

Basic dynamic case (external functions)

Extended dynamic case with typing and validators

Function decorator (same as current @task)

Dataclass-style

Matlab and other interface types

djarecka Aug 31, 2023 Maintainer

Re. Shell command proposal

Re: Python function proposal

tclose Aug 31, 2023 Maintainer Author

Re: Python function proposal

tclose Aug 30, 2023 Maintainer Author

Workflow construction

A couple of minor additional/optional changes

Typing input and output specs of workflows

Setting workflow lzout instead of set_output

Workflows as classes (optional)

Avoiding lzout (optional)

djarecka Sep 18, 2023 Maintainer

tclose Sep 19, 2023 Maintainer Author

djarecka Sep 19, 2023 Maintainer

tclose Sep 19, 2023 Maintainer Author

tclose Aug 30, 2023 Maintainer Author

Task/Workflow execution

tclose Aug 30, 2023 Maintainer Author

Summary of proposed BW-compatibility-breaking syntax changes (copied from #670)

Interface design

Workflow construction

Task/workflow execution

Pros

Cons

tclose
Aug 29, 2023
Maintainer

Replies: 4 comments 6 replies

tclose
Aug 30, 2023
Maintainer Author

Extended case with `shell.arg`, `shell.out` and `shell.out_path`

djarecka Aug 31, 2023
Maintainer

tclose Aug 31, 2023
Maintainer Author

tclose
Aug 30, 2023
Maintainer Author

Avoiding `lzout` (optional)

djarecka Sep 18, 2023
Maintainer

tclose Sep 19, 2023
Maintainer Author

djarecka Sep 19, 2023
Maintainer

tclose Sep 19, 2023
Maintainer Author

tclose
Aug 30, 2023
Maintainer Author

tclose
Aug 30, 2023
Maintainer Author