UPDATED: Incorporated suggestions from @djarecka and @effigies re the preference for "executable" as an attribute rather than a decorator arg. Similarly, changed function tasks to have an "execute" method rather than a decorator arg. I also slightly changed the "basic dynamic" case for shell commands to work with the example extended syntax.

## Interface design

There was a general consensus in the most recent developer meeting that, while everyone is happy with the current functionality, there is scope to improve the syntax. I believe there is also an opportunity to simplify some of the interface initialisation code, which appears to have been written when Pydra was using basic dataclasses, so that attrs classes are created directly from the interface definition instead of on-the-fly at execution time. This should open up the potential for code-completers and type-checkers to parse Pydra interfaces/workflows to assist development in IDEs, and also make it easier to use advanced attrs features such as converters and validators.

## Shell command proposal

### Basic dynamic case

In the simplest case we could look to have a one-line function that creates a basic shell command, e.g.

```python
>>> from pydra.mark import shell
>>> Cp = shell(
        "cp",
        inputs=[
            "src",
            "recursive:-R",  # flags could be considered boolean by default
        ],
        outputs=["dest:{src.stem}_out{src.ext}"],  # after ':' the output_file_template value can be provided
    )
>>> cp = Cp(src="/path/to/file.txt", dest="/path/to/new-file.txt")
>>> cp.cmdline
"cp '/path/to/file.txt' '/path/to/new-file.txt'"
```

We assume that all shell inputs and outputs are of a generic file type (e.g. `FsObject`) unless specified otherwise. An example using the output file template default for `dest`:

```python
>>> cp = Cp(src="/path/to/file.txt")
>>> cp.cmdline
"cp '/path/to/file.txt' 'file_out.txt'"
>>> result = cp()
>>> result.outputs.dest
File('/private/var/..../file_out.txt')
```

The point of this syntax is not to be comprehensive, but to provide a concise way to declare the 80-90% of shell commands that are simple. If a shell command has a somewhat funky syntax that isn't covered, then the user will need to fall back to the extended syntax below. The syntax above is just a quick attempt, and there are bound to be better ways to define it; however, something that takes a similarly concise form would be worthwhile.

### Extended dynamic case

This syntax could be expanded to take dictionary args for inputs and outputs, to provide the option to dynamically create more complex shell interfaces:

```python
>>> from pydra.mark import shell
>>> from fileformats.generic import FsObject
>>> Cp = shell(
        "cp",
        inputs={
            "src": {"type": FsObject, "position": -2},
            "recursive": {"type": bool, "argstr": "-R", "help": "recursively copy directories"},
        },
        outputs={
            "dest": {"type": FsObject, "position": -1},
        },
    )
>>> cp = Cp(src="/path/to/file", recursive=True, dest="/path/to/new-file")
>>> cp.cmdline
"cp -R '/path/to/file' '/path/to/new-file'"
```
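To illustrate the relationship between the two forms, the compact string specs from the basic dynamic case (e.g. `"recursive:-R"`, `"dest:{src.stem}_out{src.ext}"`) could be normalised into the extended dict form with a small parser. The sketch below is an assumption about how `shell()` might do this internally; `parse_input_spec` and its return format are hypothetical, not actual Pydra code.

```python
# Hypothetical sketch: normalise the compact "name[:modifier]" string
# syntax into the dict form used by the extended syntax.
from pathlib import Path


def parse_input_spec(spec: str) -> tuple[str, dict]:
    """Split 'name[:modifier]' into a field name and a metadata dict."""
    name, sep, modifier = spec.partition(":")
    metadata: dict = {"type": Path}  # inputs/outputs default to paths
    if sep:
        if modifier.startswith("-"):
            # a flag such as "-R" => boolean field with that argstr
            metadata = {"type": bool, "argstr": modifier}
        else:
            # otherwise treat the modifier as an output_file_template
            metadata["output_file_template"] = modifier
    return name, metadata
```

For example, `parse_input_spec("recursive:-R")` returns `("recursive", {"type": bool, "argstr": "-R"})`, which matches the dict given for `recursive` in the extended example above.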
---
## Workflow construction

The workflow construction syntax is already elegant and easy to follow IMO, so there are only a few subtle changes to propose, which are required to dovetail with other syntax changes and facilitate code-completion/static type-checking. Under the hood, things would change a fair bit, with workflow definitions becoming lightweight stateless objects consisting of placeholder nodes:

```python
@attrs.define
class Node(ty.Generic[Out]):
    # Ideally `inputs` would be typed by another type-var for the inputs, but we can't define a type-var as a
    # specialisation of another type var yet, i.e. T[U] (see https://github.com/nekitdev/peps/blob/main/pep-9999.rst),
    # and so have to settle for just being able to type the lzout
    inputs: Interface
    lzout: Out  # instantiated with LazyOutField refs to the outputs of this node
    _task_type: ty.Type[TaskBase]  # reference to the task type to instantiate, pulled from the interface class
    _splitter: ty.Union[list, tuple, str]  # holds a reference to the splitter defined by split()
    _combiner: ty.Union[list, str]  # holds a reference to the combiner defined by combine()
    _workflow: Workflow

    def split(...):
        ...

    def combine(...):
        ...
```

So we can define our workflow in a very similar way, with only a few subtle changes:
```python
import pydra.mark

MyWorkflow = pydra.mark.workflow(inputs=["in_files", "in_int"])

myfunc = MyWorkflow.add(
    MyFunc(in_int=MyWorkflow.lzin.in_int, in_str="hi")
)

if blah:
    # Unfortunately, this won't be able to be type-checked, due to not being able to subscript
    # a type-var with another type-var in the Workflow.add() signature
    # (see https://github.com/nekitdev/peps/blob/main/pep-9999.rst)
    myfunc.inputs.in_str = "hello"

myfunc2 = MyWorkflow.add(
    MyFunc(
        in_int=myfunc.lzout.out_int,  # should be passed ok by mypy
        in_str=myfunc.lzout.out_int,  # should show up as a mypy error because `in_str` expects a str, not an int
    ),
    # Tasks can optionally be given a name to differentiate multiple tasks created from
    # the same spec class (otherwise defaults to the name of the spec class)
    name="myfunc2",
)

myshellcmd = MyWorkflow.add(
    MyShellCmd(
        an_option=myfunc2.lzout.out_str,
        another_option=MyWorkflow.myfunc2.lzout.out_int,  # can still access via wf.* if preferred
        out_file="myfile.txt",
    )
).split(in_file=MyWorkflow.lzin.in_files)  # note: method call on the outer parentheses, not the inner ones as currently

MyWorkflow.set_output(("out_files", myshellcmd.lzout.out_file))
```

## A couple of minor additional/optional changes

The following suggestions are largely independent and non-breaking. I have just thrown them into this discussion while we are all here.

### Typing input and output specs of workflows

Being able to type the input and output specs of workflows, e.g.

```python
OtherWorkflow = pydra.mark.workflow(
    inputs={"in_files": ty.List[File], "in_int": int},
    outputs={"out_str": str, "out_files": ty.List[File]},
)
```

While these types won't be able to be statically type-checked, we can still type-check them at runtime.

### Setting workflow lzout instead of set_output

This is purely stylistic, but I find the signature of `set_output()` a little clunky, so

```python
MyWorkflow.lzout.out_str = myfunc2.lzout.out_str
```

as an alternative could be a bit more readable/writable. It is also more symmetric with how we access the inputs, i.e. `MyWorkflow.lzin.*`.

### Workflows as classes (optional)

I started toying with the idea of workflow definitions producing attrs classes, since in this proposal they are meant to be interchangeable with interface classes. The syntax described in Workflow construction would still work, but would just return a dynamically generated class:

```python
from fileformats.medimage import NiftiGz, DicomDir
from pydra.engine import Interface
from pydra.mark import workflow
from pydra.tasks.dcm2niix import Dcm2Niix
from pydra.tasks.fsl import Bet


@workflow.outputs
class PreprocOut:
    preprocessed: NiftiGz
    mask: NiftiGz


@workflow
class Preprocess(Interface[PreprocOut]):
    t1w: DicomDir
    threshold: float = workflow.arg(help="the threshold used to determine...")


dcm2niix = Preprocess.add(Dcm2Niix(in_file=Preprocess.lzin.t1w))
bet = Preprocess.add(Bet(in_file=dcm2niix.lzout.out_file, threshold=Preprocess.lzin.threshold))
Preprocess.lzout.mask = bet.lzout.mask_file
...
```

We could even extend it to include nodes in the class definition, allowing workflow construction without much conditional logic:

```python
from fileformats.medimage import NiftiGz, DicomDir
from pydra.engine import Interface
from pydra.mark import workflow
from pydra.tasks.dcm2niix import Dcm2Niix
from pydra.tasks.fsl import Bet


@workflow.outputs
class PreprocOut:
    preprocessed: NiftiGz
    mask: NiftiGz


@workflow
class Preprocess(Interface[PreprocOut]):
    t1w: DicomDir = workflow.arg()
    threshold: float = workflow.arg(help="the threshold used to determine...")

    dcm2niix = workflow.node(Dcm2Niix(in_file=t1w))
    bet = workflow.node(Bet(in_file=dcm2niix.lzout.out_file, threshold=threshold))
    ...

    outputs = PreprocOut(
        preprocessed=final_step.lzout.output,
        mask=bet.lzout.mask_file,
    )
```

This has the nice property that the assigned node variables become attributes of the class.
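As a rough illustration of the class-based idea, a `@workflow` class decorator could collect the annotated fields of the class body into lazy-input placeholders (`lzin`), so that node definitions can reference workflow inputs before any values exist. This is only a minimal sketch of the mechanism; `LazyInField` and the decorator internals are illustrative assumptions, not the actual Pydra implementation.

```python
# Minimal sketch: build an `lzin` namespace of lazy-input placeholders
# from a class's annotated fields.
from types import SimpleNamespace


class LazyInField:
    """Placeholder reference to a workflow input field."""

    def __init__(self, workflow_name: str, field: str):
        self.workflow_name = workflow_name
        self.field = field

    def __repr__(self):
        return f"LazyInField({self.workflow_name}.{self.field})"


def workflow(cls):
    # one lazy placeholder per annotated field in the class body
    cls.lzin = SimpleNamespace(
        **{
            name: LazyInField(cls.__name__, name)
            for name in getattr(cls, "__annotations__", {})
        }
    )
    return cls


@workflow
class Preprocess:
    t1w: str
    threshold: float


# nodes added later can reference Preprocess.lzin.t1w, etc.
```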
---
## Task/workflow execution

To accommodate the removal of task configuration parameters from the shell/function interfaces, the task/workflow execution procedure has to change a little, particularly under the hood. Instead of providing the cache dir when instantiating the task/interface, we now pass it to the execution call, i.e.

```python
# Parameterisation of the interface; only contains parameters
myfunc = MyFunc(in_int=1, in_str="hi")
# Configuration & execution of the interface in a single step
result = myfunc(cache_dir="/path/to/cache", environment=Docker("myfuncdocker:1.0"), plugin="cf")
```

This allows us to keep the user namespace in the interface completely separate from the configuration/execution parameters. Workflows are conceptually shifted from being interchangeable with tasks to being interchangeable with interface classes. For example, given a workflow defined as

```python
# Workflow definition (typically in a different module)
Preprocess = pydra.mark.workflow(inputs=["in_files", "in_int"])
Preprocess.add(...)
...
Preprocess.set_output(...)
```

it is first parameterised (as if it were an interface class) and then executed:

```python
# Workflow parameterisation step (THIS IS NEW!)
preprocess = Preprocess(in_files=["/path/to/file1", "/path/to/file2"], in_int=2)
# Execution step
result = preprocess(cache_dir="/path/to/cache", plugin="cf")
```

The task instance (not to be confused with the parameterised interface) can be accessed via the result, e.g.

```python
print(result.task.output_dir)
```

Since the parameterised tasks/workflows are now stateless, if you want to split/combine the outer task/workflow, you will now need to provide the splitter/combiner as a kwarg at execution:

```python
AlternativePreprocess = pydra.mark.workflow(inputs=["in_file", "threshold"])
...

# Workflow parameterisation step
alt_preprocess = AlternativePreprocess(threshold=0.5)
result = alt_preprocess(plugin="cf", inputs={"in_file": ["/path/to/file1", "/path/to/file2"]}, split="in_file")
```

We could also allow the following simplified syntax for simple outer-only splits (which I imagine would cover 99% of splits at the execution stage):

```python
result = alt_preprocess(plugin="cf", split={"in_file": ["/path/to/file1", "/path/to/file2"]})
```
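The simplified form is just sugar over the explicit one. A hypothetical sketch of how the execution call could normalise a dict-valued `split` kwarg into the explicit `inputs=...` + `split=...` form (the helper name and return shape are assumptions, not actual Pydra API):

```python
# Hypothetical sketch: expand the dict-valued `split` sugar into
# explicit inputs plus a list of split field names.
def normalise_split(split, inputs=None):
    """Return (inputs, split_fields) from either form of the split kwarg."""
    inputs = dict(inputs or {})
    if isinstance(split, dict):
        # dict form carries the values to split over as well as the fields
        inputs.update(split)
        split = list(split)
    elif isinstance(split, str):
        split = [split]
    return inputs, split
```

With this, `normalise_split({"in_file": ["/path/to/file1", "/path/to/file2"]})` yields the same `inputs` and splitter fields as the explicit call above.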
---
## Summary of proposed BW-compatibility-breaking syntax changes (copied from #670)

### Interface design

### Workflow construction

### Task/workflow execution

### Pros

### Cons
---
This thread is to discuss potential syntax changes proposed in #670 and in recent developer meetings. These changes can be broken down into three areas that are linked but can be assessed somewhat separately:

- Interface design
- Workflow construction
- Task/workflow execution

These changes should be consistent with the initial design goals for Pydra as laid out by @satra where possible:

- reduce boilerplate whenever possible
- task object reuse in a script, just like one makes multiple calls to a function
- parallelization (split/combine in its full syntactic form and at different levels with nested stuff)

and should aim to provide a low barrier to entry for inexperienced developers to pick up, while being expressive enough for experienced developers to robustly design complex workflows, with a stepped learning pathway in between.

@djarecka @effigies @ghisvail @yibeichan