defining processes and pipelines schemas is too complex #293
For 1., I see (reading the code) that a dataset name can be assigned to a parameter in its field metadata. I don't know if it's a complete solution, but at least it gives us something to work with.
There was a huge change in the process completion paradigm, and two changes make backward compatibility difficult: …

We may have to reintroduce a kind of data type that would not capture all the diversity of the data, but could define a metadata attribute usage model. Each model could contain some forced metadata values (allowing to force undefined values for attributes that are not to be used). In my opinion, these model names should be known by both pipelines and data schemas. In a pipeline, they would be simple strings defined in fields; they would define a partition of parameters according to their metadata needs for path generation. Each dataset schema would contain an actual definition of the attribute usage rules for each model name.

For point 1, the default dataset of a parameter is either … And point 4 is not implemented yet: we need to put formats back in Capsul v3 and link them to the configuration to get default values.
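The dataset-schema side of this "usage model" idea could be sketched roughly as follows. This is a hypothetical illustration only: `USAGE_MODELS`, `build_path`, the attribute names, and the path layout are all invented here, not Capsul's actual API.

```python
# All names below are invented for illustration; this is not Capsul's API.

# Each dataset schema would define the attribute-usage rules per model name:
# which metadata attributes a parameter uses, and which values are forced.
USAGE_MODELS = {
    # model name -> (used attributes, forced values)
    "raw_volume": ({"center", "subject", "extension"}, {"extension": "nii"}),
    "labeled_graph": ({"center", "subject", "sulci_recognition_session"}, {}),
}

def build_path(model_name, metadata):
    """Generate a path from metadata, keeping only the attributes the
    model declares as used, then applying the model's forced values."""
    used, forced = USAGE_MODELS[model_name]
    values = {k: v for k, v in metadata.items() if k in used}
    values.update(forced)  # forced values override user-provided ones
    # Naive path layout, purely for illustration.
    parts = [values[k] for k in ("center", "subject") if k in values]
    name = values.get("subject", "data")
    if "sulci_recognition_session" in values:
        name += "_" + values["sulci_recognition_session"]
    return "/".join(parts) + "/" + name + "." + values.get("extension", "gii")

metadata = {"center": "c1", "subject": "s42", "extension": "ima",
            "sulci_recognition_session": "session1"}
print(build_path("raw_volume", metadata))     # the forced extension wins
print(build_path("labeled_graph", metadata))  # session used, extension unused
```

In a pipeline, a parameter would then only carry the model name as a plain string in its field; the dataset schema owns the rules.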
Point 1 should be resolved in field metadata, yes; I'm currently trying this solution.
Used/unused attributes per parameter is OK; this is what I was talking about (as well as forced attribute values). I was just thinking that this definition could become repetitive for the many parameters sharing the same metadata attribute usage. So I wonder whether it could be interesting to store some metadata usage profiles behind a single name, to ease their reuse across several parameters. But I do not have a concrete example such as the Morphologist pipeline in front of me, so it may be useless to go beyond an explicit definition of metadata usage for each parameter.
I can try something, and tell you if it's too tedious...
It's more or less working, but it's still somewhat painful to write the rules, and even worse to read them afterwards. |
Point 1 is OK: I have implemented it for the Morphologist pipeline and its subprocesses. For points 2/3, I have implemented the used/unused metadata per parameter system. However, there are advantages and drawbacks:
Drawbacks: …
I've got an idea that could simplify the definition of schemas. It's still in its infancy and we'll have to work on it to see if it's viable. In my opinion, the difficulty comes from defining … In that case, …

We could also imagine that these methods also receive metadata (for instance, metadata computed by the inner processes of a pipeline) and can use it to propagate metadata values from an input parameter to an output parameter.
I think we should try to write what this method could be for the most complex cases in Morphologist, to see whether it is an interesting idea or not.
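As a rough illustration of the propagation idea above (all names here are invented, not an actual Capsul API), such a per-process method could receive the metadata of the inputs and fill in the metadata of the outputs:

```python
# Hypothetical sketch: a single method per process that propagates metadata
# from an input parameter to an output parameter. Parameter names
# ('t1mri', 'head_mesh') and attributes are invented for illustration.

def scalp_mesh_metadata(metadata):
    """`metadata` maps parameter name -> dict of metadata attributes.
    Copy what is shared from the input, override what differs."""
    out = dict(metadata["t1mri"])  # start from the input's metadata
    out["prefix"] = None           # head_mesh does not use 'prefix'
    out["mesh"] = "head"           # output-specific value
    out["extension"] = "gii"
    metadata["head_mesh"] = out
    return metadata

meta = {"t1mri": {"subject": "s42", "prefix": "raw", "extension": "nii"}}
meta = scalp_mesh_metadata(meta)
print(meta["head_mesh"])
```

Compared to declarative per-parameter rules, a plain method like this can express "erase this attribute, keep that one" as ordinary assignments, which may be easier to read for complex cases.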
I am working on the simplification of the process schema definition and have a question. Do we agree that it should not be possible to customize path generation inside a pipeline? If one links a node output to another node input without exporting it, it is supposed to be an internal temporary value, in which case path generation is useless. Therefore, a pipeline should only be allowed to customize path generation of its external inputs and outputs, but not of its internal plugs.
I globally agree. A pipeline should be a black box which doesn't leave behind files not described in its parameters.
I plan to call the process schema function on inner nodes first, so that a pipeline can use the result for its own parameters. The schema only provides metadata, not file names. At the end of the process, I plan to consider only the outer pipeline parameters to perform path generation. I think inner parameters that are not exported to the main pipeline should either be temporary or not written by a process, because these files will be out of scope for any further processing by Capsul (no metadata, no history, no visualization, etc.).
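The completion order described in this comment could be sketched as follows. This is a hypothetical illustration: the function and data shapes are invented, and real path generation would of course go through the dataset schema rather than a hard-coded layout.

```python
# Hypothetical sketch of the described completion order: inner nodes have
# already produced their metadata; paths are only generated for parameters
# exported to the outer pipeline, and internal plugs stay temporary.

def complete_pipeline(nodes, exported):
    """nodes: node name -> (param name -> metadata dict) as produced by each
    inner node's schema function; exported: set of 'node.param' names that
    are visible on the outer pipeline."""
    paths = {}
    for node, params in nodes.items():
        for param, meta in params.items():
            key = f"{node}.{param}"
            if key in exported:
                # Stand-in for real dataset-schema path generation.
                paths[key] = f"/study/{meta['subject']}/{param}.{meta['ext']}"
            else:
                paths[key] = "<temporary>"  # internal plug: no stable path
    return paths

nodes = {"seg": {"brain_mask": {"subject": "s1", "ext": "nii"}},
         "mesh": {"head_mesh": {"subject": "s1", "ext": "gii"}}}
paths = complete_pipeline(nodes, exported={"mesh.head_mesh"})
print(paths)
```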
In the new schema definition branch, I added the possibility to define a boolean …
Completion schemas definition is not simple enough at the moment, and it suffers from several problems:

1. `ProcessMetadata` is instantiated by the end user or application. Its constructor needs to be given a `datasets` parameter which specifies which dataset is assigned to each process parameter (`input`, `output`, `shared`...). In most situations the user doesn't know about it and doesn't want to specify it. Moreover, a generic application cannot have the list of all parameters of all processes for that, so processes should at least provide a default for this.
2. `ProcessSchema` subclasses need to specify all this, parameter by parameter, and sometimes have to erase attribute values, and even worse, sometimes have to set values back after they have been erased for other parameters. For example, in `capsul.brainvisa.schemas`, we need to write things like: … In the first class here we need to erase `prefix`, while in the second we need to restore (by a very inconvenient trick) `analysis`, which has been erased earlier. We need to find a simpler syntax or system, maybe allowing to say that the `head_mesh` parameter in `ScalpMesh` does not use the `prefix` attribute, rather than modifying it in an unintuitive way which has side effects later. This information would also be useful to simplify GUIs, by not displaying to users attributes which are not used, or which are only internally defined and used (like the `side` attribute inside a `Morphologist` pipeline).
3. Maybe for this reason, metadata defaults don't work the way they should: for instance, the `sulci_recognition_session` attribute in the BrainVisa schema is only used in the identified sulci graph parameters in Morphologist and other processes, and not in others. So it cannot have a default value (otherwise it would appear in all filenames of all parameters). Right now its value is set, forced, in the `ProcessSchema` for the labeled graph parameters, so the user (or program) cannot change it at all.
4. Similarly, formats and extensions are forced here and there in process and pipeline parameter schemas, and the user cannot choose a format for outputs, except for volumes if the default is left untouched. We need to develop something to manage formats.
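The simpler declaration suggested in point 2 could be sketched as follows. All names here are invented for illustration (this is not actual Capsul code): each parameter just lists the attributes it does not use, instead of one schema class erasing `prefix` and another restoring `analysis` by a trick.

```python
# Hypothetical sketch: per-parameter "unused attribute" declarations,
# consulted non-destructively at path-generation (or GUI) time.

UNUSED = {
    # (process, parameter) -> attributes ignored for this parameter
    ("ScalpMesh", "head_mesh"): {"prefix"},
    ("Morphologist", "t1mri"): {"side", "sulci_recognition_session"},
}

def effective_metadata(process, parameter, metadata):
    """Drop unused attributes instead of destructively erasing them, so no
    later 'restore' trick is needed and GUIs can simply hide them."""
    unused = UNUSED.get((process, parameter), set())
    return {k: v for k, v in metadata.items() if k not in unused}

meta = {"subject": "s42", "prefix": "raw", "side": "L"}
print(effective_metadata("ScalpMesh", "head_mesh", meta))
```

Because the shared metadata dict is never mutated, declaring an attribute unused for one parameter has no side effects on the other parameters, which is the main pain point described above.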