Expand wrroc with fields needed for federated analysis #37

famosab · 2024-11-07T13:16:33Z

Pull request to reference what we extended.

* Add workflow file to crate.
* Correct path to output files in metadata file.
* Add CreateActions for tool executions.
* Add ControlActions.
* Add SoftwareApplications for processes.
* Add HowToSteps.
* Add agent information (ORCID, name) to nextflow.config that is incorporated into metadata file.

* Use outdir from config as publishDir
* Remove nested list for createAction results
* Correct controlAction id

* Add license information according to Workflow RO-Crate
* Handle parameters set in workflow
* Add PropertyValue for FormalParameters
* Add PropertyValue to object field for CreateAction of workflow
* Add nextflow_schema.json to outdir if present

* Add objects to CreateActions
* Add license entity if license information in nextflow.config is a valid URL
* Fix bug: Don't add unpublished files to CreateAction results. This caused a NullPointerException, as they aren't contained in the workflowOutputs map

* (Hopefully) all input/output files are now copied into the crate
* Add intermediate results to ro-crate-metadata.json
* Add nextflow.config to ro-crate-metadata.json
* Add datePublished to Dataset
* Valid RO-Crate according to rocrate-validator

Signed-off-by: fbartusch <[email protected]>

* Change HowToStep position to string
* Remove nextflow.config object from OrganizeAction
* Add author to root data entity
* Passes roc-validator with severity REQUIRED

Signed-off-by: fbartusch <[email protected]>

ewels · 2024-11-08T11:37:15Z

nextflow.config

+            agent {
+                name = "John Doe"
+                orcid = "https://orcid.org/0000-0000-0000-0000"
+            }


I think we can pull this information from the new manifest.contributors config scope now:

https://www.nextflow.io/docs/latest/reference/config.html#config-manifest
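For illustration, a contributor entry in that scope might look like the sketch below. The field names (`name`, `orcid`, `contribution`) follow the linked config reference as far as it is understood here, and the values are the placeholders from the diff above, so this is an assumption about the layout rather than something the plugin already reads.

```groovy
// nextflow.config (sketch, placeholder values)
manifest {
    contributors = [
        [
            name        : 'John Doe',
            orcid       : 'https://orcid.org/0000-0000-0000-0000',
            contribution: ['author']
        ]
    ]
}
```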

fbartusch · 2024-11-08

This new output format for the nf-prov plugin creates RO-Crates that follow this profile.
The agent mentioned in the config is not the person who wrote the workflow, but the person who runs it.

ewels · 2024-11-08T11:37:51Z

nextflow.config

+                name = "John Doe"
+                orcid = "https://orcid.org/0000-0000-0000-0000"
+            }
+            license = "https://spdx.org/licenses/Apache-2.0"


Can we not pull this information automatically from a LICEN[CS]E.(md|txt) file?

fbartusch · 2024-11-08

This new output format for the nf-prov plugin creates RO-Crates that follow this profile.

As with the agent in your previous comment, the license is added directly to the RO-Crate. The crate will include input and output files, the pipeline configuration, and the pipeline itself. I will ensure that the ro-crate-metadata.json annotates all workflow components (i.e., everything created by the original workflow author) with the license specified in a LICENSE.(md|txt) file.
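A minimal sketch of how such a file lookup could work, assuming the license file sits directly in the project directory and its name matches the `LICEN[CS]E.(md|txt)` pattern from the comment above. `findLicenseFile` and the demo directory are hypothetical and not part of nf-prov.

```groovy
import java.nio.file.Files

// Hypothetical helper: return the first file in projectDir whose name matches
// the pattern from the review comment, or null if there is none.
File findLicenseFile(File projectDir) {
    def pattern = ~/LICEN[CS]E\.(md|txt)/
    def candidates = (projectDir?.listFiles() ?: []) as List<File>
    def matches = candidates
        .findAll { File f -> f.isFile() && f.name ==~ pattern }
        .sort { it.name }
    return matches ? matches.first() : null
}

// Usage with a throwaway directory standing in for the pipeline project
File demoProject = Files.createTempDirectory('pipeline').toFile()
new File(demoProject, 'LICENSE.md').text = 'placeholder license text'
new File(demoProject, 'README.md').text = 'placeholder readme'
println findLicenseFile(demoProject)?.name
```

Returning null when nothing matches leaves it to the caller to decide whether to fall back to a license URL from the config.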

famosab

@bentsherman I tried to point out the two main things I worked on :)

famosab · 2024-11-11T13:16:41Z

plugins/nf-prov/src/main/nextflow/prov/WrrocRenderer.groovy

+        final perTool = nextflowProcesses
+            .collect() { process ->
+                // read in meta.yaml file (nf-core style)
+                def metaYaml = readMetaYaml(process)
+                // get ext properties from process
+                def processorConfig = process.getConfig()
+                def extProperties = processorConfig.ext as Map
+                // use either ext property 'name' or 'name' from meta.yaml
+                def toolNameTask = extProperties.containsKey('name') ? extProperties.get('name') as String : metaYaml.get('name')
+
+                def listOfToolMaps = new ArrayList()
+                metaYaml.get('tools').each { tool -> listOfToolMaps.add( tool as Map ) }
+
+                // get descriptions for all tools used in process
+                // TODO: adapt so that multi-tool-processes get rendered seperately
+                // TODO: extract more information from meta.yaml
+                def softwareMap =  [:]
+                def listOfDescriptions = new ArrayList()
+                listOfToolMaps.each { toolMap ->
+                    toolMap.each { field -> 
+                        def fieldMap = field as Map
+                        field.iterator().each {entry -> 
+                            entry.iterator().each {entryField -> 
+                                def entryFieldMap = entryField.getAt("value") as Map
+                                listOfDescriptions.add(entryFieldMap.getAt("description"))
+                                }
+
+                        }
+                    }
+                    softwareMap[toolNameTask] = listOfDescriptions
+                }
+
+                def createSoftwareFinal = [
+                    "@id"         : toolNameTask,
+                    "@type"       : "SoftwareApplication",
+                    "description" : softwareMap.getAt(toolNameTask).toString()
+                ]
+                return createSoftwareFinal
+            }
+
+        final wfToolDescriptions = perTool.collect()


@bentsherman This is the new function I wrote. As you can see, I did a lot of weird stuff to be able to access the nested YAML file.
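The nested iteration could probably be flattened. The sketch below assumes the parsed `tools` section is a list of single-key maps (tool name mapped to a map with a `description` field), which is how nf-core style meta.yml files are usually laid out. `toolDescriptions` and the sample data are hypothetical.

```groovy
// Hypothetical helper, not part of nf-prov: collect the description of every
// tool listed in a parsed nf-core style meta.yml.
// Assumes `tools` looks like [[toolName: [description: ...]], ...]
List<String> toolDescriptions(Map metaYaml) {
    def tools = (metaYaml?.tools ?: []) as List<Map>
    return tools
        .collectMany { Map tool -> tool.values() }
        .collect { (it as Map)?.description as String }
        .findAll { it != null }
}

// Stand-in for the Map that yaml.load(...) would return (made-up values)
def sampleMeta = [
    name : 'fastqc',
    tools: [
        [fastqc: [description: 'Quality control for sequencing reads']]
    ]
]

println toolDescriptions(sampleMeta)
```

The null-safe access means a process without a meta.yml yields an empty list instead of a NullPointerException.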

famosab · 2024-11-11T13:18:13Z

plugins/nf-prov/src/main/nextflow/prov/WrrocRenderer.groovy

+    /**
+     * Read meta.yaml (nf-core style) file for a given Nextflow process.
+     *
+     * @param   TaskProcessor processor Nextflow process
+     * @return  Yaml as Map
+     */
+    static Map readMetaYaml(TaskProcessor processor) {
+        WorkflowMetadata workflow = processor.getOwnerScript()?.getBinding()?.getVariable('workflow') as WorkflowMetadata
+        String projectDir = workflow?.getProjectDir()?.toString()
+
+        // TODO: adapt this function to work with non-nf-core yaml files
+        if (projectDir) {
+            String moduleName = processor.getName()
+
+            // Split the module name to get the tool name (last part)
+            String[] moduleNameParts = moduleName.split(':')
+            String toolName = moduleNameParts.length > 0 ? moduleNameParts[-1].toLowerCase() : moduleName.toLowerCase()
+
+            // Construct the path to the meta.yml file
+            Path metaFile = Paths.get(projectDir, 'modules', 'nf-core', toolName, 'meta.yml')
+
+            if (Files.exists(metaFile)) {
+                Yaml yaml = new Yaml()
+                return yaml.load(metaFile.text) as Map
+                }
+            }
+        return null
+        }


@bentsherman Maybe the problem is this function. Ideally I would like to read the meta.yml into different objects divided by the top-level key (name, description, tools, input, output) and hand these, together with the process, to the functions in this Groovy class. But maybe there is another solution, too.
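One way to split the file by top-level key is a small value class, sketched here. `ModuleMeta` and `fromMap` are hypothetical names; `input` and `output` are left untyped because their nesting is not assumed.

```groovy
// Hypothetical value class (not part of nf-prov): one field per top-level key
// of an nf-core style meta.yml.
class ModuleMeta {
    String name
    String description
    List tools = []
    Object input = []    // untyped: the nesting of this section is not assumed here
    Object output = []   // untyped for the same reason

    // Build from the Map a YAML parser returns; a missing file (null) yields null
    static ModuleMeta fromMap(Map yaml) {
        if (yaml == null) return null
        return new ModuleMeta(
            name       : yaml.name as String,
            description: yaml.description as String,
            tools      : (yaml.tools ?: []) as List,
            input      : yaml.input ?: [],
            output     : yaml.output ?: []
        )
    }
}

def meta = ModuleMeta.fromMap([
    name       : 'fastqc',
    description: 'Run FastQC',
    tools      : [[fastqc: [description: 'QC tool']]]
])
println "${meta.name}: ${meta.tools.size()} tool(s)"
```

The renderer functions could then take a `ModuleMeta` argument alongside the process instead of digging through nested maps.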

bentsherman · 2024-11-13

I guess you have to do a bit of guessing to figure out where the meta.yml is relative to the main script.

bentsherman · 2024-11-13

Actually, you should be able to get it this way:

ScriptMeta.get(processor.getOwnerScript()).getModuleDir().resolve('meta.yml')

To use ScriptMeta:

import nextflow.script.ScriptMeta
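A sketch of the lookup step built on that suggestion, kept independent of Nextflow so it can run on its own. `findMetaFile` is a hypothetical helper; in the plugin, its argument would be the directory returned by the `getModuleDir()` call above.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: locate meta.yml next to a module script.
// In the plugin, moduleDir would be the result of
// ScriptMeta.get(processor.getOwnerScript()).getModuleDir() as suggested above.
Path findMetaFile(Path moduleDir) {
    if (moduleDir == null) return null
    Path metaFile = moduleDir.resolve('meta.yml')
    return Files.exists(metaFile) ? metaFile : null
}

// Usage with a throwaway directory standing in for a module directory
Path demoDir = Files.createTempDirectory('module')
Files.write(demoDir.resolve('meta.yml'), 'name: demo\n'.bytes)
println findMetaFile(demoDir)
```

Resolving relative to the module directory avoids rebuilding the `modules/nf-core/<tool>` path from the process name.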

fbartusch and others added 18 commits June 19, 2024 13:40

Fix small errors

ac05da2

Improve WRROC renderer

3509ac1

Add config to crate

12e46bd

Improve WRROC renderer

e047834

Improt WRROC

4295943

add changes

ef96fea

add changes wip

5c3ed70

fix brackets

473f88b

test yamls

b8f098d

fancy stuff

e5d2e7b

break

becb3dd

pain

38f4c64

pain mal 2+

2e5f648

try to flatten output does not work

940a1ab

this works yay

53ce54b

cleanup print statements

332ad35

famosab mentioned this pull request Nov 8, 2024

Improve WrrocRenderer #33

Open

ewels reviewed Nov 8, 2024

famosab commented Nov 11, 2024

famosab commented Nov 7, 2024

ewels Nov 8, 2024

fbartusch Nov 8, 2024

ewels Nov 8, 2024

fbartusch Nov 8, 2024

famosab left a comment

famosab Nov 11, 2024

famosab Nov 11, 2024

bentsherman Nov 13, 2024

bentsherman Nov 13, 2024
