
Expand wrroc with fields needed for federated analysis #37

Open. Wants to merge 18 commits into base: workflow-run-crate


@famosab commented Nov 7, 2024

Pull request to reference what we extended.

fbartusch and others added 18 commits June 19, 2024 13:40
Add workflow file to crate.
Correct path to output files in metadata file.
Add CreateActions for tool executions.
Add ControlActions.
Add SoftwareApplications for processes.
Add HowToSteps.
Add agent information (orcid, name) to nextflow.config that is incorporated into metadata file.
Use outdir from config as publishDir
Remove nested list for createAction results
Correct controlAction id
Add license information according to Workflow RO-Crate
Handle parameters set in workflow
Add PropertyValue for FormalParameters
Add PropertyValue to object field for CreateAction of workflow
Add nextflow_schema.json to outdir if present
* Add objects to CreateActions
* Add license entity if license information in nextflow.config is a valid URL
* Fix bug: Don't add unpublished files to CreateAction results. Causes NullPointerException, as they aren't contained in workflowOutputs map
* (Hopefully) all input/output files are now copied into the crate
* Add Intermediate results ro-crate-metadata.json
* Add nextflow.config to ro-crate-metadata.json
* Add datePublished to Dataset
* valid RO-Crate according to rocrate-validator

Signed-off-by: fbartusch <[email protected]>
* Change HowToStep position to string
* Remove nextflow.config object from OrganizeAction
* add author to root data entity
* passes roc-validator with severity REQUIRED

Signed-off-by: fbartusch <[email protected]>
@famosab mentioned this pull request Nov 8, 2024
Comment on lines +26 to +29
agent {
    name = "John Doe"
    orcid = "https://orcid.org/0000-0000-0000-0000"
}
Member


I think we can pull this information from the new manifest.contributors config scope now:

https://www.nextflow.io/docs/latest/reference/config.html#config-manifest


This new output format for the nf-prov plugin creates RO-Crates that follow this profile.
The agent specified in the config is not the person who wrote the workflow but the person who runs it.

    name = "John Doe"
    orcid = "https://orcid.org/0000-0000-0000-0000"
}
license = "https://spdx.org/licenses/Apache-2.0"
Member


Can we not pull this information automatically from a LICEN[CS]E.(md|txt) file?


As with the agent in your previous comment, the license is added directly to the RO-Crate. The crate includes the input and output files, the pipeline configuration, and the pipeline itself. I will make sure that ro-crate-metadata.json annotates all workflow components (i.e., everything created by the original workflow author) with the license specified in a LICENSE.(md|txt) file.
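Detecting that file could start from a directory scan like the one below. This is a minimal sketch in plain Java (callable from Groovy), not the plugin's code: the class name `LicenseFileFinder` is hypothetical, the extension is treated as optional (assumption: a bare `LICENSE` file should also count), and names are matched case-insensitively. Turning the file's text into an SPDX URL such as `https://spdx.org/licenses/Apache-2.0` is not attempted here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: locate a LICENSE/LICENCE file in a pipeline's project directory.
class LicenseFileFinder {
    // LICENSE or LICENCE, optionally followed by .md or .txt, ignoring case
    // (e.g. "LICENSE", "Licence.md", "license.txt").
    private static final Pattern LICENSE_NAME =
            Pattern.compile("LICEN[CS]E(\\.(md|txt))?", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the license file located directly in projectDir, if any.
     * Candidates are sorted by path so the result is deterministic when
     * several files match (e.g. both LICENSE and LICENSE.md).
     */
    static Optional<Path> findLicenseFile(Path projectDir) throws IOException {
        try (Stream<Path> entries = Files.list(projectDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> LICENSE_NAME.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .findFirst();
        }
    }
}
```

The plugin would call this with the workflow's project directory and fall back to the `license` value from the config when nothing is found; that fallback order is a design suggestion, not something this PR specifies.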

Author

@famosab left a comment


@bentsherman I tried to point out the two main things I worked on :)

Comment on lines +388 to +428
final perTool = nextflowProcesses
    .collect { process ->
        // read in meta.yml file (nf-core style); null if no file was found
        def metaYaml = readMetaYaml(process)
        // get ext properties from process
        def processorConfig = process.getConfig()
        def extProperties = processorConfig.ext as Map
        // use either ext property 'name' or 'name' from meta.yml
        def toolNameTask = extProperties?.containsKey('name') ? extProperties.get('name') as String : metaYaml?.get('name')

        def listOfToolMaps = new ArrayList()
        metaYaml?.get('tools')?.each { tool -> listOfToolMaps.add( tool as Map ) }

        // get descriptions for all tools used in process
        // TODO: adapt so that multi-tool processes get rendered separately
        // TODO: extract more information from meta.yml
        def softwareMap = [:]
        def listOfDescriptions = new ArrayList()
        listOfToolMaps.each { toolMap ->
            toolMap.each { field ->
                field.iterator().each { entry ->
                    entry.iterator().each { entryField ->
                        def entryFieldMap = entryField.getAt("value") as Map
                        listOfDescriptions.add(entryFieldMap.getAt("description"))
                    }
                }
            }
            softwareMap[toolNameTask] = listOfDescriptions
        }

        def createSoftwareFinal = [
            "@id"        : toolNameTask,
            "@type"      : "SoftwareApplication",
            "description": softwareMap.getAt(toolNameTask).toString()
        ]
        return createSoftwareFinal
    }

final wfToolDescriptions = perTool.collect()
Author


@bentsherman This is the new function I wrote. As you can see, I had to do a lot of weird stuff to access the nested YAML file.
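The nesting comes from the file layout: in nf-core `meta.yml` files, `tools` is a list of single-key maps, each mapping a tool name to its properties. Assuming the file has already been parsed into plain Maps and Lists (which is what a YAML library such as SnakeYAML returns), one flat traversal is sketched below in plain Java so it runs without Nextflow; `MetaYamlTools` and `toolDescriptions` are hypothetical names. Keeping one entry per tool is one possible way to address the TODO about rendering multi-tool processes separately.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flatten the "tools" section of a parsed nf-core meta.yml.
class MetaYamlTools {
    /**
     * Maps each tool name to its description. Assumed layout:
     *   tools:
     *     - fastqc:
     *         description: ...
     *         homepage: ...
     * Returns an empty map when meta is null or has no usable "tools" list.
     */
    static Map<String, String> toolDescriptions(Map<String, Object> meta) {
        Map<String, String> result = new LinkedHashMap<>();
        Object tools = (meta == null) ? null : meta.get("tools");
        if (!(tools instanceof List)) {
            return result;
        }
        for (Object item : (List<?>) tools) {
            if (!(item instanceof Map)) {
                continue; // skip malformed entries instead of failing
            }
            // Each list item is a single-key map: tool name -> properties map
            for (Map.Entry<?, ?> tool : ((Map<?, ?>) item).entrySet()) {
                Object props = tool.getValue();
                Object description = (props instanceof Map) ? ((Map<?, ?>) props).get("description") : null;
                // Block scalars (description: |) end with a newline, so trim
                result.put(String.valueOf(tool.getKey()),
                        description == null ? "" : description.toString().trim());
            }
        }
        return result;
    }
}
```

Each entry of the returned map could then become its own `SoftwareApplication`, with the tool name as `@id` and the description as `description`.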

Comment on lines +692 to +719
/**
 * Read the meta.yml (nf-core style) file for a given Nextflow process.
 *
 * @param processor Nextflow process
 * @return parsed meta.yml as a Map, or null if no project directory or meta.yml was found
 */
static Map readMetaYaml(TaskProcessor processor) {
    WorkflowMetadata workflow = processor.getOwnerScript()?.getBinding()?.getVariable('workflow') as WorkflowMetadata
    String projectDir = workflow?.getProjectDir()?.toString()

    // TODO: adapt this function to work with non-nf-core yaml files
    if (projectDir) {
        String moduleName = processor.getName()

        // Split the fully qualified process name (e.g. WORKFLOW:SUBWORKFLOW:TOOL)
        // and use the last part, lowercased, as the module directory name
        String[] moduleNameParts = moduleName.split(':')
        String toolName = moduleNameParts.length > 0 ? moduleNameParts[moduleNameParts.length - 1].toLowerCase() : moduleName.toLowerCase()

        // Construct the path to the meta.yml file
        Path metaFile = Paths.get(projectDir, 'modules', 'nf-core', toolName, 'meta.yml')

        if (Files.exists(metaFile)) {
            Yaml yaml = new Yaml()
            return yaml.load(metaFile.text) as Map
        }
    }
    return null
}
Author


@bentsherman Maybe the problem is this function. Ideally, I would like to read meta.yml into separate objects, one per top-level key (name, description, tools, input, output), and pass these together with the process to the functions in this Groovy class. But maybe there is another solution, too.
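One way to split meta.yml by top-level key is a small typed holder built once per process from the parsed YAML Map. The sketch below uses a Java record (Java 16+); `ModuleMeta` and `fromMap` are hypothetical names, not part of nf-prov or Nextflow, and `input`/`output` are kept as raw objects because their nesting varies between modules.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical typed view of an nf-core meta.yml, split by top-level key. */
record ModuleMeta(String name,
                  String description,
                  List<Map<String, Object>> tools,
                  Object input,
                  Object output) {

    /** Builds a ModuleMeta from the Map a YAML parser returned; missing keys become null or empty. */
    @SuppressWarnings("unchecked")
    static ModuleMeta fromMap(Map<String, Object> raw) {
        if (raw == null) {
            return new ModuleMeta(null, null, Collections.emptyList(), null, null);
        }
        List<Map<String, Object>> tools = new ArrayList<>();
        Object rawTools = raw.get("tools");
        if (rawTools instanceof List) {
            for (Object t : (List<?>) rawTools) {
                if (t instanceof Map) {
                    tools.add((Map<String, Object>) t); // keep only well-formed tool entries
                }
            }
        }
        return new ModuleMeta(
                asString(raw.get("name")),
                asString(raw.get("description")),
                Collections.unmodifiableList(tools),
                raw.get("input"),   // left unparsed: structure depends on the module
                raw.get("output"));
    }

    private static String asString(Object value) {
        return value == null ? null : value.toString().trim();
    }
}
```

The rendering functions could then take the process together with a `ModuleMeta` instead of re-walking the raw Map in each place.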

Member


It seems you have to do a bit of guessing to figure out where the meta.yml is relative to the main script.

Actually you should be able to get it this way:

ScriptMeta.get(processor.getOwnerScript()).getModuleDir().resolve('meta.yml')

Member


To use ScriptMeta:

import nextflow.script.ScriptMeta
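With the module directory taken from `ScriptMeta`, locating the file reduces to a single `resolve`, and the process-name-to-directory guess is no longer needed. That guess fails, for example, for nf-core modules stored in nested directories such as `modules/nf-core/samtools/sort`, whose process is named `SAMTOOLS_SORT`. The sketch below is plain Java and takes the module directory as a parameter, so it does not call Nextflow itself; `MetaYmlLocator` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: find meta.yml next to a module's main.nf.
class MetaYmlLocator {
    /**
     * In the plugin, moduleDir would come from
     * ScriptMeta.get(processor.getOwnerScript()).getModuleDir();
     * here it is a plain parameter so the lookup runs without Nextflow.
     */
    static Optional<Path> findMetaYml(Path moduleDir) {
        if (moduleDir == null) {
            return Optional.empty();
        }
        Path metaFile = moduleDir.resolve("meta.yml");
        // Only report the file if it actually exists and is a regular file
        return Files.isRegularFile(metaFile) ? Optional.of(metaFile) : Optional.empty();
    }
}
```

Returning an `Optional` instead of `null` makes the "module without meta.yml" case explicit for callers such as the per-process `SoftwareApplication` rendering.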
