Expand wrroc with fields needed for federated analysis #37
base: workflow-run-crate
* Add workflow file to crate.
* Correct path to output files in metadata file.
* Add CreateActions for tool executions.
* Add ControlActions.
* Add SoftwareApplications for processes.
* Add HowToSteps.
* Add agent information (orcid, name) to nextflow.config that is incorporated into metadata file.

* Use outdir from config as publishDir
* Remove nested list for createAction results
* Correct controlAction id

* Add license information according to Workflow RO-Crate
* Handle parameters set in workflow
* Add PropertyValue for FormalParameters
* Add PropertyValue to object field for CreateAction of workflow
* Add nextflow_schema.json to outdir if present

* Add objects to CreateActions
* Add license entity if license information in nextflow.config is a valid URL
* Fix bug: Don't add unpublished files to CreateAction results. Causes NullPointerException, as they aren't contained in workflowOutputs map

* (Hopefully) all input/output files are now copied into the crate
* Add Intermediate results ro-crate-metadata.json
* Add nextflow.config to ro-crate-metadata.json
* Add datePublished to Dataset
* valid RO-Crate according to rocrate-validator
Signed-off-by: fbartusch <[email protected]>

* Change HowToStep position to string
* Remove nextflow.config object from OrganizeAction
* add author to root data entity
* passes roc-validator with severity REQUIRED
Signed-off-by: fbartusch <[email protected]>
    agent {
        name = "John Doe"
        orcid = "https://orcid.org/0000-0000-0000-0000"
    }
I think we can pull this information from the new manifest.contributors config scope now: https://www.nextflow.io/docs/latest/reference/config.html#config-manifest
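For reference, a minimal sketch of what the manifest.contributors scope could look like in nextflow.config, assuming a Nextflow version that supports it. All values are placeholders, and how nf-prov would map these fields into the crate is not decided in this thread.

```groovy
// nextflow.config (sketch only; every value below is a placeholder)
manifest {
    contributors = [
        [
            name: 'John Doe',
            affiliation: 'Example Institute',          // hypothetical affiliation
            orcid: 'https://orcid.org/0000-0000-0000-0000',
            contribution: ['author']                   // e.g. author, maintainer, contributor
        ]
    ]
}
```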
This new output format for the nf-prov plugin creates RO-Crates that follow this profile.
The agent mentioned in the config is not the person who wrote the workflow, but the person who runs the workflow.
        name = "John Doe"
        orcid = "https://orcid.org/0000-0000-0000-0000"
    }
    license = "https://spdx.org/licenses/Apache-2.0"
Can we not pull this information automatically from a LICEN[CS]E.(md|txt) file?
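A rough sketch of how such a lookup might work, in plain Groovy. The helper name findLicenseFile is hypothetical and not part of nf-prov; treating the extension as optional and matching case-insensitively are assumptions beyond the pattern given above. It only locates the file; mapping its contents to an SPDX URL is out of scope here.

```groovy
import java.nio.file.Path

// Hypothetical helper (not part of nf-prov): return the first LICENSE/LICENCE
// file found directly inside projectDir, or null if there is none.
Path findLicenseFile(Path projectDir) {
    // Assumption: extension is optional and matching is case-insensitive.
    def pattern = ~/(?i)LICEN[CS]E(\.(md|txt))?/
    File match = projectDir.toFile().listFiles()?.find { File f ->
        f.isFile() && f.name ==~ pattern
    }
    return match?.toPath()
}
```

In the plugin, the directory would presumably be the workflow's project directory.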
This new output format for the nf-prov plugin creates RO-Crates that follow this profile.
Similar to your previous review concerning the agent, the license is added directly to the RO-Crate. The crate will include input and output files, the pipeline configuration, and the pipeline itself. I will ensure that the ro-crate-metadata.json annotates all Workflow components (i.e., everything created by the original workflow author) with the license specified in a LICENSE.(md|txt) file.
@bentsherman I tried to point out the two main things I worked on :)
    final perTool = nextflowProcesses
        .collect() { process ->
            // read in meta.yaml file (nf-core style)
            def metaYaml = readMetaYaml(process)
            // get ext properties from process
            def processorConfig = process.getConfig()
            def extProperties = processorConfig.ext as Map
            // use either ext property 'name' or 'name' from meta.yaml
            def toolNameTask = extProperties.containsKey('name') ? extProperties.get('name') as String : metaYaml.get('name')

            def listOfToolMaps = new ArrayList()
            metaYaml.get('tools').each { tool -> listOfToolMaps.add( tool as Map ) }

            // get descriptions for all tools used in process
            // TODO: adapt so that multi-tool-processes get rendered separately
            // TODO: extract more information from meta.yaml
            def softwareMap = [:]
            def listOfDescriptions = new ArrayList()
            listOfToolMaps.each { toolMap ->
                toolMap.each { field ->
                    def fieldMap = field as Map
                    field.iterator().each {entry ->
                        entry.iterator().each {entryField ->
                            def entryFieldMap = entryField.getAt("value") as Map
                            listOfDescriptions.add(entryFieldMap.getAt("description"))
                        }
                    }
                }
                softwareMap[toolNameTask] = listOfDescriptions
            }

            def createSoftwareFinal = [
                "@id" : toolNameTask,
                "@type" : "SoftwareApplication",
                "description" : softwareMap.getAt(toolNameTask).toString()
            ]
            return createSoftwareFinal
        }

    final wfToolDescriptions = perTool.collect()
@bentsherman This is the new function I wrote. As you can see I did a lot of weird stuff to be able to access the nested yaml file.
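One possible way to flatten the nested access, sketched in plain Groovy. It assumes the parsed meta.yml has a tools entry that is a list of single-key maps (tool name to a map with a description field), which is what the loop above appears to expect. The helper name toolDescriptions and the sample values are hypothetical; the sample map stands in for the result of readMetaYaml.

```groovy
// Stand-in for the parsed meta.yml (illustrative values only).
Map metaYaml = [
    name : 'fastqc',
    tools: [
        [fastqc: [description: 'Quality control for sequencing reads']]
    ]
]

// Hypothetical helper: collect the description of every tool listed under 'tools'.
// Returns an empty list if the map or the 'tools' key is missing.
List<String> toolDescriptions(Map meta) {
    List<Map> tools = (meta?.tools ?: []) as List<Map>
    tools.collectMany { Map tool ->
        // each list item is assumed to be a single-key map: toolName -> properties
        tool.values().collect { (it as Map)?.description as String }
    }.findAll { it != null }
}

def softwareApplication = [
    '@id'        : metaYaml.name,
    '@type'      : 'SoftwareApplication',
    'description': toolDescriptions(metaYaml).join('; ')
]
```

Joining the descriptions with a separator instead of calling toString() on the list is a design choice of this sketch, not something the PR does.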
    /**
     * Read meta.yaml (nf-core style) file for a given Nextflow process.
     *
     * @param TaskProcessor processor Nextflow process
     * @return Yaml as Map
     */
    static Map readMetaYaml(TaskProcessor processor) {
        WorkflowMetadata workflow = processor.getOwnerScript()?.getBinding()?.getVariable('workflow') as WorkflowMetadata
        String projectDir = workflow?.getProjectDir()?.toString()

        // TODO: adapt this function to work with non-nf-core yaml files
        if (projectDir) {
            String moduleName = processor.getName()

            // Split the module name to get the tool name (last part)
            String[] moduleNameParts = moduleName.split(':')
            String toolName = moduleNameParts.length > 0 ? moduleNameParts[-1].toLowerCase() : moduleName.toLowerCase()

            // Construct the path to the meta.yml file
            Path metaFile = Paths.get(projectDir, 'modules', 'nf-core', toolName, 'meta.yml')

            if (Files.exists(metaFile)) {
                Yaml yaml = new Yaml()
                return yaml.load(metaFile.text) as Map
            }
        }
        return null
    }
@bentsherman Maybe the problem is this function. Ideally I would like to read the meta.yaml into separate objects, one per top-level key (name, description, tools, input, output), and hand these together with the process to the functions in this Groovy class. But maybe there is another solution, too.
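A small sketch of what splitting by top-level key could look like, in plain Groovy. The class name ModuleMeta and its fromMap factory are hypothetical and not part of the PR. The input and output fields are left untyped because their structure is not shown in this thread.

```groovy
// Hypothetical value class: one property per top-level key of meta.yml.
class ModuleMeta {
    String name
    String description
    List<Map> tools
    Object input    // structure not shown in this thread, so left untyped
    Object output

    // Build a ModuleMeta from the Map returned by the YAML parser.
    // Returns null when no meta.yml was found (yaml == null).
    static ModuleMeta fromMap(Map yaml) {
        if (yaml == null) return null
        new ModuleMeta(
            name: yaml.name as String,
            description: yaml.description as String,
            tools: (yaml.tools ?: []) as List<Map>,
            input: yaml.input,
            output: yaml.output
        )
    }
}
```

Callers could then pass the ModuleMeta instance alongside the process and guard once against a null result, rather than re-reading nested maps in each function.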
I guess you have to do a bit of guessing to figure out where the meta.yml is relative to the main script.
Actually you should be able to get it this way:
ScriptMeta.get(processor.getOwnerScript()).getModuleDir().resolve('meta.yml')
To use ScriptMeta:
import nextflow.script.ScriptMeta
Pull request to reference what we extended.