Improve WrrocRenderer #33
base: workflow-run-crate
Add workflow file to crate.
Correct path to output files in metadata file.
Add CreateActions for tool executions.
Add ControlActions.
Add SoftwareApplications for processes.
Add HowToSteps.
Add agent information (orcid, name) to nextflow.config that is incorporated into metadata file.
I've installed the plugin with:

    plugins {
        id '[email protected]'
    }
    params {
        outdir = 'results'
    }
    prov {
        enabled = true
        formats {
            wrroc {
                file = "${params.outdir}/ro-crate-metadata.json"
                overwrite = true
                agent {
                    name = "John Doe"
                    orcid = "https://orcid.org/0000-0000-0000-0000"
                }
            }
        }
    }

I had to pin the plugin's version because otherwise Nextflow downloads

    "result": [
        [
            {
                "@id": "r1.foo.1.txt"
            },
            {
                "@id": "r1.foo.2.txt"
            }
        ]
    ],

should be:

    "result": [
        {
            "@id": "r1.foo.1.txt"
        },
        {
            "@id": "r1.foo.2.txt"
        }
    ],

I made these changes manually to move on with the review. At this point, Runcrate reads the crate (
I've also noticed a weird thing: if you run
Thanks for the comprehensive feedback, @simleo. Next time I will check whether runcrate can parse the document before asking for feedback ... I fixed most of the issues, but this one is causing a headache:
I can reproduce the problem. Next I will try the newest Nextflow version, and possibly the newest nf-prov version, in case the problem is already addressed there. To me it doesn't look like the problem originates from the WrrocRenderer itself.
Use outdir from config as publishDir
Remove nested list for createAction results
Correct controlAction id
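The nested `result` list reported earlier in this thread can also be detected and repaired from the reviewer's side. The following is a minimal Python sketch, not the plugin's Groovy code: the helper names `flatten_refs` and `fix_results` and the `#action` id are made up for illustration, and it assumes `@type` is a plain string.

```python
import json

def flatten_refs(value):
    """Flatten arbitrarily nested lists of {"@id": ...} references into one list."""
    if isinstance(value, list):
        flat = []
        for item in value:
            flat.extend(flatten_refs(item))
        return flat
    return [value]

def fix_results(graph):
    """Return the entities with every CreateAction 'result' flattened.

    The input graph is left unmodified. Assumes '@type' is a string.
    """
    fixed = []
    for entity in graph:
        if entity.get("@type") == "CreateAction" and "result" in entity:
            entity = {**entity, "result": flatten_refs(entity["result"])}
        fixed.append(entity)
    return fixed

# "#action" is a hypothetical id; the file names come from the report above.
graph = [{
    "@id": "#action",
    "@type": "CreateAction",
    "result": [[{"@id": "r1.foo.1.txt"}, {"@id": "r1.foo.2.txt"}]],
}]
print(json.dumps(fix_results(graph)[0]["result"]))
# prints [{"@id": "r1.foo.1.txt"}, {"@id": "r1.foo.2.txt"}]
```

This only mirrors the manual edit described in the review; the actual fix belongs in the renderer, as the commit above does.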
I've run the workflow again with the plugin at ac05da2, using the same
    {
        "@id": "#outdir",
        "@type": "FormalParameter",
        "name": "outdir",
        ...
    },
    {
        "@id": "#constant",
        "@type": "FormalParameter",
        "name": "constant",
        ...
    },
    {
        "@id": "#b5236af9-95e9-42b8-9121-a6a57755ee1b",
        "@type": "CreateAction",
        "instrument": {
            "@id": "test.nf"
        },
        "object": [
            {"@id": "#outdir-pv"},
            {"@id": "#constant-pv"}
        ],
        "result": [
            {"@id": "r3.bar.2.txt"},
            {"@id": "r3.bar.1.txt"},
            {"@id": "r2.bar.1.txt"},
            {"@id": "r2.bar.2.txt"},
            {"@id": "r1.bar.1.txt"},
            {"@id": "r1.bar.2.txt"}
        ]
        ...
    },
    {
        "@id": "#outdir-pv",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": "#outdir"},
        "name": "outdir",
        "value": "crate"
    },
    {
        "@id": "#constant-pv",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": "#constant"},
        "name": "constant",
        "value": "bar"
    }

When run with
Add license information according to Workflow RO-Crate
Handle parameters set in workflow
Add PropertyValue for FormalParameters
Add PropertyValue to object field for CreateAction of workflow
Add nextflow_schema.json to outdir if present
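The link between each `PropertyValue` and its `FormalParameter` (via `exampleOfWork`, as in the JSON excerpt above) is easy to check mechanically. Here is a hedged Python sketch of such a check; `check_parameter_links` is a hypothetical helper for reviewing a crate and is not part of nf-prov, runcrate, or rocrate-validator.

```python
def check_parameter_links(graph):
    """Return a list of problems with PropertyValue -> FormalParameter links."""
    by_id = {entity["@id"]: entity for entity in graph}
    problems = []
    for entity in graph:
        if entity.get("@type") != "PropertyValue":
            continue
        # exampleOfWork is expected to be a reference such as {"@id": "#outdir"}
        target = entity.get("exampleOfWork", {}).get("@id")
        param = by_id.get(target)
        if param is None or param.get("@type") != "FormalParameter":
            problems.append(
                f'{entity["@id"]}: exampleOfWork does not point to a FormalParameter'
            )
    return problems

# Entities taken from the excerpt in this thread (other fields omitted).
graph = [
    {"@id": "#outdir", "@type": "FormalParameter", "name": "outdir"},
    {"@id": "#outdir-pv", "@type": "PropertyValue",
     "exampleOfWork": {"@id": "#outdir"}, "name": "outdir", "value": "crate"},
    {"@id": "#constant-pv", "@type": "PropertyValue",
     "exampleOfWork": {"@id": "#constant"}, "name": "constant", "value": "bar"},
]
print(check_parameter_links(graph))
```

In this reduced graph the `#constant` parameter is deliberately left out, so the check reports `#constant-pv` as dangling.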
First, thank you for the review @simleo.
I've tested the latest version: big improvements! As reported above, the main thing that's missing now is completing the information related to the
If a

    {
        "@id": "./",
        "@type": "Dataset",
        "license": {
            "@id": "https://spdx.org/licenses/Apache-2.0"
        },
        ...
    },
    {
        "@id": "https://spdx.org/licenses/Apache-2.0",
        "@type": "CreativeWork"
    }
* Add objects to CreateActions
* Add license entity if license information in nextflow.config is a valid URL
* Fix bug: Don't add unpublished files to CreateAction results. Causes NullPointerException, as they aren't contained in workflowOutputs map
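The "license entity if the license is a valid URL" rule from the commit above could look roughly like the following. This is a Python sketch under assumptions: what the plugin counts as a valid URL is not stated in this thread, so the `http`/`https` check and the plain-string fallback are guesses, and `license_entities` is a hypothetical name.

```python
from urllib.parse import urlparse

def license_entities(license_value):
    """Return (value for the root 'license' property, extra entity or None).

    Assumption: a license counts as a URL if it has an http(s) scheme and a host.
    Anything else is kept as a plain string and gets no contextual entity.
    """
    parsed = urlparse(license_value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        reference = {"@id": license_value}
        entity = {"@id": license_value, "@type": "CreativeWork"}
        return reference, entity
    return license_value, None

reference, entity = license_entities("https://spdx.org/licenses/Apache-2.0")
print(reference)
print(entity)
print(license_entities("Apache-2.0"))
```

For the SPDX URL this yields the same reference and `CreativeWork` entity pair shown in the JSON excerpt earlier in the thread.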
* (Hopefully) all input/output files are now copied into the crate
* Add Intermediate results ro-crate-metadata.json
* Add nextflow.config to ro-crate-metadata.json
* Add datePublished to Dataset
* valid RO-Crate according to rocrate-validator
Signed-off-by: fbartusch <[email protected]>

* Change HowToStep position to string
* Remove nextflow.config object from OrganizeAction
* add author to root data entity
* passes roc-validator with severity REQUIRED
Signed-off-by: fbartusch <[email protected]>
I tested the created RO-Crate with the new 0.4.0 version of
I ran
Hold on, @fbartusch, there seems to be a problem with the JSON output format. It looks like it's reporting as
The issue has been fixed in the latest release, version 0.4.2
    final propertyValues = params
        .collect { name, value ->
            [
                "@id"          : "#${name}-pv",
                "@type"        : "PropertyValue",
                "exampleOfWork": ["@id": "#${name}"],
                "name"         : name,
                "value"        : value
            ]
        }
During our Hackathon the following issue came up:
Most nf-core pipelines provide reference genome options (mainly iGenomes). Because of the way the #genome-pv section of the final RO-Crate JSON is created, the whole rendered igenomes.config (example here) is added as its value. This makes "value" : {} nested and thus leads to an invalid crate. In our minimal example we simply removed the reference genome option, which gave us a non-nested crate again.
Our idea was to catch the genomes-pv case with an if statement, then copy the rendered igenomes.config to the results directory and point the crate value entry to that file.
Thank you for identifying this problem. It didn’t occur during my tests with the demo pipeline. Catching the genomes-pv case might help for nf-core workflows that use iGenomes, but it doesn’t address the underlying issue. Off the top of my head, I’d suggest putting the entire content into one big string and storing it as a value. I’ll try that this afternoon and check if the validator accepts it.
We removed the igenomes reference from our small test pipeline for now. But there has to be a more elegant way :) I also tried some stuff on top of what you did and expanded the ro-crate (see #37).
I ran rnaseq 3.14.0 on our cluster (where we have igenomes) and validated ro-crate-metadata.json with the newest validator 0.4.5. The validator does not complain about the value in the #genome-pv section.
I pasted the section in a JSON validator and it's also valid JSON.
Can you upload your problematic RO-Crate somewhere and tell me which rocrate-validator version you used?
@mburridge96 I think you have the better overview of the issue with the nested crates. Maybe you can explain :)
Whilst it's valid JSON, the RO-Crate spec requires every object within the @graph array to be flat. So every object that's in the array must have an "@id", "@type" and then any additional key:value pairs. If nested data is required, you then point to another object that describes it.
So below is valid:

    "@graph" : [{
        "@id" : "URI/#unique_local",
        "@type": "schema_type",
        "name": "example",
        "description": "description of example",
        "about": {"@id": "linktoabout"}
    }, {
        "@id": "linktoabout",
        "@type": "schema_type"
    }]
but then the following is not:

    "@graph" : [{
        "@id" : "URI/#unique_local",
        "@type": "schema_type",
        "name": "example",
        "description": "description of example",
        "about": {
            "@id": "linktoabout",
            "@type": "schema_type"
        }}]
The issue is that if we dump everything into the value key, it won't conform to the RO-Crate spec. I also don't think (could be wrong) that the validator checks for that yet, which is another problem in itself.
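Since the validator may not catch this, the flatness rule described above can be approximated with a small script. This is a simplified Python sketch: `find_nested` is a hypothetical helper, and it treats any object value other than a bare `{"@id": ...}` reference as embedded, which would also flag other JSON-LD constructs such as `@value` objects.

```python
def find_nested(graph):
    """Return (entity id, property) pairs whose value embeds an object
    instead of referencing it by '@id'."""
    def is_embedded(value):
        # A plain reference is exactly {"@id": "..."}; anything richer is embedded.
        return isinstance(value, dict) and set(value) != {"@id"}

    hits = []
    for entity in graph:
        for key, value in entity.items():
            values = value if isinstance(value, list) else [value]
            if any(is_embedded(v) for v in values):
                hits.append((entity.get("@id"), key))
    return hits

# The two examples from the comment above.
flat_graph = [
    {"@id": "URI/#unique_local", "@type": "schema_type", "name": "example",
     "description": "description of example", "about": {"@id": "linktoabout"}},
    {"@id": "linktoabout", "@type": "schema_type"},
]
nested_graph = [
    {"@id": "URI/#unique_local", "@type": "schema_type", "name": "example",
     "description": "description of example",
     "about": {"@id": "linktoabout", "@type": "schema_type"}},
]
print(find_nested(flat_graph))    # prints []
print(find_nested(nested_graph))  # prints [('URI/#unique_local', 'about')]
```

Running it over the `@graph` of a generated ro-crate-metadata.json would list every property, such as the `value` of `#genome-pv`, that carries nested data.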
@famosab & @mburridge96 I think I fixed this in 4d22b84.
I now check whether the value is a Groovy List or Map and, if so, save it as a String instead of nested JSON.
It works with the latest validator.
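The idea behind that fix, expressed as a Python sketch rather than the actual Groovy change in 4d22b84 (which is not shown in this thread): list or map values are serialized to a JSON string before they go into the `PropertyValue`. The function name `property_value` and the sample `genomes` content are made up for illustration.

```python
import json

def property_value(name, value):
    """Build a PropertyValue entity; nested values are stored as one JSON string."""
    if isinstance(value, (list, dict)):
        # Serialize nested data so the entity in @graph stays flat.
        value = json.dumps(value, sort_keys=True)
    return {
        "@id": f"#{name}-pv",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": f"#{name}"},
        "name": name,
        "value": value,
    }

# Hypothetical parameters: one scalar, one nested map like a rendered igenomes.config.
params = {"outdir": "results", "genomes": {"GRCh38": {"fasta": "/path/genome.fa"}}}
for name, value in params.items():
    print(property_value(name, value)["value"])
```

The nested map is still recoverable with `json.loads` on the stored string, at the cost of consumers having to know that the value is serialized JSON.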
* Check for nested PropertyValues
* If nested, serialize it to JSON and write the serialization as one string into the PropertyValue value field.
* current rocrate-validator (0.4.6) is happy with this solution :)
* Add organization to ro-crate-metadata.json and to agent
Signed-off-by: fbartusch <[email protected]>

* Add publisher option to configuration
* Add the publisher to the ro-crate-metadata.json if the ID corresponds to an organization or agent
Signed-off-by: fbartusch <[email protected]>