Ideas for output data of a pipeline #4
Comments
Idea for defining output files
One idea for defining output files listed in the
1.1. Output files (comment)
One comment on output files: some pipelines may want output files associated with divisions other than samples. For example, wg/cgMLST clustering may want to associate output clusters (a tree and other information) with each MLST scheme passed to the pipeline, as in the pipeline in phac-nml/nf-pipelines#1, which breaks up analysis results by MLST profile. One option here is to just reserve the words "summary" and "samples", and allow any other divisions here. For example:
{
"files": {
"summary": { ... },
"samples": { ... },
"mlst_profiles": {
"listeria_cgmlst": {
"clusters": "clusters.text.gz"
}
}
}
}
The clusters output file for this particular analysis pipeline could then be accessed by
Added this in #7
1. Output JSON file
For IRIDA Next to load pipeline results, the pipeline should produce an output.json (or output.json.gz) file that tells IRIDA Next which data and metadata to store. This file can be structured like the following:
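As a rough illustration of this layout, the sketch below builds a small output document, writes it as either output.json or output.json.gz, and reads it back. The `files` and `metadata` sections and the `summary`/`samples` keywords come from this proposal; the `tree`/`tree.nwk` entry and all function names are hypothetical placeholders, and IRIDA Next may expect a different or stricter schema.

```python
import gzip
import json
from pathlib import Path


def build_output() -> dict:
    """Assemble an example document; entry names are illustrative only."""
    return {
        "files": {
            # Files related to all samples/data in the pipeline.
            "summary": {"tree": "tree.nwk"},  # hypothetical entry
            # Files associated with a particular sample.
            "samples": {
                "SampleA": {"assembly_contigs": "output_file.fasta.gz"},
            },
        },
        # Per-sample metadata would go under metadata.samples.
        "metadata": {"samples": {}},
    }


def write_output(doc: dict, path: Path) -> None:
    """Write plain or gzip-compressed JSON, chosen by the file suffix."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "wt", encoding="utf-8") as handle:
        json.dump(doc, handle, indent=2)


def read_output(path: Path) -> dict:
    """Read a document written by write_output."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def lookup_sample_file(doc: dict, dotted_key: str) -> str:
    """Resolve a key such as 'SampleA.assembly_contigs' to a file name.

    Assumption: the sample name itself contains no '.' character.
    """
    sample, file_key = dotted_key.split(".", 1)
    return doc["files"]["samples"][sample][file_key]
```

Calling `write_output(build_output(), Path("output.json.gz"))` would produce the compressed form, and `lookup_sample_file` mirrors the key-based access described for the `samples` section.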
1.1. Output files
In order to store output files within IRIDA Next, they should be listed in the files section as key/value pairs using the following structure.
The summary keyword lists output files related to all samples/data in the pipeline (e.g., a phylogenetic tree). The samples keyword lists output files associated with a particular sample (e.g., an assembled genome).
Within each of these sections, key-value pairs allow the files for an analysis to be accessed by key (e.g., SampleA.assembly_contigs returns output_file.fasta.gz).
1.2. Sample metadata
In order to store sample metadata, it should be structured by the pipeline like the following:
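A minimal sketch of one possible shape for this section, assuming metadata is keyed by sample name under `metadata.samples` as described in section 2.1. The field names and values (`species`, `assembly_length`) are made up for illustration, and `metadata_section` is a hypothetical helper, not part of any IRIDA Next API.

```python
import json


def metadata_section(per_sample: dict[str, dict]) -> dict:
    """Wrap per-sample dictionaries in the metadata.samples layout."""
    return {
        "metadata": {
            "samples": {name: dict(fields) for name, fields in per_sample.items()}
        }
    }


# Placeholder fields only; a real pipeline would supply its own.
section = metadata_section(
    {"SampleA": {"species": "unknown", "assembly_length": 0}}
)
print(json.dumps(section, indent=2))
```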
2. Storage of data in IRIDA Next
2.1. Metadata
Metadata will be stored in IRIDA Next by loading the output.json file and looking for the metadata.samples section. The information associated with each sample will be stored (e.g., the "SampleA" key will be used to look up "SampleA" in IRIDA Next, and the contents of its JSON dictionary will be merged with any existing metadata for SampleA).
There will be a parallel table which stores metadata about the source of each of the above fields:
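The merge and the parallel source table could be sketched roughly as follows. This is not how IRIDA Next implements it: the table is modelled here as a plain dictionary from field name to source, incoming values are assumed to overwrite existing ones, and the function name and the source labels (`"user"`, `"analysis-1"`) are hypothetical.

```python
def merge_sample_metadata(
    existing: dict, provenance: dict, incoming: dict, source: str
) -> tuple[dict, dict]:
    """Merge incoming fields into a sample's metadata and record their source.

    Returns new dictionaries; the inputs are left unmodified.
    """
    # Assumption: on a key collision, the pipeline's value wins.
    merged = {**existing, **incoming}
    # Every field touched by this merge is attributed to `source`.
    updated_provenance = {**provenance, **{field: source for field in incoming}}
    return merged, updated_provenance


# Example: SampleA already has user-entered metadata, then a pipeline adds more.
metadata, sources = merge_sample_metadata(
    existing={"country": "unknown"},
    provenance={"country": "user"},
    incoming={"species": "unknown"},
    source="analysis-1",
)
```

Whether a pipeline value should replace a user-entered one is a policy choice that this proposal leaves open; the overwrite rule above is only one option.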