This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Depending on the workflow selected, the results directory contains different subfolders, denoted {group} in the explanation below:
- In the imputation workflow, {group} is empty.
- In the grid search workflow, {group} is a string of the form K{K value}_nGen{nGen value}, with the values of K and nGen for a given combination of parameters.
- In the SNP set refinement workflow, {group} is a string of the form iteration_{n}, with the corresponding iteration number.
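The naming rule for {group} can be sketched as a small helper. This is only an illustration: the function group_dir and the workflow labels "imputation", "grid_search" and "snp_set_refinement" are hypothetical names chosen here, not identifiers taken from the pipeline.

```python
from typing import Optional


def group_dir(
    workflow: str,
    k: Optional[int] = None,
    n_gen: Optional[int] = None,
    iteration: Optional[int] = None,
) -> str:
    """Return the {group} subfolder name for a workflow.

    The workflow labels used here are hypothetical, for illustration only.
    """
    if workflow == "imputation":
        # No extra subfolder: outputs sit directly under the results directory.
        return ""
    if workflow == "grid_search":
        if k is None or n_gen is None:
            raise ValueError("grid search needs both K and nGen")
        return f"K{k}_nGen{n_gen}"
    if workflow == "snp_set_refinement":
        if iteration is None:
            raise ValueError("SNP set refinement needs an iteration number")
        return f"iteration_{iteration}"
    raise ValueError(f"unknown workflow: {workflow}")
```

For example, group_dir("grid_search", k=4, n_gen=100) gives K4_nGen100, so the joint output for that parameter combination would be under K4_nGen100/joint_stitch_output.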
The pipeline is built using Nextflow and processes data using the following steps:
- Stitch - Raw output from the STITCH imputation per chromosome
- Joint output - Full-genome imputation output
- Imputation metrics - Imputation metrics per SNP, MAF bins, and/or samples
- Plots - Plots obtained from the imputation metrics
- Pipeline information - Report metrics generated during the workflow execution
Raw output from the STITCH imputation per chromosome. See the STITCH documentation for details on the various files produced.
Output files
- {group}/stitch/chromosome_*
  - plots/: Plots produced by STITCH
  - RData/: Intermediate STITCH results as R objects
  - chromosome_*.vcf.gz: Imputed VCF file for the chromosome
  - chromosome_*.vcf.gz.csi: Index file for the VCF
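As a rough sketch of how this layout could be traversed, the helper below collects the per-chromosome VCF files with a glob pattern. The function name chromosome_vcfs and its arguments are assumptions made for this example; only the directory and file name patterns come from the listing above.

```python
from pathlib import Path


def chromosome_vcfs(results_dir, group=""):
    """List the per-chromosome imputed VCFs under {group}/stitch/.

    An empty group corresponds to the imputation workflow.
    """
    stitch_dir = Path(results_dir) / group / "stitch"
    # The pattern must match the whole file name, so the .csi index
    # files are not returned.
    return sorted(stitch_dir.glob("chromosome_*/chromosome_*.vcf.gz"))
```

The returned paths are sorted lexicographically, so chromosome_10 sorts before chromosome_2; apply a custom sort key if karyotypic order matters.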
Full-genome imputation output
Output files
- {group}/joint_stitch_output
  - vcf/joint_stitch_output.vcf.gz: Full genome imputed genotypes
  - vcf/joint_stitch_output.vcf.gz.csi: VCF index
Imputation metrics per SNP, minor allele frequency (MAF) bin, and/or sample. If a ground truth is provided, this folder also contains the output of glimpse2/concordance.
Output files
- {group}/imputation_metrics
  - joint_stitch_output.info_score.csv.gz: CSV file with header and columns chr,pos,ref,alt,info_score. The info_score is extracted from the STITCH output and is a SNP-wise internal imputation quality metric.
  - joint_stitch_output.r2_sites.tsv.gz: TSV file produced by glimpse2/concordance with per-SNP ground truth correlations in terms of allele dosages (ds_r2).
  - joint_stitch_output.{rsquare,error}.{grp,spl}.txt.gz: Ground truth performance metrics produced by glimpse2/concordance.
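A minimal sketch of how the info score file could be used downstream is shown below. It relies only on the column names listed above (chr, pos, ref, alt, info_score). The function name high_quality_sites and the threshold of 0.8 are illustrative assumptions, not pipeline defaults, and the sketch assumes every info_score value parses as a number.

```python
import csv
import gzip


def high_quality_sites(path, min_info=0.8):
    """Return (chr, pos, ref, alt) for sites with info_score >= min_info.

    The 0.8 default is an arbitrary example threshold.
    """
    kept = []
    # Open the gzip-compressed CSV in text mode and read it by header name.
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if float(row["info_score"]) >= min_info:
                kept.append((row["chr"], int(row["pos"]), row["ref"], row["alt"]))
    return kept
```

Calling high_quality_sites("imputation_metrics/joint_stitch_output.info_score.csv.gz") would return the sites passing the example threshold as a list of tuples.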
Plots produced from the files in the imputation_metrics folder.
Output files
- {group}/plots
  - joint_stitch_output.{info_score,r2_sites,r2_samples,r2_maf_bins}.pdf
Output files
- pipeline_info/
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
  - Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
  - Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
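As one possible way to use these reports for troubleshooting, the sketch below tallies task outcomes from execution_trace.txt. It assumes the trace file is tab-separated with a header row and includes a status column, as in Nextflow's default trace fields; if the trace fields were customised, the column may be absent. The function name task_status_counts is hypothetical.

```python
import csv
from collections import Counter


def task_status_counts(trace_path):
    """Count tasks per status value in a Nextflow trace file.

    Assumes a tab-separated file with a header row and a 'status' column.
    """
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)
```

A non-zero count for a status such as FAILED points to tasks worth inspecting in execution_report.html.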