birneylab/stitchimpute: Output

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Depending on the pipeline workflow selected, the output folder will contain different subfolders, indicated as {group} in the following explanation. In the imputation workflow, {group} is empty. In the grid search workflow, it is a string K{K value}_nGen{nGen value} with the values of K and nGen for a given combination of parameters. In the SNP set refinement workflow, it is a string iteration_{n} with the corresponding iteration number.
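The naming rule above can be sketched in a few lines of Python. This is only an illustration of how {group} maps to a folder: the workflow labels (`imputation`, `grid_search`, `snp_set_refinement`) and the function names are hypothetical and are not taken from the pipeline code.

```python
from pathlib import Path


def group_name(workflow, k=None, n_gen=None, iteration=None):
    """Return the {group} subfolder name for a given workflow.

    The workflow labels used here are hypothetical placeholders.
    """
    if workflow == "imputation":
        # No extra subfolder in the imputation workflow
        return ""
    if workflow == "grid_search":
        return f"K{k}_nGen{n_gen}"
    if workflow == "snp_set_refinement":
        return f"iteration_{iteration}"
    raise ValueError(f"unknown workflow: {workflow}")


def group_path(results_dir, workflow, **params):
    """Join the results directory with the {group} subfolder."""
    # Joining with an empty string leaves the results directory unchanged
    return Path(results_dir) / group_name(workflow, **params)
```

For example, `group_path("results", "grid_search", k=16, n_gen=100)` points at `results/K16_nGen100`, while the imputation workflow resolves to `results` itself.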

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • Stitch - Raw output from the STITCH imputation per chromosome
  • Joint output - Full-genome imputation output
  • Imputation metrics - Imputation metrics per SNP, MAF bins, and/or samples
  • Plots - Plots obtained from the imputation metrics
  • Pipeline information - Report metrics generated during the workflow execution

Stitch

Raw output from the STITCH imputation per chromosome. See the STITCH documentation for details on the various files produced.

Output files
  • {group}/stitch/chromosome_*
    • plots/: Plots produced by STITCH
    • RData/: Intermediate STITCH results as R objects
    • chromosome_*.vcf.gz: Imputed VCF file for the chromosome
    • chromosome_*.vcf.gz.csi: Index file for the VCF

Joint output

Full-genome imputation output

Output files
  • {group}/joint_stitch_output
    • vcf/joint_stitch_output.vcf.gz: Full genome imputed genotypes
    • vcf/joint_stitch_output.vcf.gz.csi: VCF index
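As a quick sanity check on the joint VCF, one could count the imputed variants per chromosome. The sketch below uses only the Python standard library and relies on the fact that bgzip-compressed files can be read as ordinary gzip streams; the function names are illustrative, and it assumes a standard VCF layout in which header lines start with `#` and the first tab-separated column is the chromosome.

```python
import gzip
from collections import Counter


def count_variants_per_chromosome(lines):
    """Count VCF records per chromosome from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def count_variants_in_vcf(path):
    """Open a (b)gzip-compressed VCF and count its records per chromosome."""
    with gzip.open(path, "rt") as handle:
        return count_variants_per_chromosome(handle)
```

Calling `count_variants_in_vcf` on `joint_stitch_output.vcf.gz` streams the file line by line, so it does not need the `.csi` index or load the genotypes into memory.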

Imputation metrics

Imputation metrics per SNP, minor allele frequency (MAF) bins, and/or samples. If a ground truth is provided, this folder also contains the output of glimpse2/concordance.

Output files
  • {group}/imputation_metrics
    • joint_stitch_output.info_score.csv.gz: CSV file with header and columns chr,pos,ref,alt,info_score. The info_score is extracted from the STITCH output and it is a SNP-wise internal imputation quality metric
    • joint_stitch_output.r2_sites.tsv.gz: TSV file produced by glimpse2/concordance with per-SNP ground truth correlations in terms of allele dosages (ds_r2)
    • joint_stitch_output.{rsquare,error}.{grp,spl}.txt.gz: ground truth performance metrics produced by glimpse2/concordance
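The info score file can be read with the standard library alone. The following is a minimal sketch that parses the columns listed above (chr, pos, ref, alt, info_score) and keeps SNPs above a threshold. The function names and the 0.8 cutoff are illustrative assumptions, not pipeline defaults, and the sketch assumes every info_score value is numeric.

```python
import csv
import gzip


def parse_info_scores(lines):
    """Yield (chr, pos, ref, alt, info_score) tuples from CSV text lines."""
    for row in csv.DictReader(lines):
        yield (
            row["chr"],
            int(row["pos"]),
            row["ref"],
            row["alt"],
            float(row["info_score"]),  # assumes a numeric value in every row
        )


def filter_by_info_score(records, threshold=0.8):
    """Keep records whose info_score is at least `threshold` (illustrative cutoff)."""
    return [rec for rec in records if rec[4] >= threshold]


def load_info_scores(path):
    """Read a gzip-compressed info score CSV into a list of records."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(parse_info_scores(handle))
```

A typical use would be `filter_by_info_score(load_info_scores(path))`, with `path` pointing at `joint_stitch_output.info_score.csv.gz` inside the relevant `{group}/imputation_metrics` folder.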

Plots

Plots produced from the files in the imputation_metrics folder

Output files
  • {group}/plots
    • joint_stitch_output.{info_score,r2_sites,r2_samples,r2_maf_bins}.pdf
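The brace notation above stands for four separate PDF files. A small sketch of how the expected paths could be enumerated and checked is shown below; the function names are hypothetical. It is an assumption, based on the imputation metrics section, that the `r2_*` plots are produced only when a ground truth is provided, so their absence is not necessarily an error.

```python
from pathlib import Path

# The four plot types named in the brace pattern
PLOT_KINDS = ("info_score", "r2_sites", "r2_samples", "r2_maf_bins")


def expected_plots(results_dir, group=""):
    """List the plot paths expected under {group}/plots."""
    plots_dir = Path(results_dir) / group / "plots"
    return [plots_dir / f"joint_stitch_output.{kind}.pdf" for kind in PLOT_KINDS]


def missing_plots(results_dir, group=""):
    """Return the expected plot paths that do not exist on disk."""
    return [path for path in expected_plots(results_dir, group) if not path.exists()]
```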

Pipeline information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.

Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
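When troubleshooting, the trace file can be scanned for tasks that did not finish successfully. The sketch below assumes the trace is a tab-separated file with a header row that includes `name` and `status` columns, which is the usual Nextflow layout but depends on the trace configuration; the function names are illustrative.

```python
import csv


def unsuccessful_tasks(lines, bad_statuses=("FAILED", "ABORTED")):
    """Return (name, status) pairs for tasks whose status is in `bad_statuses`.

    `lines` is any iterable of text lines from a tab-separated trace file.
    Assumes the header contains `name` and `status` columns.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        (row["name"], row["status"])
        for row in reader
        if row["status"] in bad_statuses
    ]


def unsuccessful_tasks_in_file(path):
    """Convenience wrapper that reads the trace file from disk."""
    with open(path, newline="") as handle:
        return unsuccessful_tasks(handle)
```

Pointing `unsuccessful_tasks_in_file` at `pipeline_info/execution_trace.txt` gives a short list of the processes worth inspecting first.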