Young edited this page Dec 26, 2023 · 2 revisions

Output or results

Cecret produces a lot of files. Most of them follow the path pattern {params.outdir}/{process}/{resultant files from that process}.
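As a minimal sketch of that layout, the helper below lists the files found under one process directory. The function names are hypothetical and not part of Cecret; the only thing taken from this page is the {params.outdir}/{process}/ pattern.

```python
from pathlib import Path
from typing import List


def process_output_dir(outdir: str, process: str) -> Path:
    """Build the expected results directory: {outdir}/{process}."""
    return Path(outdir) / process


def list_process_outputs(outdir: str, process: str) -> List[str]:
    """Return sorted file names in a process directory, or [] if it is absent."""
    directory = process_output_dir(outdir, process)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

For example, `list_process_outputs("cecret", "consensus")` would list the consensus FASTA files of a run written to the default output directory.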

Tree of summary files

Final File Tree after running cecret.nf

cecret
├── aci
│   ├── amplicon_depth.csv
│   ├── amplicon_depth_mqc.png
│   └── amplicon_depth.png
├── aligned
│   ├── SRR13957125.sorted.bam
│   ├── SRR13957125.sorted.bam.bai
│   ├── SRR13957170.sorted.bam
│   ├── SRR13957170.sorted.bam.bai
│   ├── SRR13957177.sorted.bam
│   └── SRR13957177.sorted.bam.bai
├── ampliconclip
│   ├── SRR13957125.primertrim.sorted.bam
│   ├── SRR13957125.primertrim.sorted.bam.bai
│   ├── SRR13957170.primertrim.sorted.bam
│   ├── SRR13957170.primertrim.sorted.bam.bai
│   ├── SRR13957177.primertrim.sorted.bam
│   └── SRR13957177.primertrim.sorted.bam.bai
├── bcftools_variants
│   ├── SRR13957125.bcftools_variants.vcf
│   ├── SRR13957170.bcftools_variants.vcf
│   └── SRR13957177.bcftools_variants.vcf
├── cecret_results.csv
├── cecret_results.txt
├── consensus
│   ├── SRR13957125.consensus.fa
│   ├── SRR13957170.consensus.fa
│   └── SRR13957177.consensus.fa
├── dataset
│   ├── genemap.gff
│   ├── primers.csv
│   ├── qc.json
│   ├── reference.fasta
│   ├── sequences.fasta
│   ├── tag.json
│   ├── tree.json
│   └── virus_properties.json
├── fastp
│   ├── SRR13957125_clean_PE1.fastq.gz
│   ├── SRR13957125_clean_PE2.fastq.gz
│   ├── SRR13957125_fastp.html
│   ├── SRR13957125_fastp.json
│   ├── SRR13957170_clean_PE1.fastq.gz
│   ├── SRR13957170_clean_PE2.fastq.gz
│   ├── SRR13957170_fastp.html
│   ├── SRR13957170_fastp.json
│   ├── SRR13957177_clean_PE1.fastq.gz
│   ├── SRR13957177_clean_PE2.fastq.gz
│   ├── SRR13957177_fastp.html
│   └── SRR13957177_fastp.json
├── fastqc
│   ├── SRR13957125_1_fastqc.html
│   ├── SRR13957125_1_fastqc.zip
│   ├── SRR13957125_2_fastqc.html
│   ├── SRR13957125_2_fastqc.zip
│   ├── SRR13957125_fastq_name.csv
│   ├── SRR13957170_1_fastqc.html
│   ├── SRR13957170_1_fastqc.zip
│   ├── SRR13957170_2_fastqc.html
│   ├── SRR13957170_2_fastqc.zip
│   ├── SRR13957170_fastq_name.csv
│   ├── SRR13957177_1_fastqc.html
│   ├── SRR13957177_1_fastqc.zip
│   ├── SRR13957177_2_fastqc.html
│   ├── SRR13957177_2_fastqc.zip
│   └── SRR13957177_fastq_name.csv
├── freyja
│   ├── aggregated-freyja.png
│   ├── aggregated-freyja.tsv
│   ├── SRR13957125_demix_collapsed_lineages.yml
│   ├── SRR13957125_demix.tsv
│   ├── SRR13957125_depths.tsv
│   ├── SRR13957125_freyja_lineages_mqc.png
│   ├── SRR13957125_freyja_lineages.png
│   ├── SRR13957125_variants.tsv
│   ├── SRR13957170_depths.tsv
│   ├── SRR13957170_variants.tsv
│   ├── SRR13957177_demix_collapsed_lineages.yml
│   ├── SRR13957177_demix.tsv
│   ├── SRR13957177_depths.tsv
│   ├── SRR13957177_freyja_lineages_mqc.png
│   ├── SRR13957177_freyja_lineages.png
│   └── SRR13957177_variants.tsv
├── heatcluster
│   ├── sorted_matrix.csv
│   ├── heatcluster_mqc.png
│   └── heatcluster.png
├── igv_reports
│   ├── SRR13957125_igvjs_viewer.html
│   ├── SRR13957170_igvjs_viewer.html
│   └── SRR13957177_igvjs_viewer.html
├── iqtree2
│   ├── iqtree2.iqtree
│   ├── iqtree2.log
│   ├── iqtree2.mldist
│   ├── iqtree2.treefile
│   └── iqtree2.treefile.nwk
├── ivar_consensus
│   ├── SRR13957125.consensus.fa
│   ├── SRR13957125.consensus.qual.txt
│   ├── SRR13957170.consensus.fa
│   ├── SRR13957170.consensus.qual.txt
│   ├── SRR13957177.consensus.fa
│   └── SRR13957177.consensus.qual.txt
├── ivar_trim
│   ├── SRR13957125_ivar.log
│   ├── SRR13957125.primertrim.sorted.bam
│   ├── SRR13957125.primertrim.sorted.bam.bai
│   ├── SRR13957170_ivar.log
│   ├── SRR13957170.primertrim.sorted.bam
│   ├── SRR13957170.primertrim.sorted.bam.bai
│   ├── SRR13957177_ivar.log
│   ├── SRR13957177.primertrim.sorted.bam
│   └── SRR13957177.primertrim.sorted.bam.bai
├── ivar_variants
│   ├── SRR13957125.ivar_variants.vcf
│   ├── SRR13957125.variants.tsv
│   ├── SRR13957170.ivar_variants.vcf
│   ├── SRR13957170.variants.tsv
│   ├── SRR13957177.ivar_variants.vcf
│   └── SRR13957177.variants.tsv
├── logs
│   └── <Log files for processes not included in tree for brevity>
├── mafft
│   └── mafft_aligned.fasta
├── multiqc
│   ├── multiqc_data
│   │   ├── multiqc_citations.txt
│   │   ├── multiqc_data.json
│   │   ├── multiqc_fastqc.txt
│   │   ├── multiqc_general_stats.txt
│   │   ├── multiqc.log
│   │   ├── multiqc_nextclade.txt
│   │   ├── multiqc_pangolin.txt
│   │   ├── multiqc_samtools_flagstat.txt
│   │   ├── multiqc_samtools_stats.txt
│   │   ├── multiqc_seqyclean.txt
│   │   ├── multiqc_software_versions.txt
│   │   └── multiqc_sources.txt
│   └── multiqc_report.html
├── nextalign
│   ├── nextalign.aligned.fasta
│   ├── nextalign.errors.csv
│   ├── nextalign_gene_E.translation.fasta
│   ├── nextalign_gene_M.translation.fasta
│   ├── nextalign_gene_N.translation.fasta
│   ├── nextalign_gene_ORF1a.translation.fasta
│   ├── nextalign_gene_ORF1b.translation.fasta
│   ├── nextalign_gene_ORF3a.translation.fasta
│   ├── nextalign_gene_ORF6.translation.fasta
│   ├── nextalign_gene_ORF7a.translation.fasta
│   ├── nextalign_gene_ORF7b.translation.fasta
│   ├── nextalign_gene_ORF8.translation.fasta
│   ├── nextalign_gene_ORF9b.translation.fasta
│   ├── nextalign_gene_S.translation.fasta
│   ├── nextalign.insertions.csv
│   └── ultimate.fasta
├── nextclade
│   ├── combined.fasta
│   ├── nextclade.aligned.fasta
│   ├── nextclade.auspice.json
│   ├── nextclade.csv
│   ├── nextclade.errors.csv
│   ├── nextclade_gene_E.translation.fasta
│   ├── nextclade_gene_M.translation.fasta
│   ├── nextclade_gene_N.translation.fasta
│   ├── nextclade_gene_ORF1a.translation.fasta
│   ├── nextclade_gene_ORF1b.translation.fasta
│   ├── nextclade_gene_ORF3a.translation.fasta
│   ├── nextclade_gene_ORF6.translation.fasta
│   ├── nextclade_gene_ORF7a.translation.fasta
│   ├── nextclade_gene_ORF7b.translation.fasta
│   ├── nextclade_gene_ORF8.translation.fasta
│   ├── nextclade_gene_ORF9b.translation.fasta
│   ├── nextclade_gene_S.translation.fasta
│   ├── nextclade.insertions.csv
│   ├── nextclade.json
│   ├── nextclade.ndjson
│   └── nextclade.tsv
├── pango_collapse
│   └── pango_collapse.csv
├── pangolin
│   ├── combined.fasta
│   └── lineage_report.csv
├── phytreeviz
│   ├── tree_mqc.png
│   └── tree.png
├── samtools_ampliconstats
│   ├── SRR13957125_ampliconstats.txt
│   ├── SRR13957170_ampliconstats.txt
│   └── SRR13957177_ampliconstats.txt
├── samtools_coverage
│   ├── samtools_coverage_summary.tsv
│   ├── SRR13957125.cov.hist
│   ├── SRR13957125.cov.txt
│   ├── SRR13957170.cov.hist
│   ├── SRR13957170.cov.txt
│   ├── SRR13957177.cov.hist
│   └── SRR13957177.cov.txt
├── samtools_depth
│   ├── SRR13957125.depth.txt
│   ├── SRR13957170.depth.txt
│   └── SRR13957177.depth.txt
├── samtools_flagstat
│   ├── SRR13957125.flagstat.txt
│   ├── SRR13957170.flagstat.txt
│   └── SRR13957177.flagstat.txt
├── samtools_plot_ampliconstats
│   ├── SRR13957125
│   ├── SRR13957125-combined-amp.gp
│   ├── SRR13957125-combined-amp.png
│   ├── SRR13957125-combined-coverage-1.gp
│   ├── SRR13957125-combined-coverage-1.png
│   ├── SRR13957125-combined-depth.gp
│   ├── SRR13957125-combined-depth.png
│   ├── SRR13957125-combined-read-perc.gp
│   ├── SRR13957125-combined-read-perc.png
│   ├── SRR13957125-combined-reads.gp
│   ├── SRR13957125-combined-reads.png
│   ├── SRR13957125-combined-tcoord.gp
│   ├── SRR13957125-combined-tcoord.png
│   ├── SRR13957125-combined-tdepth.gp
│   ├── SRR13957125-combined-tdepth.png
│   ├── SRR13957125-heat-amp-1.gp
│   ├── SRR13957125-heat-amp-1.png
│   ├── SRR13957125-heat-coverage-1-1.gp
│   ├── SRR13957125-heat-coverage-1-1.png
│   ├── SRR13957125-heat-read-perc-1.gp
│   ├── SRR13957125-heat-read-perc-1.png
│   ├── SRR13957125-heat-read-perc-log-1.gp
│   ├── SRR13957125-heat-read-perc-log-1.png
│   ├── SRR13957125-heat-reads-1.gp
│   ├── SRR13957125-heat-reads-1.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-amp.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-amp.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-cov.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-cov.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-reads.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-reads.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tcoord.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tcoord.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tdepth.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tdepth.png
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tsize.gp
│   ├── SRR13957125-SRR13957125.primertrim.sorted-tsize.png
│   ├── SRR13957170
│   ├── SRR13957170-combined-amp.gp
│   ├── SRR13957170-combined-amp.png
│   ├── SRR13957170-combined-coverage-1.gp
│   ├── SRR13957170-combined-coverage-1.png
│   ├── SRR13957170-combined-depth.gp
│   ├── SRR13957170-combined-depth.png
│   ├── SRR13957170-combined-read-perc.gp
│   ├── SRR13957170-combined-read-perc.png
│   ├── SRR13957170-combined-reads.gp
│   ├── SRR13957170-combined-reads.png
│   ├── SRR13957170-combined-tdepth.gp
│   ├── SRR13957170-combined-tdepth.png
│   ├── SRR13957170-heat-amp-1.gp
│   ├── SRR13957170-heat-amp-1.png
│   ├── SRR13957170-heat-coverage-1-1.gp
│   ├── SRR13957170-heat-coverage-1-1.png
│   ├── SRR13957170-heat-read-perc-1.gp
│   ├── SRR13957170-heat-read-perc-1.png
│   ├── SRR13957170-heat-read-perc-log-1.gp
│   ├── SRR13957170-heat-read-perc-log-1.png
│   ├── SRR13957170-heat-reads-1.gp
│   ├── SRR13957170-heat-reads-1.png
│   ├── SRR13957170-SRR13957170.primertrim.sorted-amp.gp
│   ├── SRR13957170-SRR13957170.primertrim.sorted-amp.png
│   ├── SRR13957170-SRR13957170.primertrim.sorted-cov.gp
│   ├── SRR13957170-SRR13957170.primertrim.sorted-cov.png
│   ├── SRR13957170-SRR13957170.primertrim.sorted-reads.gp
│   ├── SRR13957170-SRR13957170.primertrim.sorted-reads.png
│   ├── SRR13957170-SRR13957170.primertrim.sorted-tdepth.gp
│   ├── SRR13957170-SRR13957170.primertrim.sorted-tdepth.png
│   ├── SRR13957177
│   ├── SRR13957177-combined-amp.gp
│   ├── SRR13957177-combined-amp.png
│   ├── SRR13957177-combined-coverage-1.gp
│   ├── SRR13957177-combined-coverage-1.png
│   ├── SRR13957177-combined-depth.gp
│   ├── SRR13957177-combined-depth.png
│   ├── SRR13957177-combined-read-perc.gp
│   ├── SRR13957177-combined-read-perc.png
│   ├── SRR13957177-combined-reads.gp
│   ├── SRR13957177-combined-reads.png
│   ├── SRR13957177-combined-tcoord.gp
│   ├── SRR13957177-combined-tcoord.png
│   ├── SRR13957177-combined-tdepth.gp
│   ├── SRR13957177-combined-tdepth.png
│   ├── SRR13957177-heat-amp-1.gp
│   ├── SRR13957177-heat-amp-1.png
│   ├── SRR13957177-heat-coverage-1-1.gp
│   ├── SRR13957177-heat-coverage-1-1.png
│   ├── SRR13957177-heat-read-perc-1.gp
│   ├── SRR13957177-heat-read-perc-1.png
│   ├── SRR13957177-heat-read-perc-log-1.gp
│   ├── SRR13957177-heat-read-perc-log-1.png
│   ├── SRR13957177-heat-reads-1.gp
│   ├── SRR13957177-heat-reads-1.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-amp.gp
│   ├── SRR13957177-SRR13957177.primertrim.sorted-amp.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-cov.gp
│   ├── SRR13957177-SRR13957177.primertrim.sorted-cov.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-reads.gp
│   ├── SRR13957177-SRR13957177.primertrim.sorted-reads.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-tcoord.gp
│   ├── SRR13957177-SRR13957177.primertrim.sorted-tcoord.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-tdepth.gp
│   ├── SRR13957177-SRR13957177.primertrim.sorted-tdepth.png
│   ├── SRR13957177-SRR13957177.primertrim.sorted-tsize.gp
│   └── SRR13957177-SRR13957177.primertrim.sorted-tsize.png
├── samtools_stats
│   ├── 2249693-IA-M05216-230323.stats.txt
│   ├── SRR13957125.stats.txt
│   ├── SRR13957170.stats.txt
│   └── SRR13957177.stats.txt
├── seqyclean
│   ├── 2249693-IA-M05216-230323_clean_PE1.fastq.gz
│   ├── 2249693-IA-M05216-230323_clean_PE2.fastq.gz
│   ├── 2249693-IA-M05216-230323_clean_SummaryStatistics.tsv
│   ├── Combined_SummaryStatistics.tsv
│   ├── SRR13957125_clean_PE1.fastq.gz
│   ├── SRR13957125_clean_PE2.fastq.gz
│   ├── SRR13957125_clean_SummaryStatistics.tsv
│   ├── SRR13957170_clean_PE1.fastq.gz
│   ├── SRR13957170_clean_PE2.fastq.gz
│   ├── SRR13957170_clean_SummaryStatistics.tsv
│   ├── SRR13957177_clean_PE1.fastq.gz
│   ├── SRR13957177_clean_PE2.fastq.gz
│   └── SRR13957177_clean_SummaryStatistics.tsv
├── snp-dists
│   └── snp-dists.txt
└── vadr
    ├── combined.fasta
    ├── trimmed.fasta
    ├── vadr.vadr.alc
    ├── vadr.vadr.alt
    ├── vadr.vadr.alt.list
    ├── vadr.vadr.cmd
    ├── vadr.vadr.dcr
    ├── vadr.vadr.fail.fa
    ├── vadr.vadr.fail.list
    ├── vadr.vadr.fail.tbl
    ├── vadr.vadr.filelist
    ├── vadr.vadr.ftr
    ├── vadr.vadr.log
    ├── vadr.vadr.mdl
    ├── vadr.vadr.pass.fa
    ├── vadr.vadr.pass.list
    ├── vadr.vadr.pass.tbl
    ├── vadr.vadr.rpn
    ├── vadr.vadr.sda
    ├── vadr.vadr.seqstat
    ├── vadr.vadr.sgm
    ├── vadr.vadr.sqa
    └── vadr.vadr.sqc

Final files

There are two main files that summarize the information from this workflow. One is a csv file with the key result from each process, and one is the multiqc report. The default location for these files is cecret/cecret_results.csv and cecret/multiqc/multiqc_report.html. If using an alternative destination, set params.outdir to your preferred destination (see: instructions on how to adjust parameters).

Summary csv file

There are summary files for each run found at cecret/cecret_results.csv and cecret/cecret_results.txt. These two files are exactly the same except for the delimiter used to separate the columns.
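One way to confirm that the two files carry the same table is to parse each with its own delimiter and compare the rows. This is only a sketch: it assumes the .txt file is tab-delimited, which this page does not state, and the function names are hypothetical.

```python
import csv


def read_summary(path, delimiter=","):
    """Load a delimited summary file as a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def same_table(csv_path, txt_path, txt_delimiter="\t"):
    """Return True when both files hold identical headers and values.

    txt_delimiter defaults to a tab, which is an assumption; adjust it
    if your cecret_results.txt uses a different separator.
    """
    return read_summary(csv_path) == read_summary(txt_path, txt_delimiter)
```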

An example file from a run with the default values on the SARS-CoV-2 fastq files SRR13957125, SRR13957170, and SRR13957177 is below.

sample_id sample pangolin_lineage nextclade_clade vadr_p/f fasta_line fastqc_raw_reads_1 fastqc_raw_reads_2 num_N num_total seqyclean_PairsKept seqyclean_Perc_Kept num_pos_100X aci_num_failed_amplicons insert_size_after_trimming ivar_num_variants_identified bcftools_variants_identified samtools_meandepth_after_trimming samtools_per_1X_coverage_after_trimming vadr_model vadr_alerts nextclade_clade_who nextclade_qc_overallscore nextclade_qc_overallstatus pangolin_conflict pangolin_ambiguity_score pangolin_scorpio_call pangolin_scorpio_support pangolin_scorpio_conflict pangolin_scorpio_notes pangolin_version pangolin_pangolin_version pangolin_scorpio_version pangolin_constellation_version pangolin_is_designated pangolin_qc_status pangolin_qc_notes pangolin_note pangocollapse_lineage pangocollapse_Lineage_full pangocollapse_Lineage_expanded pangocollapse_Lineage_family freyja_summarized Cecret version seqyclean bwa ivar ivar consensus
SRR13957125 SRR13957125 B.1.429 21C PASS SRR13957125 670879.0 670879.0 667.0 29875.0 576244.0 85.8939 29200.0 2.0 199.0 27.0 27.0 5401.08 99.6522 NC_045512 - Epsilon 12.958697 good 0.0 Epsilon (B.1.429-like) 1.0 0.0 scorpio call: Alt alleles 14; Ref alleles 0; Amb alleles 0; Oth alleles 0 PUSHER-v1.23.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 4% Usher placements: B.1.429(1/1) B.1.429 B.1.429 B.1.429 B [('Epsilon' 0.9990499999965543)] v3.10.20231226 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
SRR13957170 SRR13957170 Unassigned SRR13957170 2287.0 2287.0 25545.0 25545.0 176.0 7.69567 0.0 99.0 160.0 0.0 6.0 0.182791 6.91235 PUSHER-v1.23.1 4.3.1 0.3.19 v0.1.12 False fail Failed to map Unassigned Unassigned Unassigned Unassigned v3.10.20231226 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
SRR13957177 SRR13957177 B.1.1.7 20I PASS SRR13957177 902426.0 902426.0 776.0 29787.0 837318.0 92.7852 29019.0 2.0 207.3 39.0 41.0 7621.74 99.8495 NC_045512 - Alpha 5.885816 good 0.0 Alpha (B.1.1.7-like) 0.96 0.04 scorpio call: Alt alleles 22; Ref alleles 1; Amb alleles 0; Oth alleles 0 PUSHER-v1.23.1 4.3.1 0.3.19 v0.1.12 False pass Ambiguous content: 4% Usher placements: B.1.1.7(1/1) B.1.1.7 B.1.1.7 B.1.1.7 B.1.1.7 [('Alpha' 0.999009112161133)] v3.10.20231226 seqyclean : Version: 1.10.09 (2018-10-16) bwa : Version: 0.7.17-r1188 ivar : iVar version 1.4.2 iVar version 1.4.2
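A summary like the one above can be screened for samples that need a closer look. The sketch below uses the sample_id and pangolin_qc_status columns from the header shown here; the function name and the choice of "pass" as the only acceptable value are assumptions made for illustration.

```python
import csv


def samples_not_passing(summary_csv, column="pangolin_qc_status", passing="pass"):
    """Return sample_id values whose status column is anything other than `passing`.

    Rows with a missing or empty status are reported too, since an empty
    cell usually means the process produced no result for that sample.
    """
    with open(summary_csv, newline="") as handle:
        return [
            row["sample_id"]
            for row in csv.DictReader(handle)
            if (row.get(column) or "") != passing
        ]
```

In the example rows above, SRR13957170 has a pangolin_qc_status of fail, so it is the kind of sample this check is meant to surface.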

MultiQC report

The multiqc report aggregates data across your samples into one file. Open the 'cecret/multiqc/multiqc_report.html' file with your preferred browser. There, tables and graphs are generated for 'General Statistics', 'Samtools stats', 'Samtools flagstats', 'FastQC', 'iVar', 'SeqyClean', 'Fastp', 'Pangolin', and 'Kraken2'. There are also custom sections for many additional analyses, including Freyja and PhyTreeViz.

Example fastqc graph

Example kraken2 graph

Example iVar graph

Example pangolin graph

Process-level information

Sometimes these summary files are not sufficient. This is an expected use-case, which is why there are so many more files in the workflow.

More information can be found in other pages in this wiki.

Mutually exclusive directories

Although they are shown in the tree above, not every directory or file is produced by every run. In fact, there are several mutually exclusive processes:

  • seqyclean and fastp
  • ivar_trim and ampliconclip (neither will appear if params.trimmer = 'none')
  • mafft and nextalign

Please note that processes related to phylogenetic analysis will not run, and their files and directories will not appear in the results, unless the relatedness parameter is set to true (params.relatedness = true). A directory may still appear if a process is "turned off," but this directory will be empty.
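The pairs listed above can be checked against a results directory with a short script. This is a sketch rather than part of Cecret: the function names are hypothetical, and a directory counts as "used" only if it contains something, since an empty directory may be left behind by a process that was turned off.

```python
from pathlib import Path

# Pairs of process directories that are not expected to be populated together.
EXCLUSIVE_PAIRS = [
    ("seqyclean", "fastp"),
    ("ivar_trim", "ampliconclip"),
    ("mafft", "nextalign"),
]


def has_files(directory: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def conflicting_pairs(outdir):
    """Return the exclusive pairs for which both directories are non-empty."""
    root = Path(outdir)
    return [
        pair
        for pair in EXCLUSIVE_PAIRS
        if all(has_files(root / name) for name in pair)
    ]
```

A non-empty result is not necessarily an error. It may simply mean that the output directory holds results from more than one run with different settings.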

Nextflow output

Nextflow will also produce files. These files are more about the resources that your system used than about analysis of the data.

nf-*-reports.tsv

If using Nextflow Tower or Seqera Platform, these are the files that appear in the reports section of the UI.

Currently, the summary file for the workflow (cecret_results.*) and the multiqc report (multiqc_report.html) are included. More may be added once UPHL gains access to this resource.

report-*-*.html

This is a report of the workflow that may be useful in determining computational resource use. This includes the command that was used, the directories and other storage locations that were used, and information about the processes being run.

timeline-*-*.html

A visual timeline of when each process ran in the given computational environment.

work

A directory for all the temporary files used in the analysis. Users can generally delete this directory when the workflow is complete because 1) it is generally not needed if everything ran successfully and 2) it is VERY large. Users will not be able to use Nextflow's -resume option if this directory is deleted.
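Before deleting the directory, it can help to see how much space it actually occupies. The helper below is a hypothetical sketch that sums file sizes under a path; it skips symbolic links on the assumption that linked input files should not be counted as space used by the work directory itself.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of regular files under `path`, skipping symbolic links."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

For example, `directory_size_bytes("work") / 1e9` gives an approximate size in gigabytes.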