Reorganise arguments for clearer syntax #1091

Merged (3 commits) on Oct 11, 2023

@adamrtalbot adamrtalbot commented Oct 10, 2023

Changes:

  • Grouped arguments into sections based on what they do
  • Reordered slightly to go in chronological order of the pipeline and set up
  • I think it's clearer for a new user
  • Note no actual text has changed, just the order. Git thinks this is a full re-write.

Fixes #1090
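
One rough way to check the "only the order changed" claim is to sort both versions line by line and compare them. This is a minimal sketch: the two temporary files are hypothetical stand-ins for the file before and after the reorder, not the real pipeline schema. A line-level check like this can also report differences that come purely from moved trailing commas in JSON.

```shell
# Hypothetical stand-ins for the file before and after the reorder.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' '"input": {}' '"outdir": {}' '"email": {}' > "$old"
printf '%s\n' '"email": {}' '"input": {}' '"outdir": {}' > "$new"

# Sort both sides so that line order no longer matters, then compare.
sort "$old" > "$old.sorted"
sort "$new" > "$new.sorted"

if cmp -s "$old.sorted" "$new.sorted"; then
  result="order-only change"
else
  result="content changed"
fi
echo "$result"

# Clean up the temporary files.
rm -f "$old" "$new" "$old.sorted" "$new.sorted"
```

For the demo files above, the sorted copies are identical, so the script reports an order-only change.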

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Oct 10, 2023

nf-core lint overall result: Failed ❌

Posted for pipeline commit dd45b04

+| ✅ 141 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   3 tests had warnings |!
-| ❌   3 tests failed       |-

❌ Test failures:

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
  • multiqc_config - multiqc_config

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-10-11 19:17:23

adamrtalbot commented Oct 10, 2023

Blocked until #1078 or #1088 is merged.

@adamrtalbot

Here are the new docs from --help:

nextflow run . -profile test,docker  --help --validationShowHiddenParams
N E X T F L O W  ~  version 23.09.2-edge
Launching `./main.nf` [naughty_joliot] DSL2 - revision: 37b7a7de80


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/rnaseq v3.13.0dev
------------------------------------------------------
Typical pipeline command:

  nextflow run nf-core/rnaseq --input samplesheet.csv --genome GRCh37 -profile docker

Input/output options
  --input                            [string]  Path to comma-separated file containing information about the samples in the experiment.
  --outdir                           [string]  The output directory where the results will be saved. You have to use absolute paths to storage on Cloud 
                                               infrastructure. 
  --email                            [string]  Email address for completion summary.
  --multiqc_title                    [string]  MultiQC report title. Printed as page header, used for filename if not otherwise specified.

Reference genome options
  --genome                           [string]  Name of iGenomes reference.
  --fasta                            [string]  Path to FASTA genome file.
  --gtf                              [string]  Path to GTF annotation file.
  --gff                              [string]  Path to GFF3 annotation file.
  --gene_bed                         [string]  Path to BED file containing gene intervals. This will be created from the GTF file if not specified.
  --transcript_fasta                 [string]  Path to FASTA transcriptome file.
  --additional_fasta                 [string]  FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences.
  --splicesites                      [string]  Splice sites file required for HISAT2.
  --star_index                       [string]  Path to directory or tar.gz archive for pre-built STAR index.
  --hisat2_index                     [string]  Path to directory or tar.gz archive for pre-built HISAT2 index.
  --rsem_index                       [string]  Path to directory or tar.gz archive for pre-built RSEM index.
  --salmon_index                     [string]  Path to directory or tar.gz archive for pre-built Salmon index.
  --hisat2_build_memory              [string]  Minimum memory required to use splice sites and exons in the HiSAT2 index build process. [default: 
                                               200.GB] 
  --gencode                          [boolean] Specify if your GTF annotation is in GENCODE format.
  --gtf_extra_attributes             [string]  By default, the pipeline uses the `gene_name` field to obtain additional gene identifiers from the input GTF file 
                                               when running Salmon. [default: gene_name] 
  --gtf_group_features               [string]  Define the attribute type used to group features in the GTF file when running Salmon. [default: gene_id]
  --featurecounts_group_type         [string]  The attribute type used to group feature types in the GTF file when generating the biotype plot with 
                                               featureCounts. [default: gene_biotype] 
  --featurecounts_feature_type       [string]  By default, the pipeline assigns reads based on the 'exon' attribute within the GTF file. [default: exon]
  --igenomes_base                    [string]  Directory / URL base for iGenomes references. [default: s3://ngi-igenomes/igenomes]
  --igenomes_ignore                  [boolean] Do not load the iGenomes reference config.

Read trimming options
  --trimmer                          [string]  Specifies the trimming tool to use - available options are 'trimgalore' and 'fastp'. (accepted: trimgalore, 
                                               fastp) [default: trimgalore] 
  --extra_trimgalore_args            [string]  Extra arguments to pass to Trim Galore! command in addition to defaults defined by the pipeline.
  --extra_fastp_args                 [string]  Extra arguments to pass to fastp command in addition to defaults defined by the pipeline.
  --min_trimmed_reads                [integer] Minimum number of trimmed reads below which samples are removed from further processing. Some downstream steps in 
                                               the pipeline will fail if this threshold is too low. [default: 10000] 

Read filtering options
  --bbsplit_fasta_list               [string]  Path to comma-separated file containing a list of reference genomes to filter reads against with BBSplit. You 
                                               have to also explicitly set `--skip_bbsplit false` if you want to use BBSplit. 
  --bbsplit_index                    [string]  Path to directory or tar.gz archive for pre-built BBSplit index.
  --remove_ribo_rna                  [boolean] Enable the removal of reads derived from ribosomal RNA using SortMeRNA.
  --ribo_database_manifest           [string]  Text file containing paths to fasta files (one per line) that will be used to create the database for 
                                               SortMeRNA. [default: ${projectDir}/assets/rrna-db-defaults.txt] 

UMI options
  --with_umi                         [boolean] Enable UMI-based read deduplication.
  --umitools_extract_method          [string]  UMI pattern to use. Can be either 'string' (default) or 'regex'. [default: string]
  --umitools_bc_pattern              [string]  The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the 
                                               UMI. 
  --umitools_bc_pattern2             [string]  The UMI barcode pattern to use if the UMI is located in read 2.
  --umi_discard_read                 [integer] After UMI barcode extraction discard either R1 or R2 by setting this parameter to 1 or 2, respectively.
  --umitools_umi_separator           [string]  The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with 
                                               UMI-tools and used other software. 
  --umitools_grouping_method         [string]  Method to use to determine read groups by subsuming those with similar UMIs. All methods start by identifying the 
                                               reads with the same mapping position, but treat similar yet nonidentical UMIs differently. (accepted: unique, 
                                               percentile, cluster, adjacency, directional) [default: directional] 
  --umitools_dedup_stats             [boolean] Generate output stats when running "umi_tools dedup".

Alignment options
  --aligner                          [string]  Specifies the alignment algorithm to use - available options are 'star_salmon', 'star_rsem' and 'hisat2'. 
                                               (accepted: star_salmon, star_rsem, hisat2) [default: star_salmon] 
  --pseudo_aligner                   [string]  Specifies the pseudo aligner to use - available options are 'salmon'. Runs in addition to '--aligner'. 
                                               (accepted: salmon) 
  --bam_csi_index                    [boolean] Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with 
                                               larger chromosome sizes. 
  --star_ignore_sjdbgtf              [boolean] When using pre-built STAR indices do not re-extract and use splice junctions from the GTF file.
  --salmon_quant_libtype             [string]   Override Salmon library type inferred based on strandedness defined in meta object.
  --min_mapped_reads                 [number]  Minimum percentage of uniquely mapped reads below which samples are removed from further processing. 
                                               [default: 5] 
  --seq_center                       [string]  Sequencing center information to be added to read group of BAM files.
  --stringtie_ignore_gtf             [boolean] Perform reference-guided de novo assembly of transcripts using StringTie i.e. dont restrict to those in GTF 
                                               file. 
  --extra_star_align_args            [string]  Extra arguments to pass to STAR alignment command in addition to defaults defined by the pipeline. Only available 
                                               for the STAR-Salmon route. 
  --extra_salmon_quant_args          [string]  Extra arguments to pass to Salmon quant command in addition to defaults defined by the pipeline.

Optional outputs
  --save_merged_fastq                [boolean] Save FastQ files after merging re-sequenced libraries in the results directory.
  --save_umi_intermeds               [boolean] If this option is specified, intermediate FastQ and BAM files produced by UMI-tools are also saved in the results 
                                               directory. 
  --save_non_ribo_reads              [boolean] If this option is specified, intermediate FastQ files containing non-rRNA reads will be saved in the results 
                                               directory. 
  --save_bbsplit_reads               [boolean] If this option is specified, FastQ files split by reference will be saved in the results directory.
  --save_reference                   [boolean] If generated by the pipeline save the STAR index in the results directory.
  --save_trimmed                     [boolean] Save the trimmed FastQ files in the results directory.
  --save_align_intermeds             [boolean] Save the intermediate BAM files from the alignment step.
  --save_unaligned                   [boolean] Where possible, save unaligned reads from either STAR, HISAT2 or Salmon to the results directory.

Quality Control
  --deseq2_vst                       [boolean] Use vst transformation instead of rlog with DESeq2. [default: true]
  --rseqc_modules                    [string]  Specify the RSeQC modules to run. [default: 
                                               bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication] 

Process skipping options
  --skip_bbsplit                     [boolean] Skip BBSplit for removal of non-reference genome reads. [default: true]
  --skip_umi_extract                 [boolean] Skip the UMI extraction from the read in case the UMIs have been moved to the headers in advance of the pipeline 
                                               run. 
  --skip_trimming                    [boolean] Skip the adapter trimming step.
  --skip_alignment                   [boolean] Skip all of the alignment-based processes within the pipeline.
  --skip_pseudo_alignment            [boolean] Skip all of the pseudo-alignment-based processes within the pipeline.
  --skip_markduplicates              [boolean] Skip picard MarkDuplicates step.
  --skip_bigwig                      [boolean] Skip bigWig file creation.
  --skip_stringtie                   [boolean] Skip StringTie.
  --skip_fastqc                      [boolean] Skip FastQC.
  --skip_preseq                      [boolean] Skip Preseq. [default: true]
  --skip_dupradar                    [boolean] Skip dupRadar.
  --skip_qualimap                    [boolean] Skip Qualimap.
  --skip_rseqc                       [boolean] Skip RSeQC.
  --skip_biotype_qc                  [boolean] Skip additional featureCounts process for biotype QC.
  --skip_deseq2_qc                   [boolean] Skip DESeq2 PCA and heatmap plotting.
  --skip_multiqc                     [boolean] Skip MultiQC.
  --skip_qc                          [boolean] Skip all QC steps except for MultiQC.

Institutional config options
  --custom_config_version            [string]  Git commit id for Institutional configs. [default: master]
  --custom_config_base               [string]  Base directory for Institutional configs. [default: 
                                               https://raw.githubusercontent.com/nf-core/configs/master] 
  --config_profile_name              [string]  Institutional config name.
  --config_profile_description       [string]  Institutional config description.
  --config_profile_contact           [string]  Institutional config contact information.
  --config_profile_url               [string]  Institutional config URL link.
  --test_data_base                   [string]  Base path / URL for data used in the test profiles [default: 
                                               https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3] 

Max job request options
  --max_cpus                         [integer] Maximum number of CPUs that can be requested for any single job. [default: 16]
  --max_memory                       [string]  Maximum amount of memory that can be requested for any single job. [default: 128.GB]
  --max_time                         [string]  Maximum amount of time that can be requested for any single job. [default: 240.h]

Generic options
  --help                             [boolean] Display help text.
  --version                          [boolean] Display version and exit.
  --publish_dir_mode                 [string]  Method used to save pipeline results to output directory. (accepted: symlink, rellink, link, copy, 
                                               copyNoFollow, move) [default: copy] 
  --email_on_fail                    [string]  Email address for completion summary, only when pipeline fails.
  --plaintext_email                  [boolean] Send plain-text email instead of HTML.
  --max_multiqc_email_size           [string]  File size limit when attaching MultiQC reports to summary emails. [default: 25.MB]
  --monochrome_logs                  [boolean] Do not use coloured log outputs.
  --hook_url                         [string]  Incoming hook URL for messaging service
  --multiqc_config                   [string]  Custom config file to supply to MultiQC.
  --multiqc_logo                     [string]  Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
  --multiqc_methods_description      [string]  Custom MultiQC yaml file containing HTML including a methods description.
  --validate_params                  [boolean] Boolean whether to validate parameters against the schema at runtime [default: true]
  --validationShowHiddenParams       [boolean] Show all params when using `--help`
  --validationFailUnrecognisedParams [boolean] Validation of parameters fails when an unrecognised parameter is found.
  --validationLenientMode            [boolean] Validation of parameters in lenient more.

------------------------------------------------------
If you use nf-core/rnaseq for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.1400710

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/rnaseq/blob/master/CITATIONS.md
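
As an illustration of how the grouped sections read in practice, here is a sketch that assembles a command with one or two flags from several of the groups listed above. It only builds and prints the command string, nothing is executed. The file name `samplesheet.csv`, the directory `results` and the particular values chosen are placeholders, not recommendations.

```shell
# Build the command as a string and print it; nothing is run here.
# samplesheet.csv and results are placeholder names.
cmd="nextflow run nf-core/rnaseq"
cmd="$cmd --input samplesheet.csv --outdir results"  # Input/output options
cmd="$cmd --genome GRCh37"                           # Reference genome options
cmd="$cmd --trimmer fastp"                           # Read trimming options
cmd="$cmd --aligner star_salmon"                     # Alignment options
cmd="$cmd --save_trimmed"                            # Optional outputs
cmd="$cmd --skip_stringtie"                          # Process skipping options
cmd="$cmd -profile docker"
echo "$cmd"
```

The flags follow the same order as the help sections, which is roughly the order in which the pipeline uses them.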

@drpatelh drpatelh left a comment

LGTM!

@drpatelh drpatelh merged commit ae2dea7 into nf-core:dev Oct 11, 2023
6 of 26 checks passed