diff --git a/subdomains/genome/CONTRIBUTORS b/subdomains/genome/CONTRIBUTORS
new file mode 100644
index 00000000..bb026f69
--- /dev/null
+++ b/subdomains/genome/CONTRIBUTORS
@@ -0,0 +1,3 @@
+# If an entry is a GitHub username, the name and avatar will be fetched and displayed
+AnnaSyme
+neoformit
diff --git a/subdomains/genome/annotation.yml b/subdomains/genome/annotation.yml
new file mode 100644
index 00000000..e693290a
--- /dev/null
+++ b/subdomains/genome/annotation.yml
@@ -0,0 +1,361 @@
+id: annotation
+title: Genome annotation
+tabs:
+ - id: tools
+ title: Tools
+ heading_md: >
+ Common tools are listed here, or search for more in the full tool panel to the left.
+ content:
+ - title_md: MAKER
- genome annotation pipeline
+ description_md: >
+
+ MAKER can annotate both prokaryotes and eukaryotes. It works by aligning as much evidence as possible along the genome sequence, and then reconciling all these signals to determine probable gene structures.
+
The evidence can be transcript or protein sequences from the same (or a closely related) organism. These sequences can come from public databases (like NR or GenBank) or from your own experimental data (a transcriptome assembly from an RNASeq experiment, for example). MAKER can also take repeated elements into account.
+
Funannotate predict
- predicted gene annotations
+ description_md: >
+
+ Funannotate predict
performs comprehensive whole-genome gene prediction. It uses AUGUSTUS, GeneMark, SNAP, GlimmerHMM, BUSCO, EVidenceModeler, tbl2asn, tRNAScan-SE, Exonerate and minimap2. This approach differs from MAKER as it does not need to train ab initio predictors.
+
RepeatMasker
- screen DNA sequences for interspersed repeats and low complexity regions
+ description_md: >
+
+ RepeatMasker is a program that screens DNA for repeated elements such as tandem repeats, transposons, SINEs and LINEs. Galaxy AU has installed the full and curated DFam screening databases, or a custom database can be provided in fasta
format. Additional reference data can be downloaded from RepBase.
+
InterProScan
- Scans InterPro database and assigns functional annotations
+ description_md: >
+ + InterProScan is a batch tool to query the InterPro database. It provides annotations based on multiple searches of profile and other functional databases. +
+ inputs: + - datatypes: + - fasta + label: Genome assembly + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Finterproscan%2Finterproscan" + - title_md:Funannotate compare
- compare several annotations
+ description_md: >
+
+ Funannotate compare
compares several annotations and outputs a GFF3 file with the best gene models. It can be used to compare the results of different gene predictors, or to compare the results of a gene predictor with a reference annotation.
+
JBrowse
- Genome browser to visualize annotations
+ description_md: ''
+ inputs:
+ - datatypes:
+ - fasta
+ label: Genome assembly
+ - datatypes:
+ - gff
+ - gff3
+ - bed
+ label: Annotations
+ - datatypes:
+ - bam
+ label: Mapped RNAseq data (optional)
+ button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fjbrowse%2Fjbrowse"
+ - title_md: Prokka
- Genome annotation, prokaryotes only
+ description_md: ''
+ inputs:
+ - datatypes:
+ - fasta
+ label: Genome assembly
+ button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fcrs4%2Fprokka%2Fprokka"
+
+ - title_md: FGenesH
- Genome annotation
+ description_md: >
+ + Annotate an assembled genome and output a GFF3 file. There are several modules that do different things - search for FGENESH in the tool panel to see them. +
++ Note: you must + + apply for access + + to this tool before use. +
+ inputs: + - datatypes: + - fasta + label: Genome assembly + - datatypes: + - fasta + label: Repeat-masked (hard) genome assembly + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=fgenesh_annotate&version=latest" + + - id: workflows + title: Workflows + heading_md: > + A workflow is a series of Galaxy tools that have been linked together to perform a specific analysis. You can use and customize the example workflows below. + Learn more. + content: + subsections: + - id: general + title: General use + content: + - title_md: Annotation with Maker + description_md: > +
+ Annotates a genome using multiple rounds of Maker, including gene prediction using SNAP and Augustus.
Tools: maker
snap
augustus
busco
jbrowse
+
+ Annotates a genome using Funannotate, includes RNAseq data with RNAstar, and protein predictions from EggNOG.
Tools: RNAstar
funannotate
eggnog
busco
jbrowse
aegean parseval
+
+ This How-to-Guide will describe the steps required to align transcript data to your genome on the Galaxy Australia platform, using multiple workflows. The outputs from these workflows can then be used as inputs into the next annotation workflow using FgenesH++. +
+ - title_md: Repeat masking + description_md: > ++ Mask repeats in the genome. +
+ inputs: + - datatypes: + - fasta + label: Assembled genomegenome.fasta
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=875"
+ view_link: https://workflowhub.eu/workflows/875
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - title_md: QC and trimming of RNAseq
+ description_md: >
+ + Trim and merge RNAseq reads. +
+ inputs: + - datatypes: + - fastqsanger.gz + label: "For each tissue: RNAseq R1 files in a collectionR1.fastqsanger.gz
; RNAseq R2 files in a collection R2.fastqsanger.gz
"
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=876"
+ view_link: https://workflowhub.eu/workflows/876
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - title_md: Find transcripts
+ description_md: >
+ + Align RNAseq to genome to find transcripts. +
+ inputs: + - datatypes: + - fasta + label: Masked genomemasked_genome.fasta
+ - fastqsanger.gz
+ label: "For each tissue: Trimmed and merged RNAseq R1 files R1.fastqsanger.gz
; Trimmed and merged RNAseq R2 files R2.fastqsanger.gz
"
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=877"
+ view_link: https://workflowhub.eu/workflows/877
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - title_md: Combine transcripts
+ description_md: >
+ + Merge transcriptomes from different tissues, and filter out non-coding sequences. +
+ inputs: + - datatypes: + - fasta + label: Masked genomemasked_genome.fasta
+ - gtf
+ label: Multiple transcriptomes in a collection transcriptome.gtf
+ - fasta.gz
+ label: Coding and non-coding sequences from NCBI coding_seqs.fna.gz
non-coding_seqs.fna.gz
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=878"
+ view_link: https://workflowhub.eu/workflows/878
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - title_md: Extract transcripts
+ description_md: >
+ + Extract longest transcripts. +
+ inputs: + - datatypes: + - fasta + label: Merged transcriptomesmerged_transcriptomes.fasta
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=879"
+ view_link: https://workflowhub.eu/workflows/879
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+
+ - title_md: Convert formats
+ description_md: >
+ + Convert formats for FgenesH++ +
+ inputs: + - datatypes: + - fasta + label: Transdecoder nucleotidestransdecoder_nucleotides.fasta
+ label: Transdecoder peptides transdecoder_peptides.fasta
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=880"
+ view_link: https://workflowhub.eu/workflows/880
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - id: tsi_annotation
+ title: Annotation with FgenesH++
+ content:
+ - title_md: About these workflows
+ description_md: >
+ + This How-to-Guide will describe the steps required to annotate your genome on the Galaxy Australia platform, using multiple workflows. +
+ - title_md: Annotation with FgenesH++ + description_md: > + Annotate the genome using outputs from the TSI transcriptome workflows. ++ Note: you must + + apply for access + + to this tool before use. +
+ inputs: + - datatypes: + - fasta + label: Assembled genome + - datatypes: + - fasta + label: Masked genome + - datatypes: + - fasta + label: > + Outputs from TSI convert formats workflow + (*.cdna
,
+ *.pro
,
+ *.dat
)
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=881"
+ view_link: https://workflowhub.eu/workflows/881
+ view_tip: View in WorkflowHub
+ button_tip: Import to Galaxy Australia
+
+ - id: help
+ title: Help
+ content:
+ - title_md: What is genome annotation?
+ description_md: >
+
+ These slides from the Galaxy Training Network explain the process of genome annotation in detail. You can use the ←
and →
keys to navigate through the slides.
+
+ The flowchart below shows how you might use your input data (in green) with different Galaxy tools (in blue) to annotate a genome assembly. For example, one pathway would be taking an assembled genome, plus information about repeats, and data from RNA-seq, to run in the Maker pipeline. The annotations can then be viewed in JBrowse. +
+ ++ A graphical representation of genome annotation +
+ - title_md: Can I use Fgenesh++ for annotation? + description_md: > ++ Fgenesh++ is a bioinformatics pipeline for automatic prediction of genes in eukaryotic genomes. It is presently not installed in Galaxy Australia, but the Australian BioCommons and partners have licensed the software and made it available via the command line. Australian researchers can apply for access through the Australian BioCommons. +
+ button_md: Apply + button_link: https://www.biocommons.org.au/fgenesh-plus-plus + button_tip: Apply for access to Fgenesh++ + - title_md: Can I use Apollo to share and edit the annotated genome? + description_md: > ++ Apollo is a web-browser accessible system that lets you conduct real-time collaborative curation and editing of genome annotations. +
++ The Australian BioCommons and our partners at QCIF and Pawsey provide a hosted Apollo Portal service where your genome assembly and supporting evidence files can be hosted. All system administration is taken care of, so you and your team can focus on the annotation curation itself. +
++ This Galaxy tutorial provides a complete walkthrough of the process of refining eukaryotic genome annotations with Apollo. +
+ button_md: More info + button_link: https://support.biocommons.org.au/support/solutions/articles/6000244843-apollo-for-collaborative-curation-and-editing + - title_md: Tutorials + description_md: > ++ Genome annotation with Maker +
++ Genome annotation of eukaryotes is a little more complicated than for prokaryotes: eukaryotic genomes are usually larger than prokaryotic genomes, with more genes. The sequences determining the beginning and the end of a gene are generally less conserved than in prokaryotes. Many genes also contain introns, and the limits of these introns (acceptor and donor sites) are not highly conserved. This Galaxy tutorial uses MAKER to annotate the genome of a small eukaryote: Schizosaccharomyces pombe (a yeast). +
++ Genome annotation with Funannotate +
++ This Galaxy tutorial provides a complete walkthrough of the process of annotation with Funannotate, including the preparation of RNAseq data, structural annotation, functional annotation, visualisation, and comparing annotations. +
+ - title_md: Galaxy Australia support + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_md: Request support + button_link: /request/support diff --git a/subdomains/genome/assembly.yml b/subdomains/genome/assembly.yml new file mode 100644 index 00000000..7609262f --- /dev/null +++ b/subdomains/genome/assembly.yml @@ -0,0 +1,411 @@ +id: assembly +title: Genome assembly +tabs: + - id: tools + title: Tools + heading_md: > + Common tools are listed here, or search for more in the full tool panel to the left. + content: + - title_md:Hifiasm
- assembly with PacBio HiFi data
+ description_md: >
+ + A haplotype-resolved assembler for PacBio HiFi reads. +
+ inputs: + - datatypes: + - fasta + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fhifiasm%2Fhifiasm" + - title_md:Flye
- assembly with PacBio or Nanopore data
+ description_md: >
+ + de novo assembly of single-molecule sequencing reads, designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. +
+ inputs: + - datatypes: + - fasta + - fastq + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fflye%2Fflye" + - title_md:Unicycler
- assembly with Illumina, PacBio or Nanopore data - bacteria only
+ description_md: >
+ + Hybrid assembly pipeline for bacterial genomes, uses both Illumina reads and long reads (PacBio or Nanopore). +
+ inputs: + - datatypes: + - fastq + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Funicycler%2Funicycler" + - title_md:YAHS
- scaffold assembly with HiC data
+ description_md: >
+ + YAHS is a scaffolding tool based on a computational method that exploits the genomic proximity information in Hi-C data sets for long-range scaffolding of de novo genome assemblies. Inputs are the primary assembly (or haplotype 1), and HiC reads mapped to the assembly. See this tutorial to learn how to create a suitable BAM file. +
+ inputs: + - label: Primary assembly or Haplotype 1genome.fasta
+ datatypes:
+ - fasta
+ - label: HiC reads mapped to assembly mapped_reads.bam
+ datatypes:
+ - bam
+ button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fyahs%2Fyahs"
+ - title_md: Quast
- assess genome assembly quality
+ description_md: >
+ + QUAST = QUality ASsessment Tool. The tool evaluates genome assemblies by computing various metrics. If you have one or multiple genome assemblies, you can assess their quality with Quast. It works with or without a reference genome. +
+ inputs: + - datatypes: + - fasta + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fquast%2Fquast" + - title_md:Busco
- assess genome assembly quality
+ description_md: >
+ + BUSCO: assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. The tool attempts to provide a quantitative assessment of the completeness in terms of the expected gene content of a genome assembly, transcriptome, or annotated gene set. +
+ inputs: + - datatypes: + - fasta + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fbusco%2Fbusco" + - title_md:MitoHiFi
- assemble mitochondrial genomes
+ description_md: >
+ + Assemble mitochondrial genomes from PacBio HiFi reads. Run first to find a related mitogenome, then run to assemble the genome. Inputs are PacBio HiFi reads in fasta or fastq format, and a related mitogenome in both fasta and genbank formats. +
+ inputs: + - datatypes: + - fasta + - fastq + - genbank + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fmitohifi%2Fmitohifi" + + - id: workflows + title: Workflows + heading_md: > + A workflow is a series of Galaxy tools that have been linked together to perform a specific analysis. You can use and customize the example workflows below. + Learn more. + content: + subsections: + - id: pacbio + title: TSI assembly workflows - PacBio HiFi data + content: + - title_md: About these workflows + description_md: > ++ This How-to-Guide will describe the steps required to assemble your genome on the Galaxy Australia platform, using multiple workflows. There is also a guide about the Genome Assessment workflow, and the HiC Scaffolding workflow. +
+ - title_md: BAM to FASTQ + QC v1.0 + description_md: > ++ Convert a BAM file to FASTQ format to perform QC analysis (required if your data is in BAM format). +
+ inputs: + - datatypes: + - bam + label: PacBio subreads.bam + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=220" + view_link: https://workflowhub.eu/workflows/220 + view_tip: View in WorkflowHub + button_tip: Import to Galaxy Australia + - title_md: PacBio HiFi genome assembly using hifiasm v2.1 + description_md: > ++ Assemble a genome using PacBio HiFi reads. +
+ inputs: + - datatypes: + - fastqsanger + label: HiFi reads + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=221" + view_link: https://workflowhub.eu/workflows/221 + view_tip: View in WorkflowHub + button_tip: Import to Galaxy Australia + - title_md: Purge duplicates from hifiasm assembly v1.0 + description_md: > ++ Optional workflow to purge duplicates from the contig assembly. +
+ inputs: + - datatypes: + - fastqsanger + label: HiFi reads + - datatypes: + - fasta + label: Primary assembly contigs + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=237" + view_link: https://workflowhub.eu/workflows/237 + view_tip: View in WorkflowHub + button_tip: Import to Galaxy Australia + - title_md: Genome assessment post-assembly + description_md: > +
+ Evaluate the quality of your genome assembly with a comprehensive report including FASTA stats
, BUSCO
, QUAST
, Meryl
and Merqury
.
+
+ If you have HiC data, scaffold your assembly using YAHS
.
+
+ This tutorial describes the steps required to assemble a genome on Galaxy with Nanopore and Illumina data. +
+ - title_md: Flye assembly with Nanopore data + description_md: > ++ Assemble Nanopore long reads. This workflow can be run alone or as part of a combined workflow for large genome assembly. +
+ inputs: + - datatypes: + - fastqsanger + label: Long reads (may be raw, filtered and/or corrected) + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=225" + view_link: https://workflowhub.eu/workflows/225 + view_tip: View in WorkflowHub + button_tip: Import to Galaxy Australia + - title_md: Assembly polishing + description_md: > +
+ Polishes (corrects) an assembly, using long reads (Racon
and Medaka
) and short reads (Racon
).
+
+ Assesses the quality of the genome assembly. Generates statistics, determines if expected genes are present and aligns contigs to a reference genome. +
+ inputs: + - datatypes: + - fasta + label: Polished assembly + - datatypes: + - fasta + label: Reference genome assembly (e.g. related species) + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=229" + view_link: https://workflowhub.eu/workflows/229 + view_tip: View in WorkflowHub + button_tip: Import to Galaxy Australia + - id: hic + title: VGP assembly workflows - PacBio HiFi and (optional) HiC data + content: + - title_md: About these workflows + description_md: > ++ These workflows have been developed as part of the global Vertebrate Genome Project (VGP). A guide to using these in Galaxy Australia can be found here. A complete guide to the individual workflows and sample results can be found here. There are many different ways that these workflows can be used in practice - for a comprehensive example, check out this Galaxy tutorial. +
+ - title_md: Kmer profiling + description_md: > ++ This workflow produces a Meryl database and GenomeScope outputs that will be used to determine parameters for following workflows, and assess the quality of genome assemblies. Specifically, it provides information about the genomic complexity, such as the genome size and levels of heterozygosity and repeat content, as well as about the data quality. +
+ inputs: + - datatypes: + - fastq + label: PacBio HiFi reads + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/kmer-profiling-hifi-VGP1/main" + view_link: https://dockstore.org/workflows/github.com/iwc-workflows/kmer-profiling-hifi-VGP1/main:main + view_tip: View in Dockstore + button_tip: Import to Galaxy Australia + - title_md: Hifi assembly and HiC phasing + description_md: > +
+ This workflow uses hifiasm
(HiC mode) to generate HiC-phased haplotypes (hap1
and hap2
). This is in contrast to its default mode, which generates primary and alternate pseudohaplotype assemblies. This workflow includes three tools for evaluating assembly quality: gfastats
, BUSCO
and Merqury
.
+
+ Note: if you have multiple input files for each HiC set, they need to be concatenated. The forward set needs to be concatenated in the same order as the reverse set. +
+ inputs: + - datatypes: + - fasta + label: PacBio HiFi reads + - datatypes: + - fastq + label: HiC reads (forward) + - datatypes: + - fastq + label: HiC reads (reverse) + - datatypes: + - meryldb + label:Meryl
kmer database
+ - datatypes:
+ - txt
+ label: GenomeScope
genome profile summary
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/main"
+ view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/main:main
+ view_tip: View in Dockstore
+ button_tip: Import to Galaxy Australia
+
+ - title_md: Hifi assembly without HiC data
+ description_md: >
+
+ This workflow uses hifiasm
to generate primary and alternate pseudohaplotype assemblies. This workflow includes three tools for evaluating assembly quality: gfastats
, BUSCO
and Merqury
.
+
Meryl
kmer database
+ - datatypes:
+ - txt
+ label: GenomeScope
genome profile summary
+ button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Assembly-Hifi-only-VGP3/main"
+ view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Assembly-Hifi-only-VGP3/main:main
+ view_tip: View in Dockstore
+ button_tip: Import to Galaxy Australia
+
+ - title_md: HiC scaffolding
+ description_md: >
+ + This workflow scaffolds the assembly contigs using information from HiC data. +
+ inputs: + - datatypes: + - gfa + label: Assembly of haplotype 1 + - datatypes: + - fastq + label: HiC forward reads + - datatypes: + - fastq + label: HiC reverse reads + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main" + view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Scaffolding-HiC-VGP8/main:main + view_tip: View in Dockstore + button_tip: Import to Galaxy Australia + - title_md: Decontamination + description_md: > ++ This workflow identifies and removes contaminants from the assembly. +
+ inputs: + - datatypes: + - fasta + label: Assembly + button_link: "{{ galaxy_base_url }}/workflows/trs_import?trs_server=dockstore.org&trs_id=%23workflow/github.com/iwc-workflows/Assembly-decontamination-VGP9/main:v0.1" + view_link: https://dockstore.org/workflows/github.com/iwc-workflows/Assembly-decontamination-VGP9/main:v0.1 + view_tip: View in Dockstore + button_tip: Import to Galaxy Australia + + - id: help + title: Help + content: + - title_md: Can I use Galaxy Australia to assemble a large genome? + description_md: > ++ Yes. Galaxy Australia has assembly tools for small prokaryote genomes as well as larger eukaryote genomes. We are continually adding new tools and optimising them for large genome assemblies - this means adding enough computer processing power to run data-intensive tools, as well as configuring aspects such as parallelisation. +
++ Please contact us if: +
++ Genome assembly can be a very involved process. A typical genome assembly procedure might look like: +
++ A graphical representation of genome assembly +
+ - title_md: Which tools should I use? + description_md: > ++ There is no best set of tools to recommend - new tools are developed constantly, sequencing technology improves rapidly, and many genomes have never been sequenced before and thus their characteristics and quirks are unknown. The "Tools" tab in this section includes a list of commonly-used tools that could be a good starting point. You will find other tools in recent publications or used in workflows. +
++ You can also search for tools in Galaxy's tool panel. If they aren't installed on Galaxy Australia, you can request installation of a tool. +
++ We recommend testing a tool on a small data set first and seeing if the results make sense, before running on your full data set. +
+ - title_md: Tutorials + description_md: > ++ Find 15+ Galaxy training tutorials here. +
++ Introduction to genome assembly and annotation (slides) +
++ Vertebrate genome assembly pipeline (tutorial) +
++ Nanopore and Illumina genome assembly (tutorial) +
++ Share workflows and results with workflow reports (tutorial) +
+ - title_md: How can I assess the quality of my genome assembly? + description_md: > ++ Once a genome has been assembled, it is important to assess the quality of the assembly, and in the first instance, this quality control (QC) can be achieved using the workflow described here. +
+ button_md: Workflow tutorial + button_link: https://australianbiocommons.github.io/how-to-guides/genome_assembly/assembly_qc + - title_md: Galaxy Australia support + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_md: Request support + button_link: /request/support diff --git a/subdomains/genome/base.yml b/subdomains/genome/base.yml new file mode 100644 index 00000000..232de7a9 --- /dev/null +++ b/subdomains/genome/base.yml @@ -0,0 +1,36 @@ +# Test this locally with: +# http://127.0.0.1:8000/lab/export?content_root=http://localhost:8000/static/home/labs/genome/base.yml + +# Request this on site.usegalaxy.org.au with: +# https://site.usegalaxy.org.au/lab/export?content_root=https://site.usegalaxy.org.au/static/home/labs/genome/base.yml + +# Check out the documentation for building exported labs: +# https://site.usegalaxy.org.au/lab/export + +# Use these variables in HTML templates like: +# "Welcome to the Galaxy {{ site_name }} {{ lab_name }}" +# To make the content more generic and reusable across sites +site_name: Australia +lab_name: Genome Lab +nationality: Australian +galaxy_base_url: https://genome.usegalaxy.org.au # Use for rendering tool/workflow URLs. Trailing '/' will be removed. +subdomain: genome +root_domain: usegalaxy.org.au +feedback_email: help@genome.edu.au + +# Custom content relative to this file URL +header_logo: static/logo.png +custom_css: static/custom.css +intro_md: templates/intro.html +conclusion_md: templates/conclusion.html +footer_md: templates/footer.html + + +# Data (Tools, Workflows etc.) to be rendered into sections/tabs/accordion elements. +# Either: +# 1. Relative to this file URL +# 2. Full URL to fetch globally centralized content +sections: + - data.yml + - assembly.yml + - annotation.yml diff --git a/subdomains/genome/data.yml b/subdomains/genome/data.yml new file mode 100644 index 00000000..79ba2c2b --- /dev/null +++ b/subdomains/genome/data.yml @@ -0,0 +1,160 @@ +id: data +title: Data import and preparation +tabs: + - id: tools + title: Tools + heading_md: > + Common tools are listed here, or search for more in the full tool panel to the left. 
+ content: + - title_md: Import data to Galaxy + description_md: > + Standard upload of data to Galaxy, from your computer or from the web. + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=upload1" + - title_md:FastQC
- sequence quality reports
+ description_md: >
+ + Before using your sequencing data, it's important to ensure that + the data quality is sufficient for your analysis. +
+ inputs: + - datatypes: + - fastq + - bam + - sam + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fdevteam%2Ffastqc%2Ffastqc" + - title_md:FastP
- sequence quality reports, trimming & filtering
+ description_md: >
+ + Faster to run than FastQC, this tool can also trim reads and filter by quality. +
+ inputs: + - datatypes: + - fastq + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Ffastp%2Ffastp" + - title_md:NanoPlot
- visualize Oxford Nanopore data
+ description_md: >
+ + A plotting suite for Oxford Nanopore sequencing data and alignments. +
+ inputs: + - datatypes: + - fastq + - fasta + - vcf_bgzip + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fnanoplot%2Fnanoplot" + - title_md:GenomeScope
- estimate genome size
+ description_md: >
+ + A set of metrics and graphs to visualize genome size and complexity prior to assembly. +
+ inputs: + - datatypes: + - tabular + label: Output fromMeryl
or Jellyfish histo
+ button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fgenomescope%2Fgenomescope"
+ - title_md: Meryl
- count kmers
+ description_md: >
+ + Prepare kmer count histogram for input to GenomeScope. +
+ inputs: + - datatypes: + - fastq + - fasta + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fmeryl%2Fmeryl" + - id: workflows + title: Workflows + heading_md: > + A workflow is a series of Galaxy tools that have been linked together to perform a specific analysis. You can use and customize the example workflows below. + Learn more. + content: + - title_md: Data QC + description_md: > +
+ Report statistics from sequencing reads.
Tools: nanoplot
fastqc
multiqc
+
+ Estimates genome size and heterozygosity based on counts of kmers.
Tools: meryl
genomescope
+
+ Trims and filters raw sequence reads according to specified settings.
Tools: fastp
+
+ You can upload your data to Galaxy using the Upload tool from anywhere in Galaxy. Just look for the "Upload data" button at the top of the tool panel. +
+ button_md: More info + button_link: https://training.galaxyproject.org/training-material/topics/galaxy-interface/ + - title_md: How can I subsample my data? + description_md: > +
+ We recommend subsampling large data sets to test tools and workflows. A useful tool is seqtk_seq
, setting the parameter at "Sample fraction of sequences".
+
+ BioPlatforms Australia allows data downloads via URL. Once you have generated one of these URLs in the BPA portal, you can import it into Galaxy using the "Fetch data" feature of the Upload tool. +
+ button_md: More info + button_link: https://australianbiocommons.github.io/how-to-guides/genome_assembly/hifi_assembly#in-depth-workflow-guide + - title_md: Can I upload sensitive data? + description_md: > ++ No, do not upload personal or sensitive data, such as human health or clinical data. Please see our Data Privacy page for definitions of sensitive and health-related information. +
++ Please also make sure you have read our Terms of Service, which covers hosting and analysis of research data. +
+ - title_md: Is my data private? + description_md: > ++ Please read our Privacy Policy for information on your personal data and any data that you upload. +
+ - title_md: How can I increase my storage quota? + description_md: > ++ Please submit a quota request if your Galaxy Australia account reaches its data storage limit. Requests are usually provisioned quickly if you provide a reasonable use case for your request. +
+ button_md: Request + button_link: /request/quota + - title_md: "Tutorial: Quality Control" + description_md: > +
+ Quality control and data cleaning are essential first steps in any NGS analysis. This tutorial will show you how to use and interpret results from FastQC
, NanoPlot
and PycoQC
.
+
+ This practical aims to familiarize you with the Galaxy user interface. It will teach you how to perform basic tasks such as importing data, running tools, working with histories, creating workflows, and sharing your work. +
+ button_md: Tutorial + button_link: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-strands/tutorial.html + - title_md: Galaxy Australia support + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_md: Request support + button_link: /request/support diff --git a/subdomains/genome/static/annotation-overview.png b/subdomains/genome/static/annotation-overview.png new file mode 100644 index 00000000..0bdeaa80 Binary files /dev/null and b/subdomains/genome/static/annotation-overview.png differ diff --git a/subdomains/genome/static/assembly-overview.png b/subdomains/genome/static/assembly-overview.png new file mode 100644 index 00000000..61cf705c Binary files /dev/null and b/subdomains/genome/static/assembly-overview.png differ diff --git a/subdomains/genome/static/custom.css b/subdomains/genome/static/custom.css new file mode 100644 index 00000000..e69de29b diff --git a/subdomains/genome/static/logo.png b/subdomains/genome/static/logo.png new file mode 100644 index 00000000..ea8b9ffa Binary files /dev/null and b/subdomains/genome/static/logo.png differ diff --git a/subdomains/genome/templates/conclusion.html b/subdomains/genome/templates/conclusion.html new file mode 100644 index 00000000..13e7d771 --- /dev/null +++ b/subdomains/genome/templates/conclusion.html @@ -0,0 +1,163 @@ +
+ Welcome to the Galaxy {{ site_name }} {{ lab_name }}. Get quick access to tools, workflows and tutorials for genome assembly and annotation.
+
+
+ What is this page?
+
+
msconvert
"
+ description_md: >
+ + Convert and/or filter mass spectrometry files. +
+ inputs: + - datatypes: + - thermo.raw + - mzML + - mzXML + - raw + - wiff + - wiff.tar + - agilentbrukeryep.d.tar + - agilentmasshunter.d.tar + - brukerbaf.d.tar + - brukertdf.d.tar + - watersmasslynx.raw.tar + label: Input MS data + outputs: + - datatypes: + - mz5 + - mzML + - mzXML + - mgf + - ms2 + label: Output MS data + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/msconvert/msconvert/" + - title_md: "Thermo RAW file converter
"
+ description_md: >
+ + Thermo RAW file converter. +
+ inputs: + - datatypes: + - thermo.raw + label: Thermo RAW file + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/thermo_raw_file_converter/thermo_raw_file_converter/" + # - id: help + # title: Help + # content: [] diff --git a/subdomains/proteomics/sections/data.yml b/subdomains/proteomics/sections/data.yml new file mode 100644 index 00000000..b162e648 --- /dev/null +++ b/subdomains/proteomics/sections/data.yml @@ -0,0 +1,55 @@ +id: data +title: Data import +tabs: + - id: tools + title: Tools + heading_md: "Some example tools are listed here: you can also search for more in the full tool panel to the left." + content: + - title_md: "Import data to Galaxy" + description_md: "Standard upload of data to Galaxy, from your computer or from the web.
" + button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=upload1" + - id: help + title: Help + content: + - title_md: "Can I upload sensitive data?" + description_md: | ++ No, do not upload personal or sensitive, such as human health or clinical data. + Please see our + Data Privacy + page for definitions of sensitive and health-related information. +
++ Please also make sure you have read our + Terms of Service, + which covers hosting and analysis of research data. +
+ - title_md: "Is my data private?" + description_md: | ++ Please read our + Privacy Policy + for information on your personal data and any data that you upload. +
+ - title_md: "How can I increase my storage quota?" + description_md: | ++ Please submit a quota request if your Galaxy Australia account reaches its data storage limit. Requests are usually provisioned quickly if you provide a reasonable use case for your request. +
+ button_link: "/request/quota" + button_md: "Request" + - title_md: "Tutorial: Introduction to proteomics, protein identification, quantification and statistical modelling" + description_md: | ++ This practical aims to familiarize you with Galaxy for Proteomics, including theory, methods and software examples. +
+ button_link: "https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/introduction/slides.html#1" + button_md: "Tutorial" + - title_md: "Galaxy Australia support" + description_md: | ++ Any user of Galaxy Australia can request support through an + online form. +
+ button_link: "/request/support" + button_md: "Request support" diff --git a/subdomains/proteomics/sections/database_searching.yml b/subdomains/proteomics/sections/database_searching.yml new file mode 100644 index 00000000..fb36ab0c --- /dev/null +++ b/subdomains/proteomics/sections/database_searching.yml @@ -0,0 +1,78 @@ +id: database_searching +title: Database searching +tabs: + - id: tools + title: Tools + heading_md: > + Some example tools are listed here: you can also search for more in the full tool panel to the left. + content: + - title_md: "DecoyDatabase
"
+ description_md: >
+ + Create decoy sequence database from forward sequence database. +
+ inputs: + - datatypes: + - fasta + label: Input FASTA file(s), each containing a database + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/openms_decoydatabase/DecoyDatabase/" + - title_md: "MaxQuant
"
+ description_md: >
+ + MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. +
+ inputs: + - datatypes: + - thermo.raw + - mzML + - mzXML + label: Mass spectrometry data sets + - datatypes: + - tabular + label: Experimental design template + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/maxquant/maxquant/" + - title_md: "Morpheus
"
+ description_md: >
+ + Database search algorithm for high-resolution tandem mass spectra. +
+ inputs: + - datatypes: + - mzML + label: Indexed mzML + - datatypes: + - fasta + - uniprotxml + label: "MS Protein Search Database: UniProt Xml or Fasta" + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/morpheus/morpheus/" + - id: help + title: Help + content: + - title_md: "Introduction to proteomics, protein identification, quantification and statistical modelling" + description_md: > ++ Start here if you are new to proteomic analysis in Galaxy. +
+ button_link: "https://usegalaxy.org.au/training-material/topics/proteomics/tutorials/introduction/slides.html" + button_md: "Tutorial" + - title_md: "Label-free data analysis using MaxQuant" + description_md: > ++ Learn how to use MaxQuant for the analysis of label-free shotgun (DDA) data. +
+ button_link: "https://proteomics.usegalaxy.org.au/training-material/topics/proteomics/tutorials/maxquant-label-free/tutorial.html" + button_md: "Tutorial" + - title_md: "Peptide and Protein ID using OpenMS tools" + description_md: > ++ Learn how to identify proteins from LC-MS/MS raw files. +
+ button_link: "https://usegalaxy.org.au/training-material/topics/proteomics/tutorials/protein-id-oms/tutorial.html" + button_md: "Tutorial" + - title_md: "Galaxy Australia support" + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_link: "/request/support" + button_md: "Request support" \ No newline at end of file diff --git a/subdomains/proteomics/sections/dda_standardised_tools.yml b/subdomains/proteomics/sections/dda_standardised_tools.yml new file mode 100644 index 00000000..d66139c7 --- /dev/null +++ b/subdomains/proteomics/sections/dda_standardised_tools.yml @@ -0,0 +1,71 @@ +id: dda_standardised_tools +title: DDA Standardised Tools +tabs: + - id: tools + title: Tools + heading_md: > + Some example tools are listed here: you can also search for more in the full tool panel to the left. + content: + - title_md: "MaxQuant
"
+ description_md: >
+ + MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. +
+ inputs: + - datatypes: + - thermo.raw + - mzML + - mzXML + label: MS spectra (input file) + - datatypes: + - tabular + label: Experimental design template + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/maxquant/maxquant/" + - title_md: "MSstats
"
+ description_md: >
+ + Statistical relative protein significance analysis in DDA, SRM and DIA Mass Spectrometry. +
+ inputs: + - datatypes: + - tabular + - csv + label: Either the 10-column MSstats format or the outputs of spectral processing tools such as MaxQuant, OpenSWATH. + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/msstats/msstats/" + - title_md: "LFQ Analyst
"
+ description_md: >
+ + Analyze and Visualize Label-Free Proteomics output from MaxQuant. +
+ inputs: + - datatypes: + - txt + label: Protein groups (MaxQuant output) + - datatypes: + - txt + label: Experimental Design Matrix (MaxQuant output) + button_link: "{{ galaxy_base_url }}/root?tool_id=interactive_tool_lfqanalyst_2" + - id: help + title: Help + content: + - title_md: "LFQ-Analyst: Manual" + description_md: > ++ A detailed user manual for LFQ-Analyst. +
+ button_link: "https://analyst-suite.monash-proteomics.cloud.edu.au/apps/lfq-analyst/LFQ-Analyst_manual.pdf" + button_md: "Manual" + - title_md: "MaxQuant and MSstats for the analysis of label-free data" + description_md: > ++ Learn how to use MaxQuant and MSstats for the analysis of label-free shotgun (DDA) data. +
+ button_link: "https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/maxquant-msstats-dda-lfq/tutorial.html" + button_md: "Tutorial" + - title_md: "Galaxy Australia support" + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_link: "/request/support" + button_md: "Request support" diff --git a/subdomains/proteomics/sections/dda_tmt.yml b/subdomains/proteomics/sections/dda_tmt.yml new file mode 100644 index 00000000..ae8bd0b8 --- /dev/null +++ b/subdomains/proteomics/sections/dda_tmt.yml @@ -0,0 +1,60 @@ +id: dda_tmt +title: DDA TMT +tabs: + - id: tools + title: Tools + heading_md: > + Some example tools are listed here: you can also search for more in the full tool panel to the left. + content: + - title_md: "MaxQuant
"
+ description_md: >
+ + MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. +
+ inputs: + - datatypes: + - thermo.raw + - mzML + - mzXML + label: MS spectra (input file) + - datatypes: + - tabular + label: Experimental design template + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/maxquant/maxquant/" + - title_md: "TMT Analyst
"
+ description_md: >
+ + Analyze and Visualize Label-Free Proteomics output from MaxQuant. +
+ inputs: + - datatypes: + - txt + label: Protein groups (MaxQuant output) + - datatypes: + - txt + label: Experimental Design Matrix (MaxQuant output) + button_link: "{{ galaxy_base_url }}/root?tool_id=interactive_tool_tmtanalyst" + - id: help + title: Help + content: + - title_md: "TMT-Analyst: Manual" + description_md: > ++ A detailed user manual for TMT-Analyst. +
+ button_link: "https://analyst-suites.org/apps/tmt-analyst/TMT-Analyst-manual.pdf" + button_md: "Manual" + - title_md: "MaxQuant and MSstats for the analysis of TMT data" + description_md: > ++ Learn how to use MaxQuant and MSstats for the analysis of TMT labelled shotgun (DDA) data. +
+ button_link: "https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/maxquant-msstats-tmt/tutorial.html" + button_md: "Tutorial" + - title_md: "Galaxy Australia support" + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_link: "/request/support" + button_md: "Request support" diff --git a/subdomains/proteomics/sections/dia_standardised_tools.yml b/subdomains/proteomics/sections/dia_standardised_tools.yml new file mode 100644 index 00000000..825e07f5 --- /dev/null +++ b/subdomains/proteomics/sections/dia_standardised_tools.yml @@ -0,0 +1,43 @@ +id: dia_standardised_tools +title: DIA Standardised Tools +tabs: + - id: tools + title: Tools + heading_md: > + Some example tools are listed here: you can also search for more in the full tool panel to the left. + content: + - title_md: "MSstats
"
+ description_md: >
+ + Statistical relative protein significance analysis in DDA, SRM and DIA Mass Spectrometry. +
+ inputs: + - datatypes: + - tabular + - csv + label: Either the 10-column MSstats format or the outputs of spectral processing tools such as MaxQuant, OpenSWATH. + button_link: "{{ galaxy_base_url }}/root?tool_id=toolshed.g2.bx.psu.edu/repos/galaxyp/msstats/msstats/" + - id: help + title: Help + content: + - title_md: "Galaxy Australia support" + description_md: > ++ Any user of Galaxy Australia can request support through an online form. +
+ button_link: "/request/support" + button_md: "Request support" + - title_md: "DIA Analysis using OpenSwathWorkflow" + description_md: > ++ Learn how to analyse HEK-Ecoli spike-in DIA data in Galaxy, understand DIA data principles and characteristics, and use OpenSwathworkflow to analyze HEK-Ecoli spike-in DIA data. +
+ button_link: "/request/support" + button_md: "Request support" + - title_md: "DIA Analysis using OpenSwathWorkflow" + description_md: > ++ Learn how to analyse HEK-Ecoli spike-in DIA data in Galaxy, understand DIA data principles and characteristics, and use OpenSwathWorkflow to analyze HEK-Ecoli spike-in DIA data. +
+ button_link: "https://usegalaxy.org.au/training-material/topics/proteomics/tutorials/DIA_lib_OSW/tutorial.html" + button_md: "Tutorial" diff --git a/subdomains/proteomics/static/custom.css b/subdomains/proteomics/static/custom.css new file mode 100644 index 00000000..cfd8f237 --- /dev/null +++ b/subdomains/proteomics/static/custom.css @@ -0,0 +1,21 @@ +#whatIsThisPage { + position: absolute; + top: 135px; + right: calc(50vw - 500px); +} + +@media (max-width: 1070px) { + #whatIsThisPage { + right: 2rem; + } +} +@media (max-width: 992px) { + #whatIsThisPage { + top: 95px; + } +} +@media (max-width: 730px) { + #whatIsThisPage { + top: 2rem; + } +} diff --git a/subdomains/proteomics/static/logo.png b/subdomains/proteomics/static/logo.png new file mode 100644 index 00000000..f90d01be Binary files /dev/null and b/subdomains/proteomics/static/logo.png differ diff --git a/subdomains/proteomics/templates/conclusion.html b/subdomains/proteomics/templates/conclusion.html new file mode 100644 index 00000000..c097223b --- /dev/null +++ b/subdomains/proteomics/templates/conclusion.html @@ -0,0 +1 @@ +{% include 'home/snippets/header-cards.html' %} diff --git a/subdomains/proteomics/templates/footer.html b/subdomains/proteomics/templates/footer.html new file mode 100644 index 00000000..259d18e6 --- /dev/null +++ b/subdomains/proteomics/templates/footer.html @@ -0,0 +1,9 @@ + + diff --git a/subdomains/proteomics/templates/intro.html b/subdomains/proteomics/templates/intro.html new file mode 100644 index 00000000..41a4c0d1 --- /dev/null +++ b/subdomains/proteomics/templates/intro.html @@ -0,0 +1,155 @@ ++ Welcome to the Galaxy Australia Proteomics Lab. Get quick access to the + tools, workflows and tutorials you need to get started with proteomics on + Galaxy. +
+ ++ This page is currently under development in consultation with the + + Australian Proteomics Bioinformatics community. +
+