first release of RNAseq DE analysis, filtering and plotting workflow #582

Merged: 5 commits, Nov 12, 2024

11 changes: 11 additions & 0 deletions workflows/transcriptomics/rnaseq-de/.dockstore.yml
@@ -0,0 +1,11 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /rnaseq-de-filtering-plotting.ga
  testParameterFiles:
  - /rnaseq-de-filtering-plotting-tests.yml
  authors:
  - name: Pavankumar Videm
    orcid: 0000-0002-5192-126X
3 changes: 3 additions & 0 deletions workflows/transcriptomics/rnaseq-de/CHANGELOG.md
@@ -0,0 +1,3 @@
## [0.1] 2024-10-25

First release.
23 changes: 23 additions & 0 deletions workflows/transcriptomics/rnaseq-de/README.md
@@ -0,0 +1,23 @@
# RNA-seq differential expression and filtering workflow

This workflow supports only experimental designs with exactly 2 conditions and at least 2 replicates per condition.

## Input datasets

- Counts from changed condition: Counts from the experimental (changed) condition, for example treated or knockdown samples.
- Counts from reference condition: Counts from the reference (base) condition, for example untreated or wild-type samples.
- Gene Annotaton: The same GTF file that was used for mapping and quantification. It is used to annotate the DESeq2 results table. Ideally, the GTF file should contain the `gene_id`, `gene_biotype` and `gene_name` attributes.
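The annotation step relies on these GTF attributes. As a rough illustration only (not the workflow's own code; `parse_gtf_genes` and `annotate` are hypothetical names), the Python sketch below reads `gene` records from a GTF file and appends chromosome, start, end, strand, biotype and symbol to result rows keyed by gene ID. It assumes Ensembl-style `key "value";` attributes and that the GTF contains `gene` features; genes missing from the GTF get empty annotation fields.

```python
import re

# Matches Ensembl-style `key "value"` pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_gtf_genes(lines):
    """Return {gene_id: [chrom, start, end, strand, biotype, symbol]} for `gene` features.

    Assumes gene_id, gene_biotype and gene_name attributes; a missing
    biotype or name is reported as an empty string.
    """
    genes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep one record per gene, ignore transcripts and exons
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        genes[gene_id] = [
            fields[0],                      # chromosome
            fields[3],                      # start
            fields[4],                      # end
            fields[6],                      # strand
            attrs.get("gene_biotype", ""),  # biotype
            attrs.get("gene_name", ""),     # gene symbol
        ]
    return genes


def annotate(result_rows, genes):
    """Append the six annotation columns to each result row (first column = gene ID)."""
    empty = [""] * 6
    return [row + genes.get(row[0], empty) for row in result_rows]
```

In this sketch the DESeq2 rows are lists of tab-split strings, so the annotated output keeps the gene ID first and the annotation columns last.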

## Input values

- Count files have header: Indicates whether the input count files have a header line. Count files generated by featureCounts usually have a header line, whereas count files from RNA-STAR do not.
- Adjusted p-value threshold: Genes with an adjusted p-value below this value are considered differentially expressed. The adjusted p-value controls the false discovery rate, so with a threshold of 0.05 about 5% of the genes in the differentially expressed list are expected to be false positives. If empty, a default value of 0.05 is used.
- log2 fold change threshold: Genes whose absolute log2 fold change (regardless of up- or down-regulation) is greater than this value are selected. For example, a log2 fold change of 3 corresponds to an 8-fold change (2³ = 8). If empty, a default value of 1.0 is used.
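To make the "Count files have header" option concrete, here is a minimal Python sketch, assuming each count file has two tab-separated columns (gene ID, count). `read_counts` is a hypothetical helper, not part of the workflow. If the flag is off for a featureCounts-style file, the header line is parsed as a gene and its count column fails to convert to an integer.

```python
import csv


def read_counts(path, has_header):
    """Read a two-column (gene ID, count) tab-separated count file into a dict.

    has_header mirrors the workflow's "Count files have header" input:
    when True, the first line is skipped instead of being parsed as a gene.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # drop the header line
        for row in reader:
            if not row or row[0].startswith("#"):
                continue  # ignore blank and comment lines
            counts[row[0]] = int(row[1])  # raises ValueError on a stray header
    return counts
```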
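By default, DESeq2 computes adjusted p-values with the Benjamini-Hochberg procedure, which is why the threshold above is read as a false discovery rate. The plain-Python sketch below implements that procedure for illustration only; DESeq2 also applies independent filtering beforehand and reports `NA` for genes it filters out, which this sketch does not model.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by n / rank (rank 1 = smallest p-value), then a
    running minimum from the largest rank downward keeps the adjusted values
    monotone. Starting the minimum at 1.0 caps every result at 1.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` gives approximately `[0.04, 0.0533, 0.0533, 0.5]`: three raw p-values are below 0.05, but only the first stays below 0.05 after adjustment.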

## Processing

- The workflow uses DESeq2 to perform the differential expression analysis. In addition to the results table, it also produces a normalized counts table.
- The results table is annotated with gene positions, biotypes and gene symbols.
- The annotated results table is then filtered using the adjusted p-value and log2 fold change thresholds given as inputs.
- A volcano plot is generated with the top 10 significantly differentially expressed genes labeled.
- Two heatmaps are generated: one of log-transformed normalized counts and one of Z-scores.
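DESeq2's normalized counts are the raw counts divided by per-sample size factors, which DESeq2 estimates by default with the median-of-ratios method. The following is a minimal Python sketch of that method for illustration; `size_factors` and `normalize` are hypothetical names, and counts are given as one list of per-sample values per gene.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of per-gene rows, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped because their geometric
    mean is zero. Raises StatisticsError if every gene contains a zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median log-ratio per sample, back-transformed, is that sample's factor.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each raw count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

Using the median rather than the mean keeps a few very highly expressed or strongly changed genes from dominating the factor.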
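The filtering and volcano-plot steps can be sketched as follows. This is an illustration, not the workflow's implementation. The column positions are assumed from the annotated table matched in the test file (gene ID, base mean, log2 fold change, standard error, Wald statistic, p-value, adjusted p-value, then annotation columns), and ranking the "top 10" by adjusted p-value is an assumption.

```python
# Assumed 0-based column positions in the annotated results table
# (gene ID, baseMean, log2 fold change, lfcSE, stat, p-value, adjusted p-value, ...).
LFC_COL = 2
PADJ_COL = 6


def is_de(row, padj_threshold=0.05, lfc_threshold=1.0):
    """True if the row passes both thresholds; a missing (NA) padj never passes."""
    padj = row[PADJ_COL]
    if padj in ("", "NA"):
        return False
    return float(padj) < padj_threshold and abs(float(row[LFC_COL])) > lfc_threshold


def filter_de(rows, padj_threshold=0.05, lfc_threshold=1.0):
    """Keep differentially expressed rows; defaults mirror the README defaults."""
    return [row for row in rows if is_de(row, padj_threshold, lfc_threshold)]


def top_genes(de_rows, n=10):
    """The n filtered rows with the smallest adjusted p-values (e.g. volcano labels).

    Expects rows already passed through filter_de, so padj is always numeric.
    """
    return sorted(de_rows, key=lambda row: float(row[PADJ_COL]))[:n]
```

Because the test job uses thresholds of 0.1 and 0.5, the sketch would be called as `filter_de(rows, 0.1, 0.5)` to mirror it.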
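In the Z-score heatmap, each gene's log-transformed normalized counts are centered and scaled across samples, so colors show relative expression within a gene rather than its absolute level. A minimal sketch, assuming a log2(x + 1) transform and the sample standard deviation (the workflow's exact transform and scaling are not stated here); the function names are hypothetical.

```python
import math
import statistics


def log_transform(row, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(x + pseudocount) for x in row]


def z_scores(row):
    """Per-gene Z-scores across samples: (x - mean) / sample standard deviation."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # needs at least 2 samples
    if sd == 0:
        return [0.0] * len(row)  # a constant gene has no spread to scale
    return [(x - mean) / sd for x in row]
```

For instance, `z_scores(log_transform([210.50, 180.36, 48.64, 52.43]))` would give one heatmap row for the YML123C normalized counts matched in the test file.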
54 changes: 54 additions & 0 deletions workflows/transcriptomics/rnaseq-de/rnaseq-de-filtering-plotting-tests.yml
@@ -0,0 +1,54 @@
- doc: Test outline for RNAseq_DE_filtering_plotting
  job:
    Gene Annotaton:
      class: File
      location: https://zenodo.org/records/14056162/files/Saccharomyces_cerevisiae.R64-1-1.113.gtf
      filetype: gtf
    Counts from changed condition:
      class: Collection
      collection_type: list
      elements:
      - class: File
        identifier: SRR5085169 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085169.tabular
      - class: File
        identifier: SRR5085170 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085170.tabular
    Counts from reference condition:
      class: Collection
      collection_type: list
      elements:
      - class: File
        identifier: SRR5085167 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085167.tabular
      - class: File
        identifier: SRR5085168 Counts Table
        location: https://zenodo.org/records/14056162/files/SRR5085168.tabular
    Count files have header: true
    Adjusted p-value threshold: 0.1
    log2 fold change threshold: 0.5
  outputs:
    Annotated DESeq2 results table:
      has_text_matching:
        expression: "YML123C\t122.984408142053\t-1.67[0-9]*\t0.21[0-9]*\t-7.66[0-9]*\t1.81[0-9]*e-14\t5.04[0-9]*e-[0-9]*\tchrXIII\t24036\t25800\t-\tprotein_coding\tPHO84"
        expression: "YKL081W\t264.71[0-9]*\t-0.54[0-9]*\t0.15[0-9]*\t-3.46[0-9]*\t0.00[0-9]*\t0.09[0-9]*\tchrXI\t282890\t284455\t+\tprotein_coding\tTEF4"
    Heatmap of Z-scores:
      has_size:
        value: 19510
        delta: 1000
    DESeq2 Normalized Counts:
      has_text_matching:
        expression: "YML123C\t210.50[0-9]*\t180.36[0-9]*\t48.64[0-9]*\t52.43[0-9]*"
        expression: "YKL081W\t313.76[0-9]*\t322.37[0-9]*\t223.48[0-9]*\t199.24[0-9]*"
    DESeq2 Plots:
      has_size:
        value: 1193021
        delta: 60000
    Volcano Plot of DE genes:
      has_size:
        value: 301346
        delta: 15000
    Heatmap of log transformed normalized counts:
      has_size:
        value: 19501
        delta: 1000