Welcome to the Galaxy CLIP-Explorer -- a webserver to process, analyse and visualise CLIP-Seq data.
Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take a guided tour through Galaxy's user interface.
Take a look at the CLIP-Seq data analysis tutorial on the Galaxy Training Network where you can analyse CLIP-Seq data of RBFOX2 from human liver cancer cells (Hep G2). The tutorial will help you to understand the analysis steps and the most important parameters and tools that are used in CLIP-Explorer.
The underlying workflow of the tutorial can be found here.
We recommend following the FastQC tutorial for quality checks and the IGV tutorial for data inspection.
The Galaxy Training Network tutorial uses eCLIP data from human liver cancer cells (Hep G2) and is hosted on Zenodo:
Galaxy CLIP-Explorer can process large eCLIP and iCLIP datasets. For example, we processed eCLIP data with around 20 million reads from Nostrand et al. (2016). CLIP-Explorer can handle multiplexed and de-multiplexed eCLIP and iCLIP data in FASTQ and FASTA format.
(A) Galaxy CLIP-Explorer workflows and tools. (B) Output of multiBamSummary and plotCorrelation comparing two biological replicates of a CLIP-Seq experiment and one control sample. (C) Output of plotFingerprint showing the read coverage for the CLIP-Seq and control samples. (D) Output of CollectInsertSizeMetrics estimating the insert size for the read libraries. (E) Output of FastQC showing the duplication levels of the read libraries. (F) Sequence motifs found by MEME-ChIP (DREME and MEME) in potential binding regions (peaks) obtained by a peak caller such as PEAKachu, Piranha or PureCLIP. (G-I) Example output of RCAS (RNA Centric Annotation System): (G) binding coverage for the transcript and the 5' and 3' UTRs, (H) binding coverage around the exon-intron boundaries, and (I) a target distribution plot showing which kinds of RNA the protein of interest predominantly binds.
We provide the following workflows to automate the analysis of iCLIP and eCLIP data. All workflows can be found here. The data needs to be in FASTA or FASTQ format and can be either multiplexed or de-multiplexed. All workflows, except the robust peak analysis, require the data as a list of dataset pairs. A tutorial on creating a list of dataset pairs can be found in the CLIP-Seq data analysis tutorial or here. Please keep in mind that all workflows need additional input files from the user.
If your data is not yet de-multiplexed, use the following workflows. You have to provide the in-line barcodes in a tab-delimited tabular format, for example:
- rep1 TTAG
- rep2 TGGC
- rep3 TTAA
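A barcode table like the one above can also be generated with a short script. The following is a minimal sketch that assumes two columns (sample name, then barcode) as in the example; the function name `write_barcode_table` and the nucleotide check are illustrative and not part of CLIP-Explorer.

```python
import csv
import io


def write_barcode_table(barcodes, handle):
    """Write (sample, barcode) pairs as tab-delimited rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample, barcode in barcodes:
        # Reject anything that is not a plain nucleotide string.
        if not barcode or set(barcode) - set("ACGTN"):
            raise ValueError(f"invalid barcode for {sample}: {barcode!r}")
        writer.writerow([sample, barcode])


# Example values taken from the list above.
barcodes = [("rep1", "TTAG"), ("rep2", "TGGC"), ("rep3", "TTAA")]
buffer = io.StringIO()
write_barcode_table(barcodes, buffer)
barcode_table = buffer.getvalue()
```

To create a file for upload to Galaxy, pass an open file handle instead of the `io.StringIO` buffer.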
The raw data needs to be in FASTA or FASTQ format as a list of dataset pairs.
You can choose between three different types of peak calling for the data analysis of eCLIP and iCLIP data. The data specification of each of the peak calling algorithms is listed below:
Table 1: Data specification of the different peak calling algorithms.
Tool | Biological Replicates (Yes/No) | Control Data (Yes/No) |
---|---|---|
PEAKachu | Yes | Yes |
PureCLIP | No | Yes |
Piranha | No | No |
{: .table.table-striped}
If you used the preceding workflows for de-multiplexing, then remove the Cutadapt and UMI-tools extract steps from the following workflows to analyse your data. Simply import the workflow into your account, remove the tools and connect the loose end directly to the alignment step.
The workflow for the eCLIP data of Nostrand et al. (2016) was used to analyse the data of RBFOX2. Be careful when using other data from the study of Nostrand et al. (2016), because the length of the unique molecular identifier (UMI) can differ. The workflow is set to a UMI of five nucleotides. You can change this by importing the workflow into your account and amending the parameter Cut bases from reads before adapter trimming of the second Cutadapt step for the CLIP and control data.
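To see what changing the UMI length means for a single read, the sketch below removes a fixed number of leading bases together with their quality values. It only illustrates the idea and is not how Cutadapt is implemented; the function name `cut_leading_bases` is hypothetical, and cutting from the start of the read is an assumption.

```python
def cut_leading_bases(sequence, quality, n_bases=5):
    """Drop the first n_bases from a read and from its quality string.

    n_bases=5 mirrors the UMI length the workflow is set to; use the
    UMI length of your own library instead.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if n_bases < 0:
        raise ValueError("n_bases must not be negative")
    return sequence[n_bases:], quality[n_bases:]


trimmed_seq, trimmed_qual = cut_leading_bases("ACGTACGTAC", "IIIIIHHHHH")
```

With a ten-nucleotide UMI you would call `cut_leading_bases(sequence, quality, n_bases=10)`.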
- Workflow for the eCLIP data of Nostrand et al. (2016)
- Peak calling with PEAKachu
- Peak calling with PureCLIP
- Peak calling with Piranha
The following workflow can be used if you have picked a peak calling algorithm that does not support biological replicates. The workflow finds and analyses robust binding regions shared between different peak files.
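One simple way to think about regions "shared between different peak files" is interval overlap. The sketch below assumes that a shared region is the overlapping part of two peaks on the same chromosome; this is an illustration only and not necessarily the criterion the workflow applies. Peaks are given as hypothetical `(chrom, start, end)` tuples with half-open coordinates, as in BED files.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_regions(peaks_a, peaks_b):
    """Return the overlapping parts of peaks found in both peak lists."""
    shared = []
    for a in peaks_a:
        for b in peaks_b:
            if overlaps(a, b):
                shared.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return sorted(shared)


replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr2", 300, 400)]
robust = shared_regions(replicate_1, replicate_2)
```

The nested loop is quadratic in the number of peaks, which is fine for a demonstration; real peak files are better handled with a dedicated interval tool.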
Please follow the CLIP-Seq data analysis tutorial for a deeper understanding of the tools of CLIP-Explorer. The workflows can be changed at any time: simply import the workflow into your account and amend the necessary tools. When doing so, keep the following things in mind:
The workflows use Cutadapt to remove standard eCLIP and iCLIP adapter sequences. You need to change the Cutadapt parameters if your read library contains other adapter sequences.
The workflows use Cutadapt to trim the length of the UMI (+ barcode) from one mate of the read pair. This length depends on whether you follow the iCLIP protocol, the eCLIP protocol or your own. Please check or change the parameter in Cutadapt based on your UMI and in-line barcode. For more information follow the CLIP-Seq data analysis tutorial.
CLIP-Explorer uses UMI-tools extract to find the UMIs inside your reads. Change the pattern of UMI-tools extract based on your read library preparation.
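The following sketch illustrates how a string pattern of this kind can be read: each `N` marks a UMI base that is moved to the read name, and each `X` marks a base that stays in the read. This is a simplified model of a pattern-based extraction, written under that assumption; it does not reproduce UMI-tools itself, and the function name `extract_umi` is hypothetical.

```python
def extract_umi(read_name, sequence, quality, pattern="NNNNN"):
    """Move the bases marked 'N' in pattern from the read start to its name."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if len(sequence) < len(pattern):
        raise ValueError("read is shorter than the pattern")
    umi, kept_seq, kept_qual = [], [], []
    for symbol, base, qual in zip(pattern, sequence, quality):
        if symbol == "N":      # UMI base: remove it from the read
            umi.append(base)
        elif symbol == "X":    # ordinary base: keep it in the read
            kept_seq.append(base)
            kept_qual.append(qual)
        else:
            raise ValueError(f"unsupported pattern symbol: {symbol!r}")
    rest = len(pattern)
    new_seq = "".join(kept_seq) + sequence[rest:]
    new_qual = "".join(kept_qual) + quality[rest:]
    return f"{read_name}_{''.join(umi)}", new_seq, new_qual


# A five-nucleotide UMI at the start of the read.
result = extract_umi("read1", "ACGTAGGGTTT", "IIIIIHHHHHH")
```

A library with a longer UMI would use a longer run of `N`, for example `pattern="NNNNNNNNNN"` for ten nucleotides.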
Read alignment is done with STAR, which combines genome and transcriptome data. CLIP-Explorer focuses only on uniquely mapped reads. Furthermore, STAR is executed with soft-clipping turned off. For more information follow the CLIP-Seq data analysis tutorial.
You need to specify the insert size of your paired-end reads for PEAKachu. To get an estimate for that parameter, check the output image of CollectInsertSizeMetrics.
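If you prefer a number over reading the value off a plot, a rough estimate can be derived from the template lengths of the aligned read pairs (the TLEN column of a SAM file). The sketch below takes the median of the positive values; it is a simplified stand-in, not the calculation performed by CollectInsertSizeMetrics, and the function name `estimate_insert_size` is hypothetical.

```python
import statistics


def estimate_insert_size(template_lengths):
    """Median of the positive template lengths.

    Each properly aligned pair contributes one positive and one negative
    TLEN value, so only the positive values are counted. Zero means the
    length is unavailable.
    """
    sizes = [length for length in template_lengths if length > 0]
    if not sizes:
        raise ValueError("no usable template lengths")
    return statistics.median(sizes)


# Made-up TLEN values for three read pairs plus one unpaired read.
insert_size = estimate_insert_size([150, -150, 200, -200, 0, 120, -120])
```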
PureCLIP works best with only one mate of the paired-end reads, namely the one where the cross-linking event occurs. Thus, CLIP-Explorer filters out the other mate before the peak calling. Remove the Bam filter tool to disable this behavior, or change Bam filter to pick the correct mate.
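Selecting one mate of a pair comes down to the SAM flag of each alignment: bit `0x40` marks the first read of a pair and bit `0x80` the second. The sketch below shows that test in isolation. It is an illustration only, the function name `is_wanted_mate` is hypothetical, and which mate carries the cross-linking site depends on your protocol.

```python
FIRST_IN_PAIR = 0x40   # SAM flag bit: first read of the pair
SECOND_IN_PAIR = 0x80  # SAM flag bit: second read of the pair


def is_wanted_mate(flag, mate):
    """True if the SAM flag belongs to the requested mate ('first' or 'second')."""
    wanted_bit = {"first": FIRST_IN_PAIR, "second": SECOND_IN_PAIR}[mate]
    return bool(flag & wanted_bit)


# Flags 99 and 147 are a typical properly paired first/second mate.
flags = [99, 147, 83, 163]
second_mates = [flag for flag in flags if is_wanted_mate(flag, "second")]
```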
CLIP-Explorer uses SlopBED to extend the peaks by a few base pairs to the left and right, in order to correct for the peak calling algorithms underestimating the binding regions. Remove the tool or change the parameters of SlopBED to change this behavior. For more information follow the CLIP-Seq data analysis tutorial.
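Extending a peak amounts to simple arithmetic on its coordinates. The sketch below widens a half-open interval on both sides and keeps it within the chromosome. The function name `extend_peak` and the example extension of 10 bases are assumptions for illustration, not the values used by CLIP-Explorer.

```python
def extend_peak(start, end, chrom_length, left=0, right=0):
    """Extend a half-open interval [start, end), clamped to the chromosome."""
    if not 0 <= start <= end <= chrom_length:
        raise ValueError("interval must lie within the chromosome")
    new_start = max(0, start - left)
    new_end = min(chrom_length, end + right)
    return new_start, new_end


# A peak close to the chromosome start is clamped at position 0.
near_start = extend_peak(5, 50, chrom_length=100, left=10, right=10)
# A peak close to the chromosome end is clamped at the chromosome length.
near_end = extend_peak(80, 95, chrom_length=100, left=10, right=10)
```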