
Galaxy CLIP-Explorer

Welcome to the Galaxy CLIP-Explorer -- a webserver to process, analyse and visualise CLIP-Seq data.

1. Getting Started with Galaxy CLIP-Explorer

Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take a guided tour through Galaxy's user interface.

Take a look at the CLIP-Seq data analysis tutorial on the Galaxy Training Network where you can analyse CLIP-Seq data of RBFOX2 from human liver cancer cells (Hep G2). The tutorial will help you to understand the analysis steps and the most important parameters and tools that are used in CLIP-Explorer.

The underlying workflow of the tutorial can be found here.

We recommend following the FastQC tutorial for quality checks and the IGV tutorial for data inspection.

The Galaxy Training Network tutorial uses eCLIP data from human liver cancer cells (Hep G2); the data is hosted on Zenodo: DOI

Galaxy CLIP-Explorer can process large eCLIP and iCLIP datasets. We processed eCLIP data with around 20 million reads from Van Nostrand et al. (2016). CLIP-Explorer can handle multiplexed and de-multiplexed eCLIP and iCLIP data in FASTQ and FASTA format.

2. Galaxy CLIP-Explorer -- Many Possibilities

(A) Galaxy CLIP-Explorer workflows and tools. (B) Output of multiBamSummary and plotCorrelation comparing two biological replicates of a CLIP-Seq experiment and one control sample. (C) Output of plotFingerprint showing the read coverage for the CLIP-Seq and control samples. (D) Output of CollectInsertSizeMetrics estimating the insert size of the read libraries. (E) Output of FastQC showing the duplication levels of the read libraries. (F) Sequence motifs found by MEME-ChIP (DREME and MEME) in potential binding regions (peaks) obtained by a peak caller such as PEAKachu, Piranha or PureCLIP. (G-I) Example output of RCAS (RNA Centric Annotation System): (G) binding coverage across the transcript and the 5' and 3' UTRs, (H) binding coverage around the exon-intron boundaries, and (I) a target distribution plot showing which kinds of RNA the protein of interest predominantly binds.

3. Workflows

We provide the following workflows to automate the analysis of iCLIP and eCLIP data. All workflows can be found here. The data needs to be in FASTA or FASTQ format and can be either multiplexed or de-multiplexed. All workflows, except the robust peak analysis, require the data as a list of dataset pairs. A tutorial on creating a list of dataset pairs can be found in the CLIP-Seq data analysis tutorial or here. Please keep in mind that all workflows need additional input files from the user.

3.1 From scratch to de-multiplexed FASTQ files

If your data is not de-multiplexed yet, use the following workflows. You need to provide the in-line barcodes in a tab-delimited tabular format, for example:

  • rep1 TTAG
  • rep2 TGGC
  • rep3 TTAA

The raw data needs to be in FASTA or FASTQ format as a list of dataset pairs.
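Before uploading a barcode table, it can help to check that it really is tab-delimited and that every barcode is a plain nucleotide string. The following is a minimal sketch of such a check, not part of CLIP-Explorer itself; the function name `parse_barcodes` and the restriction to the letters A, C, G, T and N are assumptions made for illustration.

```python
import csv
import io

VALID_BASES = set("ACGTN")  # assumption: barcodes use only these letters


def parse_barcodes(handle):
    """Read a two-column, tab-delimited barcode table into {sample: barcode}."""
    barcodes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        sample, barcode = row[0].strip(), row[1].strip().upper()
        if not barcode or set(barcode) - VALID_BASES:
            raise ValueError(f"invalid barcode for {sample}: {barcode!r}")
        if sample in barcodes:
            raise ValueError(f"duplicate sample name: {sample}")
        barcodes[sample] = barcode
    return barcodes


# The example table from above, with real tab characters between the columns.
example = "rep1\tTTAG\nrep2\tTGGC\nrep3\tTTAA\n"
table = parse_barcodes(io.StringIO(example))
print(table)
```

A table separated by spaces instead of tabs raises a `ValueError` here, which is a common cause of de-multiplexing problems.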

3.2 From scratch with de-multiplexed FASTQ files

You can choose between three different types of peak calling for the data analysis of eCLIP and iCLIP data. The data specification of each of the peak calling algorithms is listed below:

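Table 1: Data specification of the different peak calling algorithms.

| Tool     | Biological Replicates (Yes/No) | Control Data (Yes/No) |
|----------|--------------------------------|-----------------------|
| PEAKachu | Yes                            | Yes                   |
| PureCLIP | No                             | Yes                   |
| Piranha  | No                             | No                    |
{: .table.table-striped}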

Note if you have used the de-multiplexing workflows:

If you used the preceding workflows for de-multiplexing, remove the Cutadapt and UMI-tools extract steps from the following workflows before analysing your data. Simply import the workflow into your account, remove the tools and connect the loose ends directly to the alignment step.

Note if you use eCLIP data of Van Nostrand et al. (2016):

The workflow for the eCLIP data of Van Nostrand et al. (2016) was used to analyse the data of RBFOX2. Take care when using other data from the study of Van Nostrand et al. (2016), because the length of the unique molecular identifier (UMI) can differ. The workflow is set to a UMI of five nucleotides. You can change this by importing the workflow into your account and amending the parameter Cut bases from reads before adapter trimming of the second Cutadapt step for the CLIP and control data.
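To make the effect of that parameter concrete, here is a small sketch of what cutting a fixed number of bases from a read does. It mimics the behaviour of Cutadapt's cut option (a positive value removes bases from the start of the read, a negative value from the end). The function `cut_bases` and the example read are invented for illustration and are not taken from the workflow.

```python
def cut_bases(sequence, quality, n):
    """Remove |n| bases: from the 5' start if n is positive, from the 3' end if negative."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if n >= 0:
        return sequence[n:], quality[n:]
    return sequence[:n], quality[:n]


# Hypothetical read whose first five bases are the UMI.
read_seq = "ACGTTGCAAGT"
read_qual = "IIIIIHHHHHH"
print(cut_bases(read_seq, read_qual, 5))
```

If your library uses a UMI of a different length, the value passed as `n` here is the analogue of the number you would set in the Cutadapt step.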

eCLIP

iCLIP

3.3 Further optional peak analysis

The following workflow can be used if you have picked a peak calling algorithm that does not support biological replicates. The workflow finds and analyses robust binding regions shared between different peak files.
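The idea of a region "shared between different peak files" can be sketched as a strand-aware interval intersection. The code below is only an illustration of that idea under simple assumptions (half-open BED-style coordinates, any overlap of at least one base counts); the tools and thresholds the workflow actually uses may differ, and `shared_regions` is a hypothetical name.

```python
def shared_regions(peaks_a, peaks_b):
    """Return the overlapping parts of peaks found in both lists.

    Each peak is a (chrom, start, end, strand) tuple with half-open coordinates.
    """
    shared = []
    for chrom_a, start_a, end_a, strand_a in peaks_a:
        for chrom_b, start_b, end_b, strand_b in peaks_b:
            if chrom_a != chrom_b or strand_a != strand_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # at least one shared base
                shared.append((chrom_a, start, end, strand_a))
    return sorted(shared)


# Invented peaks from two replicates.
rep1 = [("chr1", 100, 150, "+"), ("chr1", 400, 450, "+"), ("chr2", 10, 60, "-")]
rep2 = [("chr1", 120, 170, "+"), ("chr2", 10, 60, "+"), ("chr1", 440, 500, "+")]
print(shared_regions(rep1, rep2))
```

The nested loop is quadratic and only suitable for small examples; for real peak files an interval tool such as the one used in the workflow is the better choice.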

4. Remarks

Please follow the CLIP-Seq data analysis tutorial for a deeper understanding of the tools of CLIP-Explorer. You can change the workflows at any time: simply import the workflow into your account and amend the necessary tools. When doing so, keep the following things in mind:

4.1 Adapter sequences

The workflows use Cutadapt to remove standard eCLIP and iCLIP adapter sequences. You need to change the Cutadapt parameters if your read library contains other adapter sequences.

4.2 UMI and in-line barcodes

The workflows use Cutadapt to trim off the length of the UMI (+ barcode) from one mate of the read pair. The length depends on the protocol (iCLIP, eCLIP or your own). Please check or change the parameter in Cutadapt based on your UMI and in-line barcode. For more information follow the CLIP-Seq data analysis tutorial.

CLIP-Explorer uses UMI-tools extract to find the UMIs inside your reads. Change the pattern of UMI-tools extract based on your read library preparation.
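As a rough illustration of how such a pattern is interpreted, the sketch below applies a string pattern to the start of a read: each `N` marks a UMI base that is moved to the read name, and each `X` marks a base that stays in the read. This mirrors the string-pattern convention of UMI-tools extract in simplified form (no cell barcodes, no regular expressions, read names without whitespace); the function `extract_umi` and the example read are assumptions for illustration only.

```python
def extract_umi(name, sequence, quality, pattern):
    """Move the bases marked 'N' in the pattern from the read start to the read name."""
    if len(sequence) < len(pattern):
        raise ValueError("read is shorter than the pattern")
    umi, kept_seq, kept_qual = [], [], []
    for symbol, base, qual in zip(pattern, sequence, quality):
        if symbol == "N":        # UMI base: goes into the read name
            umi.append(base)
        elif symbol == "X":      # non-UMI base: stays in the read
            kept_seq.append(base)
            kept_qual.append(qual)
        else:
            raise ValueError(f"unsupported pattern symbol: {symbol!r}")
    rest = len(pattern)
    new_name = f"{name}_{''.join(umi)}"
    return (new_name,
            "".join(kept_seq) + sequence[rest:],
            "".join(kept_qual) + quality[rest:])


# Hypothetical read with a five-nucleotide UMI at its start.
print(extract_umi("read1", "ACGTTGCAAGT", "IIIIIHHHHHH", "NNNNN"))
```

A library with a ten-nucleotide UMI would use a pattern of ten `N` characters instead.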

4.3 Read alignment

Read alignment is done with STAR, which combines genome and transcriptome data. CLIP-Explorer considers only uniquely mapped reads. Furthermore, STAR is executed with soft-clipping turned off. For more information follow the CLIP-Seq data analysis tutorial.
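To check an alignment file against these two properties, one can look at the MAPQ and CIGAR columns of the SAM records. The sketch below assumes STAR's default convention of assigning MAPQ 255 to uniquely mapped reads and treats any `S` operation in the CIGAR string as soft-clipping. The exact STAR options set by the workflow are not listed on this page (in STAR, end-to-end alignment is typically requested with `--alignEndsType EndToEnd`), so treat this purely as an illustration; the function names and SAM records are made up.

```python
UNIQUE_MAPQ = 255  # MAPQ that STAR assigns to uniquely mapped reads by default


def parse_alignment(line):
    """Return (read name, MAPQ, CIGAR) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[4]), fields[5]


def unique_end_to_end(sam_lines):
    """Names of reads that are uniquely mapped and carry no soft-clipped bases."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        name, mapq, cigar = parse_alignment(line)
        if mapq == UNIQUE_MAPQ and "S" not in cigar:
            kept.append(name)
    return kept


# Invented SAM records: r2 is multi-mapped, r3 is soft-clipped.
sam_records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t50M\t=\t180\t130\t*\t*",
    "r2\t99\tchr1\t300\t3\t50M\t=\t380\t130\t*\t*",
    "r3\t0\tchr2\t500\t255\t5S45M\t*\t0\t0\t*\t*",
]
print(unique_end_to_end(sam_records))
```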

4.4 Peak calling with PEAKachu

You need to specify the insert size of your paired-end reads for PEAKachu. To get an estimate for this parameter, check the output image of CollectInsertSizeMetrics.
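If you prefer a number over reading the value off the plot, the insert size can also be approximated from the template length (TLEN) column of the alignments. The following sketch takes the median absolute TLEN over properly paired first mates; it is a simplified stand-in for what CollectInsertSizeMetrics reports, not a reimplementation, and the SAM records are invented.

```python
import statistics

PROPER_PAIR = 0x2     # SAM flag: read mapped in a proper pair
FIRST_IN_PAIR = 0x40  # SAM flag: first mate of the pair


def median_insert_size(sam_lines):
    """Median absolute template length over properly paired first mates."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # Count each pair once by looking only at the first mate.
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))
    if not sizes:
        raise ValueError("no properly paired reads found")
    return statistics.median(sizes)


pairs = [
    "r1\t99\tchr1\t100\t255\t50M\t=\t180\t130\t*\t*",
    "r1\t147\tchr1\t180\t255\t50M\t=\t100\t-130\t*\t*",
    "r2\t83\tchr1\t600\t255\t50M\t=\t500\t-150\t*\t*",
    "r3\t99\tchr2\t40\t255\t50M\t=\t100\t110\t*\t*",
]
print(median_insert_size(pairs))
```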

4.5 Peak calling with PureCLIP

PureCLIP works best with only one mate of the paired-end reads, namely the one where the cross-linking event occurs. Thus, CLIP-Explorer filters out the other mate before the peak calling. Remove the Bam filter tool to disable this behavior or change Bam filter to pick the correct mate.
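Selecting one mate comes down to testing the SAM flag bits 0x40 (first in pair) and 0x80 (second in pair). The sketch below shows that test on SAM text lines. Which mate carries the cross-link site depends on your protocol, so the choice of `SECOND_IN_PAIR` in the example is an assumption, and `keep_mate` is a hypothetical helper rather than the Bam filter tool itself.

```python
FIRST_IN_PAIR = 0x40   # SAM flag bit: first mate
SECOND_IN_PAIR = 0x80  # SAM flag bit: second mate


def keep_mate(sam_lines, mate_flag):
    """Keep header lines and only those alignments that have the given mate bit set."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # always keep the header
            continue
        flag = int(line.split("\t")[1])
        if flag & mate_flag:
            kept.append(line)
    return kept


records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t50M\t=\t180\t130\t*\t*",
    "r1\t147\tchr1\t180\t255\t50M\t=\t100\t-130\t*\t*",
    "r2\t83\tchr1\t600\t255\t50M\t=\t500\t-150\t*\t*",
    "r2\t163\tchr1\t500\t255\t50M\t=\t600\t150\t*\t*",
]
for line in keep_mate(records, SECOND_IN_PAIR):
    print(line)
```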

4.6 Extension of the binding regions

CLIP-Explorer uses SlopBED to extend the peaks by a few base pairs to the left and right, in order to correct for the peak calling algorithms underestimating the binding regions. Remove the tool or change the parameter of SlopBED to change this behavior. For more information follow the CLIP-Seq data analysis tutorial.
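The extension itself is simple arithmetic on the peak coordinates: subtract the flank from the start, add it to the end, and clamp the result to the chromosome boundaries. A minimal sketch follows, with an invented flank of 10 bases and an invented chromosome length; the value used in the workflow is not stated here.

```python
def extend_peak(start, end, flank, chrom_length):
    """Extend a half-open interval by `flank` bases on both sides, clamped to the chromosome."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return max(0, start - flank), min(chrom_length, end + flank)


# Peaks near the chromosome ends are clamped rather than extended past the boundary.
print(extend_peak(200, 240, 10, 1000))
print(extend_peak(5, 40, 10, 1000))
print(extend_peak(980, 995, 10, 1000))
```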