This document describes the DNA methylation data preprocessing pipeline used at UiT. It is inspired by the DNAm scripts written by Christian Page. The repository containing the whole pipeline and the Docker image is available at: https://github.com/nsh23/pachy-dnameth
The whole pipeline is written in R and consists of 7 steps:
- Load dataset: load the RGSet and the sample sheet
- Clean data: remove ghost and cross-hybridizing probes
- BMIQ normalization, background correction and cell count estimation
- CNV calculation based on the algorithm implemented in the CopyNumber450k package
- SVA (surrogate variable analysis): estimation of the number of factors and of the surrogate variables
- Quality control of the clean data
- Gene annotation of CpG sites
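The "Clean data" step boils down to dropping every probe that appears on an exclusion list. In R this would usually be done by subsetting the rows of a minfi object. The minimal Python sketch below shows only the filtering logic, assuming betas are held in a plain dictionary keyed by probe ID. The probe IDs, values, and the `drop_probes` helper are hypothetical. The pipeline's actual exclusion lists, and how it identifies ghost probes, are not described here.

```python
# Hypothetical sketch: drop excluded (e.g. cross-hybridizing) probes
# from a beta-value table. Probe IDs and values are made up.

def drop_probes(betas, exclude):
    """Return a copy of `betas` (probe ID -> per-sample values)
    without the probes listed in `exclude`."""
    excluded = set(exclude)  # set membership keeps the filter O(1) per probe
    return {probe: values for probe, values in betas.items() if probe not in excluded}


betas = {
    "cg00000001": [0.81, 0.79, 0.84],
    "cg00000002": [0.12, 0.10, 0.15],
    "cg00000003": [0.55, 0.61, 0.58],
}
cross_reactive = ["cg00000002"]  # hypothetical exclusion list

clean = drop_probes(betas, cross_reactive)
print(sorted(clean))  # ['cg00000001', 'cg00000003']
```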
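Array-based CNV calling, as in CopyNumber450k, compares each probe's total signal (methylated plus unmethylated intensity) in a sample against a set of control samples. The comparison is expressed as a log2 ratio, and the ratios are then segmented into regions of gain or loss. The sketch below is a simplified illustration of the ratio step only, assuming the reference is the per-probe median of the controls. Intensity normalization and segmentation are omitted; segmentation could use circular binary segmentation from DNAcopy, which is in the requirements list. The intensities and the `log2_ratios` helper are hypothetical.

```python
import math
from statistics import median


def log2_ratios(sample, controls):
    """Per-probe log2 ratio of a sample's total intensity (M + U)
    to the median total intensity of the control samples.

    sample:   list of (M, U) intensity pairs, one per probe
    controls: list of control samples, each shaped like `sample`
    """
    ratios = []
    for i, (m, u) in enumerate(sample):
        # Reference for probe i: median total intensity across controls.
        reference = median(c[i][0] + c[i][1] for c in controls)
        ratios.append(math.log2((m + u) / reference))
    return ratios


# Made-up intensities: three probes in one sample and three controls.
sample = [(4000, 4000), (1000, 1000), (8000, 8000)]
controls = [
    [(4000, 4000), (2000, 2000), (4000, 4000)],
    [(3900, 4100), (2100, 1900), (4200, 3800)],
    [(4100, 3900), (1900, 2100), (3800, 4200)],
]
print(log2_ratios(sample, controls))  # [0.0, -1.0, 1.0]
```

A ratio near 0 indicates a normal copy number, while values around -1 and +1 correspond to halved and doubled signal (a putative loss and gain).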
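For gene annotation, the EPIC annotation package (IlluminaHumanMethylationEPICanno.ilm10b2.hg19, accessed through minfi's `getAnnotation()`) provides a `UCSC_RefGene_Name` column. This column stores gene names as a semicolon-separated string, often with repeats when a CpG overlaps several transcripts of the same gene. The sketch below shows one way to turn such a string into a unique gene list per CpG. The dictionary layout, the gene names, and the `genes_for_cpg` helper are hypothetical, and the pipeline's exact annotation logic is not described here.

```python
def genes_for_cpg(annotation, probe_id):
    """Return the unique gene names for a CpG probe.

    `annotation` maps probe IDs to a semicolon-separated gene string,
    mimicking the UCSC_RefGene_Name column; an empty string (or a
    missing probe) means no gene annotation.
    """
    raw = annotation.get(probe_id, "")
    genes = []
    for name in raw.split(";"):
        if name and name not in genes:
            genes.append(name)  # keep first-seen order, drop repeats
    return genes


annotation = {
    "cg00000001": "GENE_A;GENE_A;GENE_B",  # hypothetical entries
    "cg00000002": "",
}
print(genes_for_cpg(annotation, "cg00000001"))  # ['GENE_A', 'GENE_B']
print(genes_for_cpg(annotation, "cg00000002"))  # []
```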
The following graph shows the processing flow of the pipeline and the dependencies between steps:
[Figure: pipeline processing flow and step dependencies]
Requirements: R, minfi, CopyNumber450k, IlluminaHumanMethylationEPICanno.ilm10b2.hg19, IlluminaHumanMethylationEPICmanifest, wateRmelon, RPMM, parallel, ExperimentHub, FlowSorted.Blood.EPIC, FlowSorted.Blood.450k, sva, DNAcopy, meffil
Additional notes: The pipeline was ported to the Pachyderm framework and tested in HUNT Cloud. Processing the Torino_2017 NOWAC dataset took about 4 hours.