Usage Guide

Introduction

Here are some notes on running and using this pipeline. Caper is the canonical, officially supported way to run ENCODE Uniform Processing Pipelines. The example below uses the command caper run, which is the simplest way to run a single pipeline. For running multiple pipelines in a production setting, caper server is recommended. For details on setting up the server, refer to the Caper documentation.

Installation

  1. Git clone this pipeline.

    $ git clone https://github.com/ENCODE-DCC/wgbs-pipeline
  2. Install Caper, a Python wrapper for Cromwell. It requires Java >= 1.8 and Python >= 3.6.

    $ pip install caper  # use pip3 if it doesn't work
  3. Follow Caper's README carefully to configure it for your platform (local, cloud, cluster, etc.)

IMPORTANT: Configure your Caper configuration file ~/.caper/default.conf correctly for your platform.
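
The version requirements from step 2 can be checked before installing Caper. The following is a minimal sketch, not part of the pipeline or of Caper: the helper names version_ge and check_prereqs are hypothetical, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
# Assumes a `sort` that supports -V (version sort), as GNU coreutils does.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether Java and Python meet the stated minimums.
check_prereqs() {
  # `java -version` prints to stderr; the version is the first quoted string.
  java_version="$(java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')"
  python_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"

  version_ge "$java_version" 1.8 || {
    echo "Java >= 1.8 required, found: $java_version" >&2
    return 1
  }
  version_ge "$python_version" 3.6 || {
    echo "Python >= 3.6 required, found: $python_version" >&2
    return 1
  }
  echo "Java $java_version and Python $python_version look sufficient for Caper."
}

# Usage: source this snippet, then run
#   check_prereqs
```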

Running Workflows

Make sure you have installed the pipeline as described in the installation instructions. Run the following commands from the root of the repository (i.e. cd wgbs-pipeline if you have not done so already).

  1. Prepare the input JSON file. This file specifies the files and parameters the pipeline runs with. Example input JSON files are available in reference.md#inputs, and the input parameters are described in reference.md#input-desciptions. Copy and paste the entire command below into your terminal (it uses heredoc syntax) and run it to create a file called input.json that points to the test data in this repo:
cat << EOF > input.json
{
  "wgbs.extra_reference": "tests/data/conversion_control.fa.gz",
  "wgbs.fastqs": [
    [
      [
        "tests/data/sample5_data_1_200000.fastq.gz",
        "tests/data/sample5_data_2_200000.fastq.gz"
      ]
    ]
  ],
  "wgbs.reference": "tests/data/sacCer3.fa.gz",
  "wgbs.underconversion_sequence_name": "NC_001416.1"
}
EOF
  2. Run the pipeline using Caper. The -m flag gives a memorable name to the metadata JSON file that the pipeline produces when it finishes, which describes the run. More details about the metadata JSON can be found in the Cromwell documentation.
  $ caper run wgbs-pipeline.wdl -i input.json -m wgbs_testrun_metadata.json
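
Two small checks can bracket the steps above: confirming that input.json parses before launching, and reading the outcome from the metadata file afterwards. This is a sketch under stated assumptions, not part of the pipeline. The helper names validate_json and workflow_status are hypothetical, and the second helper assumes the metadata JSON follows the Cromwell layout with a top-level "status" field.

```shell
# Hypothetical helper: succeed only when the file parses as JSON.
# Uses json.tool from the Python standard library.
validate_json() {
  python3 -m json.tool "$1" > /dev/null
}

# Hypothetical helper: print the top-level "status" field of a metadata JSON file.
# Assumes the Cromwell metadata layout, where "status" holds values such as
# "Succeeded" or "Failed"; prints "unknown" when the field is absent.
workflow_status() {
  python3 -c 'import json, sys
with open(sys.argv[1]) as handle:
    print(json.load(handle).get("status", "unknown"))' "$1"
}

# Usage, from the root of the repository:
#   validate_json input.json && caper run wgbs-pipeline.wdl -i input.json -m wgbs_testrun_metadata.json
#   workflow_status wgbs_testrun_metadata.json
```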

Inspecting Outputs

Rather than needing to dig through the highly nested Cromwell output directories or complex JSON metadata, Croo can be used to generate a more legible HTML table of paths to outputs. To invoke croo, run the following, passing a Cromwell metadata JSON file as input:

  $ croo "${PATH_TO_METADATA_JSON}"

Supported Platforms

This pipeline can be run on a variety of platforms via Caper. For a list of supported platforms, see Caper's list of built-in backends. These include local machines, Google Cloud Platform, Amazon Web Services, and a selection of HPC clusters, namely Slurm, PBS, and SGE. Furthermore, Caper provides the ability to use a custom backend, which can be useful in getting it to work with your particular cluster or cluster configuration.

Using Singularity

Caper has built-in support for using Singularity containers instead of Docker via the --singularity option. This is useful in HPC environments where Docker usage is restricted. See the Caper documentation for more information.
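
As an illustration, the test run from the Running Workflows section could be repeated with Singularity by appending the option to the same command. The wrapper name run_wgbs_singularity is hypothetical, and this sketch assumes Caper and Singularity are already installed and configured for your platform.

```shell
# Hypothetical wrapper: run the test workflow with Singularity instead of Docker.
# Any extra arguments are passed through to caper unchanged.
run_wgbs_singularity() {
  caper run wgbs-pipeline.wdl -i input.json -m wgbs_testrun_metadata.json --singularity "$@"
}

# Usage, from the root of the repository:
#   run_wgbs_singularity
```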