Nextflow workflow for main GENIE processing. This follows the SOP outlined on the GENIE Confluence page.
Follow the instructions below to run the main GENIE processing locally.
It's recommended to use an EC2 instance with Docker to run processing and develop. Follow the Service-Catalog-Provisioning instructions to create an EC2 instance through Service Catalog. If you want to use VS Code to run or develop, also follow the SSM with SSH section.
- Install Nextflow and its dependencies (e.g., Java) by following the instructions in the Nextflow Get started guide.
- Pull the latest version of the main GENIE Docker image into your environment; see GENIE Dockerhub for more details.
For an EC2 instance running Amazon Linux 2 with Docker, see this guide for installing Java 11: How do I install a software package from the Extras Library on an EC2 instance running Amazon Linux 2?
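As a rough sketch, the setup steps above might look like this on an Amazon Linux 2 instance. It assumes the amazon-linux-extras tool (present on Amazon Linux 2, not on other distributions) and uses the official get.nextflow.io installer with ~/.local/bin as the install location, as in the Nextflow docs; the function name is our own. Newer Nextflow releases may require a Java version newer than 11, so check the Nextflow installation docs if nextflow -version fails.

```shell
#!/usr/bin/env bash
# Sketch only: install Java 11 and Nextflow on an Amazon Linux 2 EC2 instance.
# Assumes amazon-linux-extras is available (Amazon Linux 2 only).
install_java_and_nextflow() {
  # Java 11 from the Amazon Linux 2 Extras Library
  sudo amazon-linux-extras install -y java-openjdk11 || return 1
  java -version || return 1
  # Official Nextflow installer: writes a `nextflow` launcher into the current directory
  curl -s https://get.nextflow.io | bash || return 1
  chmod +x nextflow
  # Move the launcher onto your PATH (~/.local/bin, as suggested in the Nextflow docs)
  mkdir -p "$HOME/.local/bin"
  mv nextflow "$HOME/.local/bin/"
  # Confirm the launcher runs
  "$HOME/.local/bin/nextflow" -version
}
```

Make sure ~/.local/bin is on your PATH afterwards so that the nextflow commands in this README resolve.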
Before running the test pipeline, you need to create a Nextflow secret called SYNAPSE_AUTH_TOKEN containing a Synapse personal access token (docs).
This workflow transfers files to and from Synapse, so it requires a secret containing a personal access token for authentication. To configure Nextflow with such a token, follow these steps:
- Generate a personal access token (PAT) on Synapse using this dashboard. Make sure to enable the view, download, and modify scopes, since this workflow both downloads from and uploads to Synapse.
- Create a secret called SYNAPSE_AUTH_TOKEN containing the Synapse personal access token, using either the Nextflow CLI or Nextflow Tower.
- (Tower only) When launching the workflow, include SYNAPSE_AUTH_TOKEN as a pipeline secret from either your user or workspace secrets.
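Creating the secret with the Nextflow CLI might look roughly like this. This is a minimal sketch: set_synapse_secret and the SYNAPSE_PAT environment variable are hypothetical names chosen for this example, and on older Nextflow releases you may also need to export NXF_ENABLE_SECRETS=true before the secrets command is available.

```shell
#!/usr/bin/env bash
# Sketch: store a Synapse PAT as the Nextflow secret SYNAPSE_AUTH_TOKEN.
# SYNAPSE_PAT is a hypothetical variable name used only for this example.
set_synapse_secret() {
  if [ -z "${SYNAPSE_PAT:-}" ]; then
    echo "SYNAPSE_PAT is not set; export your Synapse personal access token first" >&2
    return 1
  fi
  nextflow secrets set SYNAPSE_AUTH_TOKEN "$SYNAPSE_PAT" || return 1
  # Lists secret names (not values) so you can confirm the secret exists
  nextflow secrets list
}
```

For example, run read -rs SYNAPSE_PAT, paste the token, then export SYNAPSE_PAT and call set_synapse_secret. Reading the token this way keeps it out of your shell history, although it is still passed to nextflow as a command-line argument.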
The commands under Commands run on the test pipeline. To run on the production pipeline, specify a value for the release parameter, e.g.:
- 13.1-public (for public releases)
- 13.1-consortium (for consortium releases)
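A hedged sketch of a production launch for a given release label. The mapping from the -public or -consortium suffix to --process_type, and the use of the aws_prod profile, are our assumptions based on the profiles and process types described in this README; run_genie_release is a hypothetical helper, and production runs may need additional parameters (check nextflow run main.nf --help).

```shell
#!/usr/bin/env bash
# Sketch: launch a production run for a release label such as 13.1-public.
# Assumptions: aws_prod profile for production; the label suffix picks the process type.
run_genie_release() {
  local release="$1"
  local image="${2:-ghcr.io/sage-bionetworks/genie:main}"
  local process_type
  case "$release" in
    *-public) process_type=public_release ;;
    *-consortium) process_type=consortium_release ;;
    *) echo "unrecognised release label: $release" >&2; return 1 ;;
  esac
  nextflow run main.nf -profile aws_prod \
    --process_type "$process_type" \
    --release "$release" \
    -with-docker "$image"
}
# Usage: run_genie_release 13.1-consortium
```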
Run the following command to list the currently available parameters, their defaults, and their descriptions:
nextflow run main.nf --help
The same information is available in nextflow_schema.json.
The pipeline has two profiles, each containing the Docker container defaults and resource specifications for running the pipeline:
- aws_prod - used for production pipeline runs
- aws_test - used for test pipeline runs
See nextflow.config for more details on the profiles' content. Read more about config profiles and how to call them here: Config Profiles
Add -with-docker <docker_image_name> to every nextflow command so that Nextflow runs the pipeline processes in Docker. See docker-containers for more details.
Note that every Docker parameter has a default container set by the profile you select. To use a container other than the profile default, you must:
- Pull the container(s) you want to use into your local environment or EC2 instance with docker pull
- Set the corresponding parameter(s) in your nextflow command to the container(s) you pulled
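One way to handle the pull step is a small helper that pulls an image only when it is missing locally. ensure_image is a hypothetical name, and <docker_parameter> in the usage comment is a placeholder: use nextflow run main.nf --help to find the actual Docker parameter names.

```shell
#!/usr/bin/env bash
# Sketch: pull an image only if it is not already present on this machine.
ensure_image() {
  local image="$1"
  # `docker image inspect` exits non-zero when the image is not available locally
  if ! docker image inspect "$image" >/dev/null 2>&1; then
    docker pull "$image" || return 1
  fi
}
# Usage (my-registry/genie:my-tag and <docker_parameter> are placeholders):
#   ensure_image my-registry/genie:my-tag
#   nextflow run main.nf -profile aws_test --process_type only_validate \
#     --<docker_parameter> my-registry/genie:my-tag -with-docker my-registry/genie:my-tag
```

Because a tag such as :main moves over time, you may prefer to call docker pull unconditionally for it, so that you always pick up the latest build.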
- Only validate files on the test pipeline:
nextflow run main.nf -profile aws_test --process_type only_validate -with-docker ghcr.io/sage-bionetworks/genie:main
- Process non-mutation files on the test pipeline:
nextflow run main.nf -profile aws_test --process_type main_process -with-docker ghcr.io/sage-bionetworks/genie:main
- Process mutation files on the test pipeline:
nextflow run main.nf -profile aws_test --process_type maf_process --create_new_maf_db -with-docker ghcr.io/sage-bionetworks/genie:main
- Run processing and the consortium release (including data guide creation) on the test pipeline:
nextflow run main.nf -profile aws_test --process_type consortium_release --create_new_maf_db -with-docker ghcr.io/sage-bionetworks/genie:main
- Run the public release (including data guide creation) on the test pipeline:
nextflow run main.nf -profile aws_test --process_type public_release -with-docker ghcr.io/sage-bionetworks/genie:main
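The five test-pipeline commands above differ only in --process_type and in whether --create_new_maf_db is passed, so they can be wrapped in a small shell function. This is a convenience sketch, not part of the repository: run_genie_test is a hypothetical name, and the flags simply mirror the commands listed above.

```shell
#!/usr/bin/env bash
# Sketch: wrapper around the test-pipeline commands; flags mirror the README commands.
run_genie_test() {
  local process_type="$1"
  local image="${2:-ghcr.io/sage-bionetworks/genie:main}"
  local -a cmd=(nextflow run main.nf -profile aws_test --process_type "$process_type")
  case "$process_type" in
    only_validate|main_process|public_release) ;;
    # These two process types are run with a fresh MAF database in the commands above
    maf_process|consortium_release) cmd+=(--create_new_maf_db) ;;
    *) echo "unknown process_type: $process_type" >&2; return 1 ;;
  esac
  cmd+=(-with-docker "$image")
  "${cmd[@]}"
}
```

For example, run_genie_test maf_process runs the mutation-processing command above, and a second argument overrides the Docker image.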
Run unit tests from the root of the repo. These unit tests cover the code in the scripts/ directory.
python3 -m pytest tests
Unit tests have to be run manually for now. You will need pandas and synapseclient to run them; see the Dockerfile for the version of synapseclient to use.
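A minimal sketch of installing the test dependencies into a virtual environment and running the tests from the repo root. The .venv location and the function name are our own choices, and synapseclient is left unpinned here: pin it to the version in the Dockerfile. pytest is installed as well, since python3 -m pytest needs it.

```shell
#!/usr/bin/env bash
# Sketch: install test dependencies into a local virtualenv and run the unit tests.
# Run from the repo root; pin synapseclient to the version in the Dockerfile.
run_unit_tests() {
  python3 -m venv .venv || return 1
  # pandas and synapseclient are needed by the tests; pytest runs them
  .venv/bin/python -m pip install pandas synapseclient pytest || return 1
  .venv/bin/python -m pytest tests
}
```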
Follow the instructions below to run the main GENIE processing directly on Nextflow Tower:
- Create an IBCDPE help desk request to gain access to the genie-bpc-project on Sage Nextflow Tower.
- Once you have access, head to the Launchpad.
- Click on the main_genie pipeline.
- Fill out the parameters for launching the specific parts of the pipeline.
- If you need to modify any of the underlying default launch settings, such as config profiles, click on Launch settings on the right. The settings you are most likely to modify are:
  - Config profiles: the profile to launch with; see Config profiles for more details
  - Revision number: the branch of nf-genie that you're launching the pipeline on