PyGNA Snakemake workflow:

Current version: 1.0.1-dev

A simple Snakemake workflow to perform network analyses using the Python Geneset Network Analysis (PyGNA) package.

Authors

Viola Fanfani (lead developer)
Giovanni Stracquadanio

Overview

Usage

Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release.

If you use this workflow in a paper, please cite PyGNA as follows:

Step 2: Configure workflow

Configure the workflow according to your needs by editing the file config.yaml. A template config file is provided as config_temp.yaml.
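As a minimal sketch of this step, the snippet below copies the template to config.yaml only when no config.yaml exists yet, so an already edited configuration is never overwritten. The helper name `init_config` is hypothetical and not part of the workflow; the two file names are the ones mentioned above.

```python
import shutil
from pathlib import Path


def init_config(workdir=".", template="config_temp.yaml", target="config.yaml"):
    """Copy the template config to `target` unless `target` already exists.

    Hypothetical helper, not part of the workflow.
    Returns True if a copy was made, False if `target` was left untouched.
    """
    workdir = Path(workdir)
    src = workdir / template
    dst = workdir / target
    if dst.exists():
        return False  # keep the user's edited configuration
    if not src.exists():
        raise FileNotFoundError(f"template not found: {src}")
    shutil.copyfile(src, dst)
    return True
```

Calling `init_config()` from the workflow folder would create config.yaml on the first call and do nothing on later calls.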

Step 3: Execute workflow

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

or

snakemake --use-conda --drmaa --jobs 100

If you not only want to fix the software stack but also the underlying OS, use

snakemake --use-conda --use-singularity

in combination with any of the modes above. See the Snakemake documentation for further details.
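The execution modes above differ only in a few flags. The sketch below assembles the corresponding argument list for each mode, using only the flags shown in this section. The function name `build_snakemake_command` and the mode labels are assumptions made for illustration, and cluster options may differ between Snakemake versions, so check the documentation for the version you have installed.

```python
import shlex


def build_snakemake_command(mode="local", cores=1, jobs=100, use_singularity=False):
    """Return the snakemake argument list for one of the modes described above.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["snakemake", "--use-conda"]
    if use_singularity:
        # Also fix the underlying OS, in combination with any mode
        cmd.append("--use-singularity")
    if mode == "dry-run":
        cmd.append("-n")
    elif mode == "local":
        cmd += ["--cores", str(cores)]
    elif mode == "qsub":
        cmd += ["--cluster", "qsub", "--jobs", str(jobs)]
    elif mode == "drmaa":
        cmd += ["--drmaa", "--jobs", str(jobs)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd


# Print the local command for 4 cores as a shell-ready string
print(shlex.join(build_snakemake_command("local", cores=4)))
```

The returned list can be passed to `subprocess.run` if you prefer to launch the workflow from a Python script.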

Step 4: Check results

Results are stored in the results folder.

Benchmarking pipeline

Stochastic Block Model

We provide a full pipeline for benchmarking the GNT and GNA methods on SBM-generated data.

First check the config_sbm.yaml config file and set the desired parameters. The whole pipeline can then be run with:

snakemake --snakefile Snakefile_sbm --configfile config_sbm.yaml --cores 1

High Degree Nodes

For the HDN simulations, please refer to the snakefile of the previous paper, Snakefile_paper_old.

Paper Results, TCGA

The entire pipeline for the processing of cancer data can be run as follows.

Please note: the scripts/tcga-download.R script downloads and processes the RNAseq dataset. This step is time and memory consuming and, more importantly, we have found it difficult to replicate the exact environment and TCGA version being installed (many issues of this kind have been raised on the TCGAbiolinks GitHub repo). For reproducibility, we provide the differential expression tables already preprocessed. We generated this data with R=4 and TCGAbiolinks v2.14.0.

Use the --use-conda flag to install the environment with the same parameters.

Following the next steps you should be able to run the paper pipelines:

1. First download the data folder from Zenodo.
2. The pipeline uses the data folder path. You can either:
   A. add the data folder inside the workflow-pygna folder, or
   B. change the relative path in the config file.
3. Check the config_paper.yaml configuration file. It includes the number of permutations and the cores parameters; tweak them as needed (for the moment we have set them to 3 and 1 so that you can quickly check that all results are generated).
4. As we provide intermediate files whose generation can be very time consuming (differential expression results and SP/RWR matrices), run
```
 snakemake --snakefile Snakefile_paper  --configfile config_paper.yaml -t 
```
to update the timestamps of all files and avoid recreating them (otherwise Snakemake would try to run the pipeline rules again if the files have been recently modified).
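To illustrate why the touch step matters, here is a simplified model of timestamp-based rerun logic: an output counts as stale when it is missing or when any of its inputs is newer than it. This is only a sketch of the general idea, not Snakemake's actual implementation, and the names `is_stale` and `touch` are hypothetical.

```python
import os


def is_stale(output, inputs):
    """Return True if `output` is missing or older than any file in `inputs`."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def touch(path):
    """Set the modification time of an existing file to the current time."""
    os.utime(path, None)
```

Under this model, a freshly downloaded input that is newer than a provided intermediate file makes that file look stale. Touching the intermediate file moves its timestamp forward, so it is treated as up to date and is not regenerated.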

Paper Results, previous version

We provide a Snakefile to replicate the results of the paper:

snakemake --snakefile Snakefile_paper --configfile config_paper.yaml --use-conda --cores $N

single_geneset: with this suffix we refer to all the results for the high-throughput experiments taken from TCGAbiolinks (Fig. 2).
Please note: the scripts/tcga_rnaseq.R script downloads and processes the BLCA RNAseq dataset. This step is time and memory consuming and, more importantly, we have found it difficult to replicate the exact environment and TCGA version being installed (many issues of this kind have been raised on the TCGAbiolinks GitHub repo). For reproducibility, we provide the blca_diffexp.csv file, which contains the full differential expression results. We generated this data with R=3.6 and TCGAbiolinks v2.14.0. Use the --use-conda flag to install the environment with the same parameters.
multi: refers to the analysis of multiple genesets from the Bailey et al. paper (Fig. 4).

Following the next steps you should be able to run the paper pipelines:

1. First download the data folder from Zenodo.
2. The pipeline uses the data folder path. You can either:
   A. add the data folder inside the workflow-pygna folder, or
   B. change the relative path in the config file.
3. Check the config_paper_single.yaml and config_paper_multi.yaml configuration files. They include the number of permutations and the cores parameters; tweak them as needed (for the moment we have set them to 3 and 1 so that you can quickly check that all results are generated).
4. As we provide intermediate files whose generation can be very time consuming (differential expression results and SP/RWR matrices), run
```
 snakemake --snakefile Snakefile_paper <analysis>_all --configfile config_paper_<analysis>.yaml -t 
```
with <analysis> being one of single or multi, to update the timestamps of all files and avoid recreating them (otherwise Snakemake would try to run the pipeline rules again if the files have been recently modified).
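The target and config file names follow the same pattern for both analyses, so the touch command and the actual run can be generated from the analysis name. The sketch below shows that pattern; the function name `paper_commands` is hypothetical, and the commands are only built, not executed.

```python
def paper_commands(analysis, use_conda=True):
    """Return (touch_cmd, run_cmd) argument lists for the given analysis.

    Hypothetical helper illustrating the <analysis>_all and
    config_paper_<analysis>.yaml naming pattern described above.
    """
    if analysis not in ("single", "multi"):
        raise ValueError(f"analysis must be 'single' or 'multi', got {analysis!r}")
    base = [
        "snakemake",
        "--snakefile", "Snakefile_paper",
        f"{analysis}_all",
        "--configfile", f"config_paper_{analysis}.yaml",
    ]
    touch_cmd = base + ["-t"]  # only update timestamps of existing files
    run_cmd = base + (["--use-conda"] if use_conda else [])
    return touch_cmd, run_cmd
```

Running the first command before the second keeps the provided intermediate files; skipping it lets the pipeline regenerate them.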

To obtain all the results for the single geneset (skip the first command if you want all files to be regenerated from scratch):

 snakemake --snakefile Snakefile_paper single_all --configfile config_paper_single.yaml -t
 
 snakemake --snakefile Snakefile_paper single_all --configfile config_paper_single.yaml --use-conda

To obtain the results for the multi geneset:

 snakemake --snakefile Snakefile_paper multi_all --configfile config_paper_multi.yaml -t
 
 snakemake --snakefile Snakefile_paper multi_all --configfile config_paper_multi.yaml

To obtain the results for the HDN simulations:

 snakemake --snakefile Snakefile_paper hdn_all --configfile config_paper_hdn.yaml

To obtain the results for the SBM simulations:

 snakemake --snakefile Snakefile_paper sbm_all --configfile config_paper_sbm.yaml
