
UriNeri/RNAfold_virus_Rima

 
 


RNAfold_virus

Tutor (Neri's) version of Rima Sghayer's lab project.
The repo contains the scripts (mostly Jupyter notebooks) needed to recreate the distribution and statistical analyses of RNA secondary structure features for several RNA virus groups.
The notebooks can be run locally (we recommend JupyterLab, which makes it easier to switch between notebooks) or via Google Colab (note that some paths will need to be adjusted, and that the environment and some of the dependencies will need to be reinstalled on each startup).
For more information, take a look at this conference poster: Slides-link, or contact us directly.

Order of execution

  1. Preprocessing.ipynb - Downloads and installs the dependencies, sets up the working environment, and fetches sequence data for selected virus groups from the RVMT project's FTP hub. The last code block runs RNAfold on the length-filtered contigs; this step may take some time, so we recommend using the pre-generated files instead (in this repo, see the /RAW_Data/<virus_group>/ subfolders).
  2. Data_extraction.ipynb - Sources the Contig class and some helper functions to parse the DBN (dot-bracket notation) files using the forgi library. The last code block iterates over the virus groups and dumps a pickled version of the object for each contig.
  3. statistical analysis.ipynb - Focuses on statistical tests, distribution analysis, data exploration, and visualization of the extracted features in the different virus groups. The visualizations produced in this step help in understanding the statistical patterns and interpreting the test results.
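
To give a feel for what step 2 works with, here is a minimal, standard-library-only sketch of reading a dot-bracket structure and deriving a couple of simple features. It is not the repo's Contig class and it does not use forgi: the names `ContigSketch`, `parse_dot_bracket`, `paired_fraction`, `hairpin_count`, and `dump_pickle` are hypothetical, and the two features are chosen only for illustration. It also assumes a structure made of `(`, `)`, and `.` only (no pseudoknot brackets).

```python
import pickle
from dataclasses import dataclass, field


def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs (0-based) from a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


@dataclass
class ContigSketch:
    """Hypothetical stand-in for a per-contig record (not the repo's Contig)."""
    contig_id: str
    sequence: str
    structure: str
    pairs: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.sequence) != len(self.structure):
            raise ValueError("sequence and structure lengths differ")
        self.pairs = parse_dot_bracket(self.structure)

    @property
    def paired_fraction(self):
        """Fraction of positions that take part in a base pair."""
        if not self.structure:
            return 0.0
        return 2 * len(self.pairs) / len(self.structure)

    @property
    def hairpin_count(self):
        """Number of pairs that enclose only unpaired positions."""
        return sum(
            1 for i, j in self.pairs
            if set(self.structure[i + 1:j]) <= {"."}
        )


def dump_pickle(obj, path):
    """Write any picklable object to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


# Toy input, invented for this example (not taken from the RAW_Data folders).
toy = ContigSketch("toy_contig", "GGAAACCAAGGAAACC", "((...))..((...))")
print(toy.pairs, toy.paired_fraction, toy.hairpin_count)
```

In a notebook, `dump_pickle(toy, "toy_contig.pkl")` would store the object so that a later notebook can load it with `pickle.load`, which mirrors the hand-off between steps 2 and 3 described above.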

Dependencies

  1. ViennaRNA

  2. seqkit

  3. Python libraries:

  4. GNU parallel: needed to run multiple parallel instances of RNAfold instead of a single instance using the internal threading option (--jobs=0).

  5. bbtools/bbmap: optional, not in use yet. May be added later for better pre-processing stats (via bbstats.sh) and to add the option of using a length cutoff >= the group median instead of xx% of the longest contig length (may help with viral groups with highly variable genome lengths).
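
The two length-cutoff strategies mentioned in the last item can be compared with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, contigs are represented as a plain dict of name to sequence, and the `fraction` value used in the example (0.5) is an arbitrary placeholder, not the percentage used in the notebooks.

```python
from statistics import median


def cutoff_fraction_of_longest(lengths, fraction):
    """Cutoff defined as a fraction of the longest contig length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return fraction * max(lengths)


def cutoff_group_median(lengths):
    """Cutoff defined as the median contig length of the group."""
    if not lengths:
        raise ValueError("no contig lengths given")
    return median(lengths)


def filter_by_length(contigs, cutoff):
    """Keep contigs whose sequence length is >= cutoff."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= cutoff}


# Toy group with one unusually long genome (lengths invented for the example).
contigs = {
    "c1": "A" * 1000,
    "c2": "A" * 1200,
    "c3": "A" * 1100,
    "c4": "A" * 9000,
}
lengths = [len(seq) for seq in contigs.values()]

by_fraction = filter_by_length(contigs, cutoff_fraction_of_longest(lengths, 0.5))
by_median = filter_by_length(contigs, cutoff_group_median(lengths))
print(sorted(by_fraction), sorted(by_median))
```

With a single very long contig in the group, the fraction-of-longest cutoff discards everything else, while the median cutoff keeps the upper half of the group. That is the situation the note about highly variable genome lengths refers to.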
