RNATracker is a deep learning approach for learning mRNA subcellular localization patterns and predicting localization outcomes. It operates on the cDNA of the longest protein-coding transcript isoform of a gene, with or without the corresponding secondary structure annotations. The learning targets are the fractions (percentages) of transcripts localized to each of a fixed set of subcellular compartments of interest.
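As a rough illustration of this input representation, the sketch below one-hot encodes a cDNA sequence into a (length × 4) matrix and, optionally, appends one-hot channels for per-nucleotide structure labels. The function name `one_hot_encode`, the A/C/G/T column order, the all-zero rows for ambiguous bases, and the five-letter structure alphabet `SHIME` are assumptions made for illustration, not RNATracker's actual preprocessing code.

```python
from typing import Optional

import numpy as np

NUCLEOTIDES = "ACGT"  # assumed column order of the nucleotide channels
# Hypothetical structure alphabet: S(tem), H(airpin), I(nterior loop), M(ultiloop), E(xterior)
STRUCTURE_ALPHABET = "SHIME"


def one_hot_encode(cdna: str, structure: Optional[str] = None) -> np.ndarray:
    """Encode cDNA as a (length, 4) matrix, or (length, 4 + 5) when structure labels are given."""
    seq = cdna.upper()
    x = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = NUCLEOTIDES.find(base)
        if j >= 0:  # ambiguous bases such as N stay all-zero (an assumption)
            x[i, j] = 1.0
    if structure is None:
        return x
    if len(structure) != len(seq):
        raise ValueError("structure annotation must have the same length as the sequence")
    s = np.zeros((len(seq), len(STRUCTURE_ALPHABET)), dtype=np.float32)
    for i, label in enumerate(structure):
        # str.index raises ValueError for labels outside the hypothetical alphabet
        s[i, STRUCTURE_ALPHABET.index(label)] = 1.0
    return np.concatenate([x, s], axis=1)
```

For example, `one_hot_encode("ACGTN")` yields a (5, 4) array whose last row is all zeros, and passing a structure string of the same length adds five extra channels.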
Our method provides computational insights into the mRNA trafficking mechanism by identifying cis-acting zipcode elements in the transcript sequences.
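The idea behind mask-based zipcode identification can be sketched as follows: slide a window over the one-hot encoded transcript, zero it out, and record how far the model's predicted localization moves. This is a minimal sketch assuming a trained model exposed as a generic `predict` callable; `mask_scan`, the window and stride defaults, and the L1-distance score are hypothetical choices, and Scripts/mask_test.py may work differently.

```python
from typing import Callable, List, Tuple

import numpy as np


def mask_scan(x: np.ndarray,
              predict: Callable[[np.ndarray], np.ndarray],
              window: int = 50,
              stride: int = 10) -> List[Tuple[int, float]]:
    """Rank windows of a one-hot transcript by how much masking them shifts the prediction.

    x:       one-hot transcript of shape (length, channels).
    predict: maps such an array to a vector of compartment fractions.
    Returns (window start, score) pairs, highest score first.
    """
    baseline = np.asarray(predict(x))
    scores = []
    for start in range(0, max(len(x) - window, 0) + 1, stride):
        masked = x.copy()
        masked[start:start + window] = 0.0  # zero out every channel inside the window
        # Score: L1 distance between masked and unmasked predictions (hypothetical choice).
        delta = float(np.abs(np.asarray(predict(masked)) - baseline).sum())
        scores.append((start, delta))
    # Windows whose removal moves the prediction most are candidate zipcodes.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With a Keras model, `predict` could for instance be `lambda a: model.predict(a[None])[0]`, which adds and then strips the batch dimension.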
For an overview of the RNA trafficking mechanism and its role in the broader gene regulatory network, I find this survey extremely helpful:
RNA localization: Making its way to the center stage
Localization datasets:
- Cefra-Seq, which provides localization targets for the cytoplasm, insoluble, membrane and nucleus fractions.
- APEX-RIP, on KDEL (endoplasmic reticulum), Mito (mitochondrial), NES (cytosol) and NCL (nucleus).
Other emerging read-mapping technologies that investigate subcellular zipcode proximity may provide additional datasets.
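If a dataset reports per-compartment expression values (for example, one value per Cefra-Seq fraction), one simple way to obtain fraction-style learning targets is to normalize each gene's values so they sum to one. The sketch below assumes exactly that; the function name `localization_targets` and the optional pseudocount are hypothetical, and the published preprocessing may use a different normalization.

```python
import numpy as np


def localization_targets(expression, pseudocount: float = 0.0) -> np.ndarray:
    """Convert per-compartment expression values into per-gene localization fractions.

    expression: array-like of shape (n_genes, n_compartments), e.g. one column per
    compartment such as cytoplasm, insoluble, membrane and nucleus (assumed layout).
    Each row of the result sums to 1.
    """
    values = np.asarray(expression, dtype=np.float64) + pseudocount
    if np.any(values < 0):
        raise ValueError("expression values must be non-negative")
    totals = values.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every gene needs a non-zero total across compartments")
    return values / totals
```

A row of `[10, 30, 0, 60]` becomes `[0.1, 0.3, 0.0, 0.6]`; a small pseudocount can keep zero-count compartments from producing exact zeros if the training loss involves logarithms.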
Dependencies:
- Keras 2.0.9 is recommended. The approach can easily be adapted to other deep learning frameworks such as TensorFlow and PyTorch.
- RNAplfold and forgi from the ViennaRNA package, and their Python wrapper Eden, for acquiring RNA secondary structure annotations.
- TOMTOM for comparing similarity between motifs.
- WebLogo and its Python wrapper Basset for visualizing learned motifs.
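Before comparing learned motifs with TOMTOM or drawing them with WebLogo, first-layer convolution filters are commonly turned into position frequency matrices by collecting the input windows that activate each filter strongly. The sketch below shows one such heuristic under stated assumptions: a half-maximum activation threshold, no bias or nonlinearity, and a kernel of shape (width, 4). `filter_to_pfm` is a hypothetical helper, not part of Basset or RNATracker.

```python
import numpy as np


def filter_to_pfm(x_batch: np.ndarray, kernel: np.ndarray,
                  threshold_frac: float = 0.5) -> np.ndarray:
    """Position frequency matrix of shape (width, 4) for one first-layer filter.

    x_batch: one-hot sequences of shape (n_seqs, length, 4).
    kernel:  filter weights of shape (width, 4); bias and nonlinearity are ignored.
    """
    width = kernel.shape[0]
    windows, activations = [], []
    for seq in x_batch:
        for start in range(seq.shape[0] - width + 1):
            window = seq[start:start + width]
            windows.append(window)
            activations.append(float((window * kernel).sum()))
    if not windows:
        raise ValueError("sequences are shorter than the filter")
    activations = np.asarray(activations)
    top = activations.max()
    if top <= 0:
        raise ValueError("filter is never positively activated")
    # Keep windows above a fraction of the maximum activation (assumed heuristic).
    keep = activations > threshold_frac * top
    counts = np.stack(windows)[keep].sum(axis=0).astype(np.float64)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each position to frequencies; positions with no counts stay zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

The resulting matrix can then be written out, for example in MEME motif format, as input to TOMTOM or a sequence logo tool.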
- Scripts/RNATracker.py
  - Main experiment entry
  - Use `python3 Scripts/RNATracker.py -h` to get a comprehensive list of experiment parameters
  - For model definitions, refer to Models/cnn_bilstm_attention.py
- Scripts/SGDModel.py
  - Experiments without padding or truncation
- Scripts/mask_test.py
  - Mask test to identify zipcodes with a sufficiently trained RNATracker model
- Transcript_Coordinates_Mapping/get_conservation_scores.py
  - A script to prepare conservation scores for the downstream mask test
  - We highly recommend downloading Homo_sapiens.GRCh38.cdna.all.fa from the Ensembl website and saving it under the Data directory
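Selecting the longest protein-coding transcript per gene from the Ensembl cDNA FASTA can be sketched by parsing the `key:value` tags in each header. This assumes Ensembl's usual header layout (`gene:` and `transcript_biotype:` tags followed by a free-text `description:` field); the helper names are hypothetical, and the repository's own preprocessing may select isoforms differently.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def header_tags(header: str) -> Dict[str, str]:
    """Parse Ensembl-style 'key:value' tokens, stopping at the free-text description."""
    tags = {}
    for token in header.split()[1:]:
        if token.startswith("description:"):
            break
        key, sep, value = token.partition(":")
        if sep:
            tags[key] = value
    return tags


def longest_coding_transcripts(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to (transcript ID, cDNA) of its longest protein-coding transcript."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        tags = header_tags(header)
        gene_id = tags.get("gene")
        if gene_id is None or tags.get("transcript_biotype") != "protein_coding":
            continue
        # Ties keep the transcript encountered first.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header.split()[0], seq)
    return best
```

Ensembl distributes this file gzip-compressed; either decompress it into the Data directory and iterate over `open(...)`, or pass the handle returned by `gzip.open(path, "rt")` directly to `longest_coding_transcripts`.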
For secondary structure annotations, refer to this customized annotator.
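To give a sense of what per-nucleotide secondary structure annotations look like, the sketch below labels each position of a dot-bracket string as stem, hairpin loop, interior loop or bulge, multiloop, or exterior. It is a simplified, hypothetical annotator in the spirit of forgi's structural element labels, not the customized annotator referenced above, and it assumes a balanced, pseudoknot-free dot-bracket input (for example, one predicted with ViennaRNA).

```python
def annotate_structure(dot_bracket: str) -> str:
    """Label each position of a pseudoknot-free dot-bracket string.

    Hypothetical labels: S = paired (stem), H = hairpin loop, I = interior loop or bulge,
    M = multiloop, E = exterior (not enclosed by any base pair).
    """
    n = len(dot_bracket)
    partner = [-1] * n
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")

    labels = []
    for k in range(n):
        if partner[k] != -1:
            labels.append("S")
            continue
        # Walk left, jumping over closed helices, to find the innermost enclosing '('.
        i = k - 1
        while i >= 0 and dot_bracket[i] != "(":
            i = partner[i] - 1 if dot_bracket[i] == ")" else i - 1
        if i < 0:
            labels.append("E")
            continue
        # Count helices branching directly off the enclosing pair (i, partner[i]).
        branches, m = 0, i + 1
        while m < partner[i]:
            if dot_bracket[m] == "(":
                branches += 1
                m = partner[m] + 1
            else:
                m += 1
        labels.append("H" if branches == 0 else "I" if branches == 1 else "M")
    return "".join(labels)
```

For instance, `annotate_structure("((..((...))..))")` returns `"SSIISSHHHSSIISS"`: the inner loop is a hairpin, and the unpaired bases between the two helices form an interior loop.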