The code in this repo was created to identify altProts in high-throughput AP-MS data (BioPlex 2.0) and to perform the associated network analysis. It accompanies our article: Newfound coding potential of transcripts unveils missing members of human protein communities.
First, the raw MS data were reanalyzed with the SearchGUI and PeptideShaker tools, as described in the manuscript. The code here starts from the hierarchical reports produced by PeptideShaker.
- Parse hierarchical reports
The notebook parse_psm_reports.ipynb aggregates all peptide-spectrum matches (PSMs) from the PeptideShaker hierarchical reports and produces the CSV file used as input to CompPASS.
- Run CompPASS
PSM counts from the previous step are run through CompPASS with R.
- Compute CompPASS Plus features
The output from CompPASS is used to compute 9 features representing each candidate interaction (compute_CompPASS_Plus_features.ipynb). Statistical filters are also applied, as described in Huttlin et al.
- Run CompPASS Plus
The features are used to train naive Bayes classifiers in cross-validation, with splits made by batch of AP-MS experiments (run_CompPASS_Plus.ipynb).
- Assemble network
The scored interactions are filtered, assembled into a network and compared with the BioPlex networks (assemble_network.ipynb).
- Network topological analysis
Topological features of the resulting network are computed and visualized (degree distribution, average shortest paths, eigenvector centrality, etc.) (full_network_features.ipynb).
- Clustering and functional analysis
The network is partitioned with the Markov clustering algorithm. Clusters are analyzed for enrichment of Gene Ontology terms (clustering_GO.ipynb). Disease associations are also computed for each cluster (disease_associations.ipynb).
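To illustrate what splitting cross-validation by batch means in the CompPASS Plus step, here is a minimal sketch using only the Python standard library. It is not the code from run_CompPASS_Plus.ipynb: the function name `batch_folds`, the batch labels and the greedy balancing strategy are assumptions made for this example. The point it shows is that all candidate interactions from one AP-MS batch land in the same fold, so a classifier is never tested on a batch it was trained on.

```python
from collections import defaultdict


def batch_folds(batch_ids, n_folds):
    """Yield (train_idx, test_idx) pairs; each batch falls entirely in one fold.

    batch_ids: one batch label per candidate interaction (hypothetical input).
    """
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batch_ids):
        by_batch[batch].append(idx)
    if n_folds < 2 or n_folds > len(by_batch):
        raise ValueError("n_folds must be between 2 and the number of batches")

    # Greedy balancing: place the largest batches first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_folds)]
    for batch in sorted(by_batch, key=lambda b: (-len(by_batch[b]), str(b))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_batch[batch])

    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_folds) if j != k for i in folds[j])
        yield train, test


# Toy example: seven candidate interactions from three hypothetical batches.
batches = ["plate1", "plate1", "plate2", "plate2", "plate3", "plate3", "plate3"]
for train, test in batch_folds(batches, n_folds=3):
    print("train:", train, "test:", test)
```

If scikit-learn is available, `sklearn.model_selection.GroupKFold` offers comparable group-aware splitting and can be combined with a naive Bayes estimator.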
These scripts and notebooks were run on a high-performance computing platform with 24 cores and 256 GB of memory. Some adjustments may be necessary when fewer resources are available.
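As one example of such an adjustment, PSM counting can be done in a streaming fashion so that only the running tallies are held in memory, not the full reports. The sketch below is an assumption-laden illustration, not the logic of parse_psm_reports.ipynb: the function name `count_psms`, the column names `bait` and `protein`, and the tab delimiter are all hypothetical and would need to match the actual report layout.

```python
import csv
from collections import Counter


def count_psms(report_paths, bait_col="bait", prey_col="protein", delimiter="\t"):
    """Count PSMs per (bait, prey) pair, reading one row at a time.

    Column names and delimiter are placeholders (assumptions), not the
    actual PeptideShaker report layout.
    """
    counts = Counter()
    for path in report_paths:
        with open(path, newline="") as handle:
            # DictReader streams rows, so a whole report is never loaded at once.
            for row in csv.DictReader(handle, delimiter=delimiter):
                counts[(row[bait_col], row[prey_col])] += 1
    return counts
```

Calling `count_psms` with a list of report paths returns a `Counter` keyed by (bait, prey) pairs, which can then be written out as a table. Memory use grows with the number of distinct pairs, not with the total number of PSMs.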