novel proteins in PPI data

The code in this repo was written to identify altProts in high-throughput AP-MS data (BioPlex 2.0) and to perform the related network analysis. It accompanies our article: "Newfound coding potential of transcripts unveils missing members of human protein communities".

analytical pipeline

First, the raw MS data were reanalyzed with SearchGUI and PeptideShaker, as described in the manuscript. The code here starts from the hierarchical reports output by PeptideShaker.

Parse hierarchical reports
The notebook parse_psm_reports.ipynb aggregates all peptide-spectrum matches (PSMs) from the PeptideShaker hierarchical reports and produces the CSV file used as input to CompPASS.
Run CompPASS
The PSM counts from the previous step are run through CompPASS in R.
Compute CompPASS Plus features
The output from CompPASS is used to compute nine features representing each candidate interaction (compute_CompPASS_Plus_features.ipynb). Statistical filters are also applied, as described in Huttlin et al.
Run CompPASS Plus
The features are used to train naive Bayes classifiers in cross-validation, with folds split by batch of AP-MS experiments (run_CompPASS_Plus.ipynb).
Assemble network
The scored interactions are filtered, assembled into a network, and compared with the BioPlex networks (assemble_network.ipynb).
Network topological analysis
Topological features of the resulting network (degree distribution, average shortest path length, eigenvector centrality, etc.) are computed and visualized in full_network_features.ipynb.
Clustering and functional analysis
The network is partitioned with the Markov clustering (MCL) algorithm. Clusters are analyzed for enrichment of Gene Ontology terms (clustering_GO.ipynb). Disease associations are also computed for each cluster (disease_associations.ipynb).
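
For the CompPASS Plus step, splitting cross-validation folds by batch means that all candidate interactions from one AP-MS batch land on the same side of every split. The sketch below shows only that splitting logic, using the standard library. The function name `batch_folds` and the batch labels are hypothetical and are not taken from run_CompPASS_Plus.ipynb; the classifier training itself is left out.

```python
from collections import defaultdict


def batch_folds(batches, n_folds):
    """Yield (train_idx, test_idx) pairs in which no batch spans both sides.

    `batches` holds one batch label per candidate interaction. This input
    layout is an assumption made for illustration.
    """
    # Group row indices by batch label.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        by_batch[batch].append(idx)

    # Assign whole batches to folds, largest batch first, always filling the
    # currently smallest fold so that fold sizes stay roughly balanced.
    # If n_folds exceeds the number of batches, some folds stay empty.
    folds = [[] for _ in range(n_folds)]
    for batch in sorted(by_batch, key=lambda b: (-len(by_batch[b]), str(b))):
        smallest = min(folds, key=len)
        smallest.extend(by_batch[batch])

    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(test_idx)


# Toy example: six candidate interactions from three AP-MS batches.
batches = ["b1", "b1", "b2", "b2", "b3", "b3"]
splits = list(batch_folds(batches, n_folds=3))
for train_idx, test_idx in splits:
    train_batches = {batches[i] for i in train_idx}
    test_batches = {batches[i] for i in test_idx}
    # A classifier would be fit on train_idx and scored on test_idx here.
    assert train_batches.isdisjoint(test_batches)
```

Keeping batches intact avoids scoring an interaction with a model that has already seen other interactions from the same batch.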
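
For the clustering step, Markov clustering alternates two operations on a column-stochastic matrix: expansion (matrix power) and inflation (element-wise power followed by column renormalization). The following is a minimal dense-matrix sketch of that generic procedure with NumPy. It is not the implementation used in clustering_GO.ipynb, and the function name `markov_cluster`, the default parameters, and the toy graph are assumptions chosen for illustration.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, n_iter=50, tol=1e-6):
    """Return clusters as sorted tuples of node indices (illustrative sketch)."""
    # Add self-loops, then normalise each column to sum to 1.
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    m /= m.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = m ** inflation                        # inflation: favour strong flow
        m /= m.sum(axis=0, keepdims=True)         # keep columns stochastic
        if np.allclose(m, previous, atol=tol):
            break

    # Each row that still carries flow defines a cluster: its non-zero
    # columns are the member nodes. Identical rows collapse in the set.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > tol))
        if members:
            clusters.add(members)
    return sorted(clusters)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0
print(markov_cluster(adj))
```

A higher `inflation` value generally yields smaller, tighter clusters. A network of this size would normally call for a sparse-matrix implementation rather than the dense one shown here.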
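
For the Gene Ontology analysis, enrichment of a term in a cluster is commonly tested with a one-sided hypergeometric test. The README does not state which test clustering_GO.ipynb uses, so the sketch below is an assumption that only illustrates the idea. The function name `enrichment_pvalue` and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, population, annotated, cluster_size):
    """P(X >= k) under a hypergeometric null.

    k            -- cluster members annotated with the term
    population   -- proteins in the whole network
    annotated    -- proteins in the network annotated with the term
    cluster_size -- proteins in the cluster
    """
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, cluster_size)


# Hypothetical numbers: 8 of the 20 proteins in a cluster carry a term that
# annotates 50 of the 1000 proteins in the network.
p = enrichment_pvalue(k=8, population=1000, annotated=50, cluster_size=20)
print(f"p = {p:.3g}")
```

When many terms and clusters are tested, the resulting p-values need a multiple-testing correction before they are interpreted.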

Final notes

These scripts and notebooks were run on a high-performance computing platform with 24 cores and 256 GB of memory. Some adjustments may be necessary when fewer resources are available.



Seb-Leb/altProts_in_communities
