Notebooks created for the identification of altProts in high-throughput AP-MS data and relevant network analysis

Seb-Leb/altProts_in_communities

novel proteins in PPI data

The code in this repo was created for the identification of altProts in high-throughput AP-MS data (BioPlex 2.0) and relevant network analysis. It accompanies our article, "Newfound coding potential of transcripts unveils missing members of human protein communities".

analytical pipeline

First, the raw MS data were reanalyzed using the SearchGUI and PeptideShaker tools, as described in the manuscript. The code here starts from the hierarchical reports output by PeptideShaker.

  1. Parse hierarchical reports
    The notebook parse_psm_reports.ipynb aggregates all peptide-spectrum matches (PSMs) from the PeptideShaker hierarchical reports and produces the CSV file used as input to CompPASS.
  2. Run CompPASS
    PSM counts from the previous step are run through CompPASS with R.
  3. Compute CompPASS Plus features
    The output from CompPASS is used to compute 9 features representing each candidate interaction (compute_CompPASS_Plus_features.ipynb). Statistical filters are also applied, as described in Huttlin et al.
  4. Run CompPASS Plus
    The features are used to train naive Bayes classifiers with cross-validation split by batch of AP-MS experiments (run_CompPASS_Plus.ipynb).
  5. Assemble network
    The scored interactions are filtered, assembled into a network and compared with the BioPlex networks. (assemble_network.ipynb)
  6. Network topological analysis
    Topological features of the resulting network are computed and visualized (degree distribution, average shortest paths, eigenvector centrality, etc.) (full_network_features.ipynb).
  7. Clustering and functional analysis
    The network is partitioned with the Markov clustering algorithm. Clusters are analyzed for enrichment of Gene Ontology terms (clustering_GO.ipynb). Disease associations are also computed for each cluster (disease_associations.ipynb).
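
Step 4 relies on cross-validation split by batch, meaning that all candidate interactions from one batch of AP-MS experiments land on the same side of every train/test split. The sketch below illustrates that idea on toy data using only the Python standard library. It is not taken from run_CompPASS_Plus.ipynb: the function name `batch_folds`, the batch labels, and the round-robin assignment of batches to folds are all assumptions made for illustration.

```python
from collections import defaultdict


def batch_folds(batch_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no batch spans both sides.

    batch_ids: one batch label per candidate interaction (hypothetical input).
    n_folds:   number of folds; must not exceed the number of distinct batches.
    """
    # Group row indices by their batch label.
    rows_by_batch = defaultdict(list)
    for idx, batch in enumerate(batch_ids):
        rows_by_batch[batch].append(idx)

    batches = sorted(rows_by_batch)
    if n_folds > len(batches):
        raise ValueError("n_folds cannot exceed the number of batches")

    # Assign whole batches to folds round-robin (an illustrative choice).
    fold_batches = [batches[i::n_folds] for i in range(n_folds)]

    for held_out in fold_batches:
        held_out_set = set(held_out)
        test_idx = [i for b in held_out for i in rows_by_batch[b]]
        train_idx = [
            i
            for b in batches
            if b not in held_out_set
            for i in rows_by_batch[b]
        ]
        yield train_idx, test_idx


# Toy example: six candidate interactions from three made-up batches.
batch_ids = ["b1", "b1", "b2", "b2", "b3", "b3"]
folds = list(batch_folds(batch_ids, n_folds=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In each fold, a classifier would be trained on the `train_idx` rows and used to score the `test_idx` rows, so every interaction is scored by a model that never saw its own batch. If scikit-learn is available, `sklearn.model_selection.GroupKFold` provides a comparable group-aware splitter.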

Final notes

These scripts and notebooks were run on a high-performance computing platform with 24 cores and 256 GB of memory. Some adjustments may be necessary when fewer resources are available.
