Proteomics

Alex V. Kotlar edited this page Sep 5, 2023 · 16 revisions

Protein Abundance Search Design

  1. The user has a combined protein abundance-like file (combined_protein.tsv, https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs.html#combined_proteintsv)
  • PK: This may not be the exact output; Leo, who made FragPipe, sent us a related but different version of this output
  • CT: This is the format we should expect (except that we will request that the row names be gene, not gene|uniprot_id): [Screenshot: expected file format, 2023-09-05]
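Until the upstream file uses plain gene row names, a loader could tolerate both forms. The following is a minimal sketch, assuming the row names sit in the first tab-separated column and that the first row is a header; the column names and sample values are made up for illustration and are not the real FragPipe layout.

```python
import csv
import io


def strip_uniprot_suffix(row_name: str) -> str:
    """Return the gene part of a 'gene|uniprot_id' row name.

    Plain gene names (no '|') pass through unchanged.
    """
    return row_name.split("|", 1)[0].strip()


def normalize_row_names(tsv_text: str) -> str:
    """Rewrite the first column of a tab-separated table to hold gene names only.

    Assumption: the first row is a header and is left untouched.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        if i > 0 and row:
            row[0] = strip_uniprot_suffix(row[0])
        writer.writerow(row)
    return out.getvalue()


# Illustrative input only; the header and values are hypothetical.
example = "gene\tsample_1\tsample_2\nTP53|P04637\t10.5\t11.2\nBRCA1\t3.0\t2.5\n"
normalized = normalize_row_names(example)
```

Normalizing at load time keeps the later filtering step keyed on a single identifier, whichever form the input file arrives in.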
  1. We create a Python CLI tool/API library function, which submits a batch request to filter the protein abundance file using a Bystro annotation.
  • This requires the user to first authenticate through our existing Python authentication API
  • The batch submission requires:
    • user id
    • path to the protein abundance file, or a multipart file upload stream of the protein abundance file if it is local (in v1 it is OK to support only one of the two)
    • the job id
    • basename for the output
    • the query string query
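The fields listed above could be gathered into one validated payload before submission. This is only a sketch: the class name, field names, and JSON encoding are hypothetical and do not describe the existing Bystro API, and it covers the path variant only, not the multipart upload.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProteomicsFilterRequest:
    """Hypothetical batch-request payload; all names here are illustrative."""

    user_id: str
    job_id: str
    protein_abundance_path: str
    output_basename: str
    query: str

    def __post_init__(self) -> None:
        # Every field is required and must be a non-empty string.
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        # A basename is a file name stem, never a path.
        if "/" in self.output_basename or "\\" in self.output_basename:
            raise ValueError("output_basename must not contain path separators")

    def to_json(self) -> str:
        """Serialize the request with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
request = ProteomicsFilterRequest(
    user_id="user-123",
    job_id="job-456",
    protein_abundance_path="/data/combined_protein.tsv",
    output_basename="filtered_proteins",
    query="gene:TP53",
)
payload = request.to_json()
```

Validating on the client side means a malformed request fails before it reaches the API server or the queue.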
  1. The job is processed through our API server, which submits it through beanstalkd to our cluster. The cluster filters the protein abundance file, writes the new protein abundance file to disk, and returns the path to that file to the user.
  • We re-use the code (duplication is OK for v1) from bystro/search/save/handler.py for scrolling through the annotation index
  • We pull down the annotation index as in bystro/search/save/handler.py, and add the protein abundance values
  • We persist these values on EFS and return a path to the results
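The filtering step can be pictured as follows. This sketch does not reproduce bystro/search/save/handler.py: plain lists of dictionaries stand in for the pages returned while scrolling the annotation index, the `gene` field name is an assumption, and the abundance table is assumed to have gene names in its first column.

```python
import csv
import io
from typing import Iterable


def genes_from_pages(pages: Iterable[list[dict]], gene_field: str = "gene") -> set[str]:
    """Collect the distinct gene names seen across all scrolled pages of hits.

    `gene_field` is a hypothetical field name, not the real index schema.
    """
    genes: set[str] = set()
    for page in pages:
        for hit in page:
            value = hit.get(gene_field)
            if value:
                genes.add(value)
    return genes


def filter_abundance(tsv_text: str, genes: set[str]) -> str:
    """Keep the header plus every row whose first column is in `genes`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""
    writer.writerow(header)
    for row in reader:
        if row and row[0] in genes:
            writer.writerow(row)
    return out.getvalue()


# Two stand-in pages of annotation hits and a tiny abundance table.
pages = [[{"gene": "TP53"}, {"gene": "BRCA1"}], [{"gene": "TP53"}]]
abundance = "gene\tsample_1\nTP53\t10.5\nEGFR\t7.1\nBRCA1\t3.0\n"
filtered = filter_abundance(abundance, genes_from_pages(pages))
```

Collecting the genes into a set first means the abundance file is read once, regardless of how many pages the scroll returns.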
  1. We create a Python CLI/API tool/library function to allow the user to pull down the filtered protein abundance results using the user id and the job basename.
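One way the retrieval function could locate a result from the user id and job basename is sketched below. The directory layout (`<root>/<user_id>/<basename>.tsv`), the root path, and the function name are assumptions made for illustration; the actual layout on EFS is not specified here.

```python
from pathlib import PurePosixPath


def result_path(root: str, user_id: str, basename: str) -> PurePosixPath:
    """Build the path of a filtered result file under a hypothetical layout.

    Rejects empty components and anything that could escape the user's directory.
    """
    for name, value in (("user_id", user_id), ("basename", basename)):
        if not value or "/" in value or value in {".", ".."}:
            raise ValueError(f"invalid {name}: {value!r}")
    return PurePosixPath(root) / user_id / f"{basename}.tsv"


# Illustrative values only; the root directory is made up.
path = result_path("/mnt/efs/proteomics", "user-123", "filtered_proteins")
```

Deriving the path from the user id and basename, instead of accepting an arbitrary path from the caller, keeps each user confined to their own results.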